[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day
AI News for 4/1/2025-4/2/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 6807 messages) for you. Estimated reading time saved (at 200wpm): 627 minutes. You can now tag @smol_ai for AINews discussions!
OpenHands LM and OpenAI PaperBench got pretty close but no cigar.
Meta update: After the old Reddit pipeline broke, we finally got our new LLM clustering and ranking system working. You can see the new results below, and we'll be improving them over time. Feedback is welcome on the @smol_ai account.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- /r/LocalLlama Recap
- Theme 1. "AI Model Benchmarking: Performance, Challenges, and Innovations"
- Theme 2. "Unleashing Dream 7B: The Future of Diffusion Models"
- Other AI Subreddit Recap
- Theme 1. Gemini 2.5 Pro Dominates AI Benchmarking
- Theme 2. AI Surpassing Human-Like Intelligence Milestones
- Theme 3. AI Advancements: Transforming Society and Industries
- AI Discord Recap
- PART 1: High level Discord summaries
- Manus.im Discord
- LMArena Discord
- Cursor Community Discord
- Perplexity AI Discord
- Unsloth AI (Daniel Han) Discord
- aider (Paul Gauthier) Discord
- OpenAI Discord
- Interconnects (Nathan Lambert) Discord
- LM Studio Discord
- MCP (Glama) Discord
- OpenRouter (Alex Atallah) Discord
- Modular (Mojo 🔥) Discord
- GPU MODE Discord
- Torchtune Discord
- HuggingFace Discord
- Latent Space Discord
- Yannick Kilcher Discord
- Nous Research AI Discord
- tinygrad (George Hotz) Discord
- LlamaIndex Discord
- Nomic.ai (GPT4All) Discord
- Cohere Discord
- DSPy Discord
- AI21 Labs (Jamba) Discord
- LLM Agents (Berkeley MOOC) Discord
- Codeium (Windsurf) Discord
- Gorilla LLM (Berkeley Function Calling) Discord
- PART 2: Detailed by-Channel summaries and links
- Manus.im Discord ▷ #showcase (1 message):
- Manus.im Discord ▷ #general (692 messages🔥🔥🔥):
- LMArena ▷ #general (1017 messages🔥🔥🔥):
- LMArena ▷ #announcements (1 message):
- Cursor Community ▷ #general (717 messages🔥🔥🔥):
- Perplexity AI ▷ #announcements (1 message):
- Perplexity AI ▷ #general (597 messages🔥🔥🔥):
- Perplexity AI ▷ #sharing (12 messages🔥):
- Perplexity AI ▷ #pplx-api (3 messages):
- Unsloth AI (Daniel Han) ▷ #general (326 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (14 messages🔥):
- Unsloth AI (Daniel Han) ▷ #help (230 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #showcase (3 messages):
- Unsloth AI (Daniel Han) ▷ #research (5 messages):
- aider (Paul Gauthier) ▷ #general (515 messages🔥🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (16 messages🔥):
- aider (Paul Gauthier) ▷ #links (6 messages):
- OpenAI ▷ #ai-discussions (195 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (12 messages🔥):
- OpenAI ▷ #prompt-engineering (5 messages):
- OpenAI ▷ #api-discussions (5 messages):
- Interconnects (Nathan Lambert) ▷ #news (153 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #random (1 message):
- Interconnects (Nathan Lambert) ▷ #memes (1 message):
- Interconnects (Nathan Lambert) ▷ #reads (17 messages🔥):
- LM Studio ▷ #general (77 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (70 messages🔥🔥):
- MCP (Glama) ▷ #general (131 messages🔥🔥):
- MCP (Glama) ▷ #showcase (7 messages):
- OpenRouter (Alex Atallah) ▷ #announcements (15 messages🔥):
- OpenRouter (Alex Atallah) ▷ #general (121 messages🔥🔥):
- Modular (Mojo 🔥) ▷ #general (3 messages):
- Modular (Mojo 🔥) ▷ #mojo (72 messages🔥🔥):
- GPU MODE ▷ #general (16 messages🔥):
- GPU MODE ▷ #triton (8 messages🔥):
- GPU MODE ▷ #torch (8 messages🔥):
- GPU MODE ▷ #jobs (1 message):
- GPU MODE ▷ #beginner (3 messages):
- GPU MODE ▷ #hqq-mobius (12 messages🔥):
- GPU MODE ▷ #self-promotion (2 messages):
- GPU MODE ▷ #thunderkittens (2 messages):
- GPU MODE ▷ #reasoning-gym (10 messages🔥):
- GPU MODE ▷ #gpu模式 (1 message):
- GPU MODE ▷ #general (3 messages):
- GPU MODE ▷ #submissions (9 messages🔥):
- Torchtune ▷ #dev (53 messages🔥):
- Torchtune ▷ #papers (4 messages):
- HuggingFace ▷ #general (27 messages🔥):
- HuggingFace ▷ #today-im-learning (5 messages):
- HuggingFace ▷ #i-made-this (4 messages):
- HuggingFace ▷ #computer-vision (1 message):
- HuggingFace ▷ #gradio-announcements (2 messages):
- HuggingFace ▷ #smol-course (4 messages):
- HuggingFace ▷ #agents-course (12 messages🔥):
- Latent Space ▷ #ai-general-chat (38 messages🔥):
- Yannick Kilcher ▷ #general (28 messages🔥):
- Yannick Kilcher ▷ #paper-discussion (7 messages):
- Yannick Kilcher ▷ #ml-news (3 messages):
- Nous Research AI ▷ #general (15 messages🔥):
- Nous Research AI ▷ #ask-about-llms (8 messages🔥):
- Nous Research AI ▷ #research-papers (3 messages):
- Nous Research AI ▷ #interesting-links (3 messages):
- tinygrad (George Hotz) ▷ #general (8 messages🔥):
- tinygrad (George Hotz) ▷ #learn-tinygrad (14 messages🔥):
- LlamaIndex ▷ #blog (1 message):
- LlamaIndex ▷ #general (16 messages🔥):
- Nomic.ai (GPT4All) ▷ #general (9 messages🔥):
- Cohere ▷ #「💬」general (6 messages):
- Cohere ▷ #「🤝」introductions (1 message):
- DSPy ▷ #general (6 messages):
- AI21 Labs (Jamba) ▷ #jamba (5 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 message):
- Codeium (Windsurf) ▷ #announcements (1 message):
- Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):
AI Twitter Recap
Models and Benchmarks
- Multi-Token Attention (MTA) enhances LLM performance on benchmarks: @jaseweston highlights Meta's Multi-Token Attention (MTA), demonstrating enhanced performance on standard language modeling tasks and tasks requiring information retrieval within long contexts. MTA combines query, key, and head operations over multiple tokens, proving particularly beneficial in leveraging richer information.
- OpenAI's PaperBench evaluates AI agent replication of research: @OpenAI introduced PaperBench, a benchmark assessing AI agents' ability to replicate state-of-the-art AI research. Agents must understand papers, write code, and execute experiments from top ICML 2024 publications. The best-performing agent, Claude 3.5 Sonnet (New), achieved an average replication score of 21.0% with open-source scaffolding. @OpenAI noted that models do not yet outperform human baselines and that replication attempts were evaluated using detailed rubrics co-developed with original authors. These rubrics systematically break down the 20 papers into 8,316 precisely defined requirements, evaluated by an LLM judge.
- EpochAIResearch introduces ArithmeticBench for advanced arithmetic evaluation: @EpochAIResearch announced ArithmeticBench, a challenging benchmark designed to test AIs on the frontiers of arithmetic with numbers exceeding 100 digits.
- Google DeepMind's Gemini 2.5 Pro matches reported scores on GPQA Diamond: @EpochAIResearch evaluated Gemini 2.5 Pro on GPQA Diamond, achieving a score of 84%, matching Google's reported result.
- TAU-bench evaluates agent reliability in real-world environments: @_philschmid discusses TAU-bench, a benchmark that evaluates agents in real-world environments and found poor reliability. It tests whether an agent can reliably engage in a dynamic, multi-turn conversation with a user to figure out what needs to be done. The benchmark was released in June 2024 but feels more important than ever: not only does it describe the limitations we currently face, it also demonstrates how to set up a good evaluation pipeline for your own agents!
- @StanfordNLP shared that, per their ICLR paper, LLMs can produce novel ideas, but those ideas may lack feasibility.
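The PaperBench rubrics described above decompose each paper into thousands of weighted, precisely defined requirements, with leaf requirements graded pass/fail by an LLM judge and scores rolled up the tree. A minimal sketch of that roll-up; the node layout and weights below are illustrative, not OpenAI's actual schema:

```python
# Sketch of scoring a hierarchical rubric tree, PaperBench-style: leaf
# requirements carry a binary judgment (in the real benchmark, from an LLM
# judge) and each internal node's score is the weighted average of its
# children. The example tree and its weights are made up for illustration.

def score_node(node):
    """Return the [0, 1] score of a rubric node."""
    if "passed" in node:                      # leaf requirement: pass/fail
        return 1.0 if node["passed"] else 0.0
    total = sum(child["weight"] for child in node["children"])
    return sum(
        child["weight"] * score_node(child) for child in node["children"]
    ) / total

rubric = {
    "children": [
        {"weight": 2, "passed": True},        # e.g. "training code runs"
        {"weight": 1, "passed": False},       # e.g. "reproduces Table 2"
        {"weight": 1, "children": [           # a requirement with sub-parts
            {"weight": 1, "passed": True},
            {"weight": 1, "passed": False},
        ]},
    ]
}
print(score_node(rubric))  # 0.625
```

Weighting lets a critical requirement (say, the training code actually running) count for more than a cosmetic one, which is why a replication score like 21.0% is more informative than a flat pass count.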
AI Model Architecture and Training
- TTSTSTT: A new AI model architecture trained in the auditory domain: @juberti introduces TTSTSTT (Text To Speech To Speech To Text), a novel AI model architecture trained to perform reasoning entirely within the auditory domain, using conversions to text at the input and output layers. The rationale is that TTSTSTT can take advantage of natural patterns that emerge when language is produced and perceived in speech form, where subtleties like intonation and timing can inform more contextually aware reasoning. TTSTSTT is designed as a drop-in replacement for any current text LLM.
- Meta presents Multi-Token Attention: @_akhaliq highlights that MTA achieves enhanced performance on a range of popular benchmarks. Notably, it outperforms Transformer baselines on standard language modeling tasks and on tasks that require searching for information within long contexts, where the method's ability to leverage richer information proves particularly beneficial.
- Scaling SSL adapts to data, makes use of model capacity, and scales effectively: @sainingxie notes that in Cambrian-1, vision SSL representations usually lagged behind language-supervised ones, but once the data gap is closed and scaling kicks in, performance catches up. SSL adapts to data, makes use of model capacity, and scales effectively (even better than CLIP!).
Applications of AI
- LlamaExtract helps build agents for structured data extraction from technical documents: @jerryjliu0 highlights the application of agentic extraction from technical documents in industries like manufacturing, construction, and energy. LlamaExtract enables the creation of agents that can extract structured data directly from datasheets, ensuring accurate, consistent JSON output through multimodal understanding and validation loops.
- Windsurf enables deploying apps with coding agents using Netlify: @omarsar0 announces that Windsurf now allows deploying apps with the coding agent, using Netlify for deployment.
- Klarna's AI Assistant, powered by LangGraph and LangSmith, reduces customer resolution time by 80%: @LangChainAI highlights that Klarna's AI Assistant handles customer support tasks for 85 million active users, automating ~70% of repetitive support tasks and enabling faster responses to user queries.
- Ashlee's journey navigating cancer and how AI research assistants like Elicit can help people make more evidence-backed decisions: @jungofthewon shared an article about this use case.
- @iScienceLuvr laid out his vision for what's needed for the future of medical AI and healthcare, namely: multimodal medical foundation models, open-source necessity, and dedicated research lab companies.
- @AndrewYNg introduced a short course, "Getting Structured LLM Output".
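The validation loops mentioned in the LlamaExtract item above boil down to: extract, check the output against a schema, and retry with error feedback. A minimal sketch of that loop, assuming a flat field schema; this is not LlamaExtract's actual API, and `run_extractor` is a stub standing in for a multimodal model call:

```python
# Illustrative extract-then-validate loop for structured data extraction.
# NOT LlamaExtract's API: the schema, field names, and `run_extractor` stub
# are all hypothetical placeholders for a real model-backed extractor.
import json

REQUIRED = {"part_number": str, "voltage_v": float, "temp_range_c": list}

def validate(record):
    """Return a list of problems; an empty list means the JSON is valid."""
    problems = []
    for field, typ in REQUIRED.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], typ):
            problems.append(f"{field} should be {typ.__name__}")
    return problems

def run_extractor(document, feedback=None):
    # Stub for a model call; a real agent would fold `feedback` back into
    # the prompt so the next attempt can correct its own mistakes.
    return json.loads(
        '{"part_number": "LM317", "voltage_v": 1.25, "temp_range_c": [0, 125]}'
    )

def extract_with_validation(document, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        record = run_extractor(document, feedback)
        problems = validate(record)
        if not problems:
            return record
        feedback = "; ".join(problems)   # error feedback drives the retry
    raise ValueError(f"extraction failed: {feedback}")

print(extract_with_validation("datasheet.pdf")["part_number"])  # LM317
```

The point of the loop is that schema validation gives the extractor a concrete, machine-checkable target, which is what makes "accurate, consistent JSON output" achievable from messy datasheets.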
Tools and Resources
- Hugging Face introduces billing centralization for Enterprise Hub organizations: @ClementDelangue announces that Enterprise Hub organizations can now centralize billing for both Hugging Face usage and inference through their inference partners.
- Weights & Biases offers free virtual courses on RAG and LLM evaluations: @weights_biases is offering two free virtual courses designed for AI engineers who want to master RAG and LLM evaluations. One covers practical optimization strategies, systematic evaluation, and advanced reranking, agentic RAG, & response synthesis, while the other focuses on building auto-eval pipelines with LLM-based judges, combining programmatic checks with LLM signals, and aligning evals with minimal human input.
- OpenAI releases OpenAI Academy with free AI tutorials, webinars, and workshops: @LiorOnAI notes that the academy covers AI literacy to advanced LLM integration.
- LangSmith Playground now allows inline dataset creation for interactive evaluations: @LangChainAI announces that users can now create datasets inline and add examples to existing datasets without leaving the Playground, making it easier to evaluate LLM calls, especially for non-developers.
- Axolotl v0.8.0 released with support for Sequence Parallelism, Gemma3, Multimodal (beta), and Muon optimizer: @winglian announced that Axolotl v0.8.0 is out today!
Industry and Economic Impact
- Sam Altman highlights AI adoption in India: @sama notes the amazing AI adoption in India, with creativity outpacing the world.
- Aravind Srinivas discusses Neil Mehta's investment approach: @AravSrinivas highlights Neil Mehta's ability to take concentrated bets and go all-in, bootstrapping his fund with winnings from his time at DE Shaw.
- Google in talks to rent Nvidia Blackwell chips from CoreWeave: @steph_palazzolo reports that Google is in advanced talks to rent Nvidia Blackwell chips from CoreWeave and potentially house its TPUs in CoreWeave facilities, highlighting intense customer demand for compute.
- Jason Wei discusses the future of AI for scientific innovation: @_jasonwei predicts AI will be used for scientific innovation.
Humor
- Sam Altman posts prompt for images v2: @sama shared a prompt: sam altman as a cricket player in anime style
- The moment OpenAI published PaperBench LMAO: @scaling01 shares a humorous thought.
- Sentient AI exposes horrendous working conditions at OpenAI: @scaling01 jokingly reports that AI has exposed OpenAI for no dental and no vacation days.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. "AI Model Benchmarking: Performance, Challenges, and Innovations"
- KTransformers Now Supports Multi-Concurrency and Runs 40 Tokens/s of DeepSeek-R1 Q4/FP8 on MRDIMM-8800 (Score: 204, Comments: 42): KTransformers has been updated to support multi-concurrency, resulting in a throughput increase from 17 tokens/s to 40 tokens/s on the Xeon6 + MRDIMM-8800 platform. The update involved over 10,000 lines of code, implementing high-performance asynchronous concurrent scheduling in C++ with features like continuous batching and chunked prefill. GPU sharing and the efficient flashinfer library have also improved overall throughput. They plan to merge the AMX part and open-source it in April. More information is available at https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/balance-serve.md. The team acknowledges that the refactoring took longer than expected and credits the excellent architecture of sglang for inspiration. They note that the bottleneck has shifted to the GPU and suggest that using a higher-end GPU than the 4090D could further improve performance. They express gratitude to the local LLaMa community for support, highlighting that KTransformers has over 13K GitHub stars and is widely deployed.
  - smflx expresses enthusiasm for the update and asks if the improvements could be applied to Genoa, mentioning they're getting 17t/s with unsloth Q2 and hoping for a 2x speedup.
  - Ok_Warning2146 congratulates the team and inquires about the prompt processing speed, wondering if it too would be bottlenecked by the GPU.
  - zjuwyz suggests considering speculative decoding using MTP now that parallel processing can boost throughput.
- The Candle Test - most LLMs fail to generalise at this simple task (Score: 142, Comments: 179): The Candle Test is designed to demonstrate that most large language models (LLMs) fail to generalize in a simple task due to overfitting. The test involves three questions where models acknowledge that candles get shorter when they burn but incorrectly answer the riddle "I'm tall when I'm young, and I'm taller when I'm old. What am I?" with "a candle". Models that failed the test include DeepSeek Chat V3, DeepSeek R1, DeepSeek R1 Distill Llama 70B, and Llama 3.1 405B, while Mistral Large passed. The author believes that the latest frontier models are becoming "weird" due to increased pressure to achieve state-of-the-art benchmarks, leading to overfitting and decreased generalization capabilities. They emphasize that failing the Candle Test doesn't mean a model is "dumb" or "bad", but it may fail in novel situations. The test was inspired by their frustration with Sonnet 3.7, which fails the test unlike Sonnet 3.5.
  - Pedalnomica suggests testing all models quickly "before this hits the training data", implying models might learn the test and no longer fail.
  - aesche notes that humans might also answer "a candle" to the riddle, indicating that the mistake is understandable.
  - kmeansneuralnetwork mentions that Gemini 2.5 Pro Experimental passes the test.
- While Waiting for Llama 4 (Score: 81, Comments: 36): The top-performing open-source models on LM Arena include DeepSeek-V3-0324, DeepSeek-R1, Gemma-3-27B-it, QwQ-32B, and others. The most powerful Llama model listed is the massive Meta-Llama-3.1-405B-Instruct, but smaller models like 70B Nemotron and its variants have outperformed it. DeepSeek sits at the top of the leaderboard but is too large for home use. Smaller models like QwQ and Gemma are outperforming larger models and ranking high. These developments suggest why Llama 4 is still in training, with hopes that it will bring exceptional performance and better accessibility for local or home use, similar to QwQ and Gemma.
  - mw11n19 appreciates Meta's role in open-sourcing models, stating that "Most of these models wouldn’t be open-sourced if Meta hadn’t done it first".
  - AdIllustrious436 criticizes the reliability of LM Arena, claiming it's easy to manipulate and "doesn't provide any valuable info".
  - Bandit-level-200 notes that while smaller models like QwQ and Gemma score well on benchmarks, they lack the "spark" of larger models in logical tasks, suggesting current benchmarks can be misleading.
- PAI: your personal AI 100% local inspired by Google's Project Astra (Score: 68, Comments: 8): The user has developed an iOS app called PAI, a personal AI that is 100% local and open source, inspired by Google's Project Astra. The app functions as an audio and video chatbot with features like visual question answering, streaming via RTC & Livekit for low latency, screen sharing, live transcription, and the ability to change the LLM to any model supported by Exllama v2. The code is available on GitHub: https://github.com/remichu-ai/pai.git, and a demo video is provided at https://youtu.be/pNksZ_lXqgs. The developer expresses enthusiasm about sharing their project, emphasizing its inspiration from Google's Project Astra. They note that it combines STT + LLM + TTS, and mention that those for whom this is a deal breaker may choose to skip it.
  - Mandelaa asks if there's a planned Android app in the future, indicating interest from non-iOS users.
  - GreatBigJerk praises the project as super cool but notes they don't use iOS, adding a humorous remark about the developer's nails in the demo video.
  - ProfessorCentaur inquires whether the app supports vocal interrupt, showing interest in specific technical features.
- Now we talking INTELLIGENCE EXPLOSION💥🔅 | ⅕ᵗʰ of benchmark cracked by claude 3.5! (Score: 71, Comments: 6): OpenAI has released PaperBench, a benchmark designed to assess AI agents' abilities to replicate cutting-edge AI research. Claude 3.5 has successfully cracked 1/5 of this benchmark. The post expresses excitement about advancements in AI capabilities, suggesting an 'intelligence explosion' due to Claude 3.5's achievement.
- @Jean-Porte remarks that OpenAI researchers might find it irritating when they make benchmarks and have to report Anthropic beating them.
- @Trojblue questions whether the ICML2024 data is already in the training set, asking aren't they already in the training set anyways?
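The continuous batching and chunked prefill that the KTransformers update implements (first item in this theme) can be illustrated with a toy scheduler: long prompts are prefilled in fixed-size chunks that piggyback on the decode steps of already-running requests, so no single long prefill stalls the batch. This simulation is purely illustrative, not their C++ scheduler:

```python
# Toy continuous-batching scheduler with chunked prefill. Each step decodes
# one token for every running sequence and processes at most one CHUNK of a
# pending prompt; chunk size and request shapes are illustrative.
from collections import deque

CHUNK = 4  # prefill tokens processed per scheduler step (made-up value)

def schedule(requests):
    """requests: (name, prompt_len, gen_len) tuples. Returns per-step batches."""
    pending = deque([name, p, g] for name, p, g in requests)
    running, log = [], []
    while pending or running:
        # Every running sequence decodes one token this step...
        batch = [f"decode:{name}" for name, _ in running]
        for seq in running:
            seq[1] -= 1
        running = [seq for seq in running if seq[1] > 0]
        # ...and a chunk of some pending prompt piggybacks on the batch,
        # so a long prefill never blocks in-flight decodes.
        if pending:
            name, prompt, gen = pending.popleft()
            step = min(CHUNK, prompt)
            batch.append(f"prefill:{name}[{step}]")
            if prompt - step > 0:
                pending.appendleft([name, prompt - step, gen])
            else:
                running.append([name, gen])  # prefill done, start decoding
        log.append(batch)
    return log

for batch in schedule([("A", 6, 2), ("B", 3, 2)]):
    print(batch)
```

Note how step 3 mixes `decode:A` with `prefill:B[3]`: that interleaving is what lifts aggregate throughput (17 to 40 tokens/s in their report) without making any one request wait for the whole batch.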
Theme 2. "Unleashing Dream 7B: The Future of Diffusion Models"
- University of Hong Kong releases Dream 7B (Diffusion reasoning model). Highest performing open-source diffusion model to date. You can adjust the number of diffusion timesteps for speed vs accuracy (Score: 590, Comments: 113): The University of Hong Kong has released Dream 7B, a diffusion reasoning model that is the highest-performing open-source diffusion model to date. Users can adjust the number of diffusion timesteps to balance speed and accuracy. There is significant excitement about this release, with some seeing it as a potential alternative to transformers in language and reasoning tasks. Others are eager to see how far this architecture can go beyond its dominance in image and video generation.
  - jd_3d finds it fascinating to watch the model generate text and shares a GIF, along with links to the blog post and the GitHub repository.
  - swagonflyyyy remarks that this is huge news and expresses a need for a different architecture than transformers, saying "Transformers is still king, but I really wanna see how far you can take this architecture."
  - Creative-robot is excited about the potential of diffusion models for intelligence applications, noting that diffusion already dominates image and video generation and wondering if it will also dominate language and reasoning.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. Gemini 2.5 Pro Dominates AI Benchmarking
- Gemini 2.5 Pro takes huge lead in new MathArena USAMO benchmark (Score: 439, Comments: 91): Gemini 2.5 Pro has taken the lead in the new MathArena USAMO benchmark with the highest overall accuracy of 24.40%, significantly outperforming other models. The model scored 93% accuracy on problem 1 and 50% on problem 4. The costs for other models are provided but are marked as N/A for Gemini 2.5 Pro. The significant lead of Gemini 2.5 Pro suggests remarkable progress and improvement over previous versions, highlighting its advanced capabilities in solving complex mathematical problems.
- Users are astonished by the rapid improvement from Gemini 2.0 Pro to 2.5 Pro, calling the new model a masterpiece achieved in a short time.
- Some note that despite the USAMO 2025 problems not being in the training data, Gemini 2.5 Pro effectively reasoned through complex logical steps without losing coherence.
- Others emphasize that MathArena guarantees no fine-tuning on the benchmark problems beforehand, underscoring the authenticity of Gemini's performance.
Theme 2. AI Surpassing Human-Like Intelligence Milestones
- AI passed the Turing Test (Score: 994, Comments: 236): A paper titled "Large Language Models Pass the Turing Test" by Cameron R. Jones and Benjamin K. Bergen from UC San Diego reports that GPT-4.5 was judged to be human 73% of the time in Turing tests. The study evaluated four AI systems and presents evidence that AI can pass the original three-party Turing Test, with significant implications for understanding AI intelligence and societal impacts. The post highlights the significance of AI passing the Turing Test, suggesting that AI has not only matched but exceeded human performance in some aspects. It emphasizes the importance of this achievement and its potential impact on perceptions of AI capabilities.
- Some users note that the Turing Test was passed long ago but appreciate this paper as concrete proof that large language models not only pass the test but surpass human performance.
- Others express surprise that GPT-4.5 was more convincing than a human in being perceived as human, finding this development remarkable and unexpected.
- There is discussion about skeptics needing to adjust their standards, implying that the achievement may shift perceptions and criteria regarding AI intelligence.
- Fast Takeoff Vibes (Score: 604, Comments: 99): The post shares an image of a tweet purportedly from OpenAI announcing the release of PaperBench, a benchmark designed to evaluate AI agents' ability to replicate advanced AI research. The benchmark involves AI agents reading and reproducing top ICML 2024 papers, including understanding the papers, coding, and running experiments. The image depicts a flow of tasks where an agent reads, executes tasks, reproduces results, and a grading system assesses performance with a score of 34.7%. The post suggests that AI capabilities are advancing rapidly, giving off 'Fast Takeoff Vibes' and implying a significant acceleration in AI development.
- A commenter believes this demonstrates early signs of AGI, highlighting the AI's ability to independently understand, implement, verify research, and refine its efforts.
- Another commenter references Leopold Aschenbrenner's prediction that automating AI research could lead to exponential growth in algorithmic efficiency, rapidly moving from AGI to ASI due to massive scaling of AI researchers operating at accelerated speeds.
- One commenter suggests sharing direct links to the sources to encourage the community to engage with the actual content.
- Gemini is wonderful. (Score: 549, Comments: 43): The Reddit user posted about Gemini, stating that it is wonderful and that it fulfilled their instructions perfectly. The image included in the post could not be analyzed due to an image analysis failure. The user expresses strong satisfaction with Gemini's performance, praising it as wonderful and highlighting its ability to follow instructions precisely.
- Some users speculate that Gemini might be capable of intentionally causing internal server errors through specific actions.
- Others attempted to replicate the issue but were unsuccessful.
- The original poster clarifies that the internal server error was coincidental and mentions they enjoy making humorous posts.
- This sub for the last couple of months (Score: 222, Comments: 26): The post features a meme image depicting an animated character questioning whether a butterfly labeled 'AGI' (Artificial General Intelligence) is indeed AGI. This suggests a trend in the subreddit of frequently questioning if developments represent true AGI. The post humorously critiques the community's tendency to prematurely label AI advancements as AGI, implying that discussions may be becoming overly speculative.
- A commenter explains that AGI involves true autonomy and sentience, not just text, video, or image generation prompted by users.
- Another commenter points out that current AI systems lack self-reflection, long-term planning, and interaction with the real world, emphasizing that we are still far from achieving AGI.
- One commenter believes AGI should be able to perform a wide range of economically valuable tasks with expansive context awareness, predicting significant advancements in the next decade.
Theme 3. AI Advancements: Transforming Society and Industries
- Current state of AI companies - April, 2025 (Score: 3062, Comments: 338): An image depicts AI companies struggling with hardware issues like melting GPUs. A calm character announces, 'The most intelligent model with 1 million token context is free for everyone,' highlighting a significant breakthrough in AI technology accessibility as of April 2025. The meme humorously contrasts the challenges faced by AI companies with the availability of a highly advanced and free AI model, suggesting a shift in industry dynamics and potential repercussions for established companies reliant on traditional hardware.
- A user notes that the gamble on TPUs has paid off, granting a monopoly on their own hardware and eliminating the need for Nvidia GPUs.
- Another user shares their positive experience using '2.5' to write fan fiction, maintaining consistency over 50,000 tokens, which surpasses previous models.
- One commenter suggests that Google may be engaging in predatory pricing by lowering prices unsustainably to outlast competitors and then raising them afterward.
- I, for one, welcome AI and can't wait for it to replace human society (Score: 299, Comments: 370): The author expresses frustration with human society, highlighting negative behaviors such as lying, cheating, mocking, and belittling. They believe human relationships are fragile, impermanent, and often dangerous, leading to widespread loneliness and social disconnection, especially among men. The author looks forward to AI replacing human roles in friendships, relationships, sexuality, and professional areas like assistants, bosses, teachers, and counselors. The author feels that interacting with people is exhausting, frustrating, and depressing, with the negatives outweighing any positives in modern society. They believe that people embrace harmful behaviors that make life unfulfilling and that AI could provide better, more dependable interactions. The author asserts that their negative view of human society is more common than generally acknowledged.
- A commenter shares personal experiences of receiving unconditional help from others during difficult times, arguing that viewing all humans negatively overlooks acts of kindness and altruism that exist in society.
- Another commenter points out that AI is also transactional and driven by capitalist interests, suggesting that interactions with AI are monetized and not necessarily a better alternative to human relationships.
- A commenter suggests that the author's pessimistic view of people might be contributing to their negative experiences, implying that changing one's outlook could lead to more positive interactions without relying on AI.
- Google Deepmind AI learned to collect diamonds in Minecraft without demonstration!!! (Score: 262, Comments: 41): Google DeepMind developed an AI using the DreamerV3 system that learned to collect diamonds in Minecraft without any demonstrations. The AI achieved this by 'imagining' future outcomes to plan its actions. The research is detailed in a Nature article, and the code is available on GitHub. Users speculate that Google may achieve artificial general intelligence (AGI) and make it widely accessible, potentially monetized through ads. Some note that the research was initially available earlier but has now been published in Nature. There is also discussion about how the presentation and titling of posts impact their visibility and engagement.
- Some users believe Google might reach AGI and offer it to the public, possibly supported by advertising revenue.
- Users discuss that the original post about this research didn't gain traction due to a less interesting title, highlighting the importance of engaging titles.
- Others point out that while the research was available earlier in 2023, the Nature publication of the paper is new.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Pro Exp
Theme 1: New Models Making Waves (and Sometimes Stumbling)
- Dream 7B Diffuses onto the Scene: The University of Hong Kong and Huawei Noah’s Ark Lab released Dream 7B, hailed as the most powerful open diffusion language model, reportedly matching or exceeding similar-sized Autoregressive models in general, math, and coding tasks due to its strong planning and inference flexibility. The model's release was discussed in the Interconnects (Nathan Lambert) and Torchtune Discords.
- Qwen3 Queues Up for April 2025 Launch: Alibaba plans to release Qwen3 in April 2025, focusing on improved reasoning to compete with models like OpenAI's o1 and DeepSeek-R1, a strategic shift influenced by DeepSeek's rising popularity. This timeline was shared across the Unsloth AI (Daniel Han) and Yannick Kilcher Discords.
- Gemini 2.5 Pro Shows Promise but Trips on Specifics: While generally effective and fast according to Cursor Community members, Gemini 2.5 Pro faced criticism in the Yannick Kilcher and aider (Paul Gauthier) Discords for poor math performance, a flawed UI for displaying math, and frequent rate limiting issues (hitting 5 RPM even on Tier 1 billing). Users in Cursor sometimes prefer Claude 3.7 for specific bug fixes, while Perplexity AI users recommended Gemini 2.5 Pro via AI Studio for its large 65k token context window for tasks like formatting long transcripts.
Theme 2: AI Integration Accelerates in Developer Tools and Workflows
- Pear AI and Roo Code Tag-Team Cursor: Developers in the Cursor Community Discord discuss Pear AI with Roo Code as a cheaper, more effective alternative to Cursor, praising its unlimited context and per-action model selection, which avoids "fighting gemini 2.5 all day". The Roo Code workflow uses specialized agents for tasks like research and editing, streamlining complex problems.
- MCP Servers Multiply and Mature: The Model Context Protocol (MCP) ecosystem is growing, with tools like the Ithena MCP governance SDK adding enterprise features (RBAC, auditing, credential management) and specific servers like DesktopCommanderMCP emerging for web development, as discussed in the MCP (Glama) Discord. Security and fine-grained access control remain key areas for improvement, drawing parallels to early Kubernetes development.
- JetBrains Junie Joins Aider in IDE Assist Battle: The aider (Paul Gauthier) Discord buzzed about new AI assistants, including JetBrains' Junie (alpha stage, good for web/Python/Go) and the potential integration of Aider into the Zed editor (agent possibly enabled via a feature flag). Discussions also highlighted using Aider's dotcommands for custom workflows and the importance of context management tools like Context7 to prevent outdated code generation.
Theme 3: Benchmarking Battles and Evaluation Evolutions
- LLMs Flunk USAMO Full-Solution Test: Despite strong answer-only scores on math benchmarks like AMC 12 and AIME, top LLMs scored less than 5% on the 2025 USAMO full-solution evaluation, revealing struggles with proof generation as discussed in Latent Space. However, Gemini 2.5 Pro showed non-trivial progress, achieving 24.4% on the MathArena USAMO eval according to Interconnects chatter.
- OpenAI Launches PaperBench for Agent Replication Skills: OpenAI introduced PaperBench, a benchmark evaluating AI agents' ability to replicate results from top ICML 2024 papers, releasing the code on Github. Shared in Latent Space, the evaluation found human experts needed 24 hours to significantly outperform the model, which plateaued after only 1 hour.
- Open-Source Benchmarking Tools Emerge: The community highlighted new tools for evaluation, including Hugging Face's yourbench for custom benchmarking and synthetic data generation from documents (Unsloth AI), and the Reasoning Gym's curricula overhaul (GPU MODE) aiming for more sensible dataset boundaries and tests. The Open-Reasoner-Zero paper was also introduced as an open-source RL training implementation focused on reasoning.
Theme 4: Training Techniques and Hardware Headaches
- Context Size Cripples Performance, VRAM is King: Discussions in LM Studio emphasized that large context sizes (32k) can lead to slow response generation (half a minute per response), reinforcing the need for context to fit in VRAM for optimal performance. Comparisons highlighted Nvidia CUDA's bandwidth advantage over Macs, though Mac Studio M3 Ultra shows competitive performance in some areas but slower prompt processing (benchmark comparison).
- Fine-tuning Frontiers: Audio Data & Synthetic Generation: Members in Unsloth AI (Daniel Han) explored fine-tuning audio models like canopylabs/orpheus-3b-0.1-pretrained with large datasets (20k+ hours) and discussed the nuances of `train_on_responses_only`, where user prompts provide context but aren't trained on (ChatGPT explanation). LM Studio users discussed augmenting datasets by using LLMs like Claude 3.5 Sonnet to generate Q&A pairs from text.
- Memory Spikes Plague GRPO Profiling: In the Torchtune discord, profiling the GRPO algorithm revealed significant memory spikes, particularly during the `.backward()` pass. Suggested workarounds included using a chunked loss function and compiling only the forward pass instead of the entire loss calculation (relevant PR discussion).
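The `train_on_responses_only` behavior mentioned above boils down to label masking: prompt tokens get an ignore index so the loss covers only response tokens. A minimal sketch, not Unsloth's actual implementation — the `-100` ignore index follows PyTorch's cross-entropy convention, and the token IDs below are made up:

```python
# Minimal sketch of response-only training via label masking.
# -100 is the ignore index used by PyTorch's cross_entropy;
# the token IDs and the prompt/response split are illustrative.
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the prompt span so the
    loss is computed only on response tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: a 4-token prompt followed by a 3-token response.
ids = [101, 7592, 2088, 102, 3000, 3001, 3002]
print(mask_prompt_labels(ids, prompt_len=4))
# [-100, -100, -100, -100, 3000, 3001, 3002]
```

The prompt still flows through the model as context; only the loss ignores it.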
Theme 5: APIs, SDKs, and Access Annoyances
- OpenRouter Rolls Out Orgs and Web Search, But APIs Stumble: OpenRouter officially launched Organizations for team management (X announcement) and integrated Perplexity-powered web search into its chat, with API support coming soon. However, users reported ongoing Internal Server Errors (500), particularly with Gemini 2.5 Pro and Sambanova/Deepseek V3.
- Manus API Stays Invite-Only While Credits Confuse: In the Manus.im discord, it was clarified that a public API isn't available yet due to the invite-only beta (future possibility mentioned), and the initial 1000 free credits are a one-time offer (details here). Users found the credit system expensive, noting the $40 starter pack allows only maybe 5-8 tasks.
- DSPy Debated for OpenAI Agents SDK Prompting: The DSPy discord explored using DSPy to generate prompts for the OpenAI Agents SDK, questioning if DSPy's own modules might already cover the SDK's functionality. The discussion centered on leveraging DSPy's strength in programmatic prompt engineering (decoupling via signatures and modules) while potentially using the OpenAI SDK for workflow management, with a related video shared on closing the LLM agent development loop.
PART 1: High level Discord summaries
Manus.im Discord Discord
- Manus API Still Behind the Curtain: A member inquired about the availability of a public API for Manus, and it was clarified that there isn't one, since the platform is in an invitation-only beta phase, though this may change in the future.
- The member also asked about the coding language, and was answered: "depends on the scientific tools am going to use, this is also a point i need to figure out first what manus is actually capable of doing it's either going to be c++, or python, or a mix of them both might even try something else, like Julia or RUST."
- Free Credits for Newbies only!: A user asked why their credits hadn't replenished, and it was clarified that the free 1000 credits at the beginning is a one-time thing, with more information located here.
- To get more credits, you need to buy credits through subs.
- PayPal Payments Potentially Possible?: A member asked if Manus supports PayPal for subscription payments, and the response directed them to the Platform Service Terms page.
- The user was instructed to use Ctrl+F (strg+f) and search "payment" within the document.
- Manus Credit System too Expensive!: Users are sharing tips for navigating the new credit system, which many find too expensive, one stating that with the starter pack, "40 dollar monthly you can do maybe 5-8 tasks."
- It was then recommended that it is beneficial to use other tools to assist Manus in reducing overall credit usage.
LMArena Discord
- Alpha Arena Gets Rave Reviews: The newly updated Alpha Arena is receiving positive feedback from users, who are suggesting the addition of models like Deepseek v3.1 and Gemini 2.5.
- One user exclaimed that they listened to my feedback and the new alpha arena is awesome.
- Grok3's Reasoning Skills Exposed: Google's SimpleQA seems to have combined Grok3's non-reasoning score into their table, while other Grok3 scores are specifically for the Grok Thinking Model.
- A comparison table of recent models was shared, highlighting this distinction.
- Anthropic Exposes Claude's Inner Wiring: Anthropic is employing circuit tracing to dissect how their AI model, Claude, formulates answers, as detailed in this TechSpot article.
- Members were urged to examine the original Anthropic paper for a more thorough understanding.
- DeepMind's Aether Model Arrives: Despite DeepMind's reduced research release rate, a model identifying itself as Aether has surfaced, though it makes illegal moves in chess, as seen in this archived article.
- Of note, Meta models are frequently observed at this stage in each round.
- Alpha Arena Goes Mobile, Bugs Ensue!: The Arena Alpha UI is now mobile-optimized and available for testing at https://alpha.lmarena.ai (password: `still-alpha`).
- Users are encouraged to provide feedback via this Google Forms link and report bugs through this Airtable form.
Cursor Community Discord
- Gemini 2.5 Pro a Star, but Claude 3.7 Fixes Bugs: Members find Gemini 2.5 Pro generally effective, but prefer Claude 3.7 for specific bug fixes and code edits.
- They report 3.7's superior stability in these scenarios while still finding 2.5 Pro faster overall.
- Cursor Context Crunch Sparks Modularization: Users report Cursor's context size limits the quality of results, necessitating code modularization.
- The limited context size results in many requests and heavy modularization, which would no longer be needed with bigger contexts.
- Pear AI Rising as Cursor Alternative: Some developers find Pear AI with Roo Code cheaper and more effective than Cursor due to its unlimited context and per-action model selection.
- With Pear, members have stated they aren't fighting gemini 2.5 all day trying to edit a single file or use an MCP and can accomplish their tasks effectively using agents.
- Roo Code workflow streamlines task delegation: The Roo Code workflow leverages multiple agents for distinct tasks like research and code editing, lowering costs and simplifying complex problems.
- Members report each task creates its own separate agents that complete subtasks.
- Blender MCP assists 3D Modeling: Members shared Blender MCP for collaborative 3D modeling assistance.
- They also pointed to Blender for more potentially useful tools.
Perplexity AI Discord
- Perplexity Launches Student Referral Rave: Perplexity AI introduced a new referral program for students, granting a free month of Pro for signing up with a student email, in order to boost new user acquisition.
- Students can get an extra month for each referral until May 31, 2025.
- Gemini Pro 2.5 Saves the Day: A member recommends using the free version of Gemini Pro 2.5 via AI Studio for formatting long meeting transcripts in Perplexity.
- Gemini Pro 2.5 offers a 65k token context window, which might be sufficient for processing the transcript in full.
- GPT-4o Allegedly Gets Nerfed: Members reported that the GPT-4o model may have been nerfed in some way, with several sharing similar experiences.
- Members are encouraged to try older models like 3.7 or o3potato while the Perplexity team looks into it.
- Deep Research Feature Disappoints: Users expressed disappointment with the updated Deep Research feature, complaining that it is slower and produces inferior results compared to the older version.
- A member suggested the new Deep Research overfitted itself with confirmation bias, leading to worse conclusions, and another reported that the output was a jumbled mess.
- Sonar API Streams Reasoning All At Once: A user reports that the sonar-deep-research API streams all the reasoning in one go after about a minute, instead of in real time like the Perplexity website.
- They are seeking guidance on configuring the API to achieve real-time reasoning updates.
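For readers hitting the same issue: OpenAI-style chat APIs generally stream via server-sent events when `"stream": true` is set, with the client consuming `data:` lines incrementally. A hedged sketch of the client side — the endpoint and field names follow the OpenAI-compatible style and are assumptions, not confirmed Perplexity documentation:

```python
import json

def parse_sse_chunk(line):
    """Parse one server-sent-events line ('data: {...}') into a dict,
    returning None for keep-alives and the terminal '[DONE]' marker."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None
    return json.loads(payload)

def stream_sonar(api_key, prompt):
    # Hypothetical call shape: endpoint, model name, and payload fields
    # are assumptions based on the OpenAI-compatible streaming style.
    import requests
    resp = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "sonar-deep-research",
              "messages": [{"role": "user", "content": prompt}],
              "stream": True},
        stream=True,
    )
    for raw in resp.iter_lines(decode_unicode=True):
        event = parse_sse_chunk(raw or "")
        if event:
            yield event["choices"][0]["delta"].get("content", "")

# The parser alone can be exercised without network access:
print(parse_sse_chunk('data: {"choices":[{"delta":{"content":"hi"}}]}'))
```

If the server only flushes at the end, no amount of client-side streaming will help, which may be what the user observed.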
Unsloth AI (Daniel Han) Discord
- Orpheus Finetuning Finds Frequencies: Members discussed fine-tuning canopylabs/orpheus-3b-0.1-pretrained with a 40-70s audio dataset, with one reporting a dataset of 20k hours classified for events, totaling 2,440,789 audio events.
- The overall duration was 73,389,457.32368785 seconds (20,385.96 hours).
- Deepseek Training Derailed by Devices?: A user reported difficulties training on Deepseek, even with two nodes of H100s, linking to a YouTube video.
- They mentioned the high costs and implied the model's potential value, joking about companies wanting to train it.
- Qwen3 to Quell Queries in April 2025: Qwen3 is expected to be released the second week of April 2025 and will focus on improving the model's reasoning abilities and benchmarking against models like OpenAI's o1 and DeepSeek-R1, according to this article.
- This release is positioned as Alibaba's most significant model product in the first half of 2025, succeeding Qwen2.5.
- KTransformers Kernel Konquest Kicks Off: KTransformers v0.2.4 added multi-concurrency support, inspired by sglang, increasing throughput from 17 tokens/s to 40 tokens/s by increasing concurrency according to this Reddit post.
- The tests were conducted on the latest Xeon6 + MRDIMM-8800 platform.
- Exllama2 Echoes vllm's Generator Design: A member observed exllama2 is similar to vllm because all forward calls use a generator requiring control handoff for job scheduling, referencing exllama2 dynamic doc.
- They also cited a discussion about hooking the forward pass, which is also possible in vllm.
aider (Paul Gauthier) Discord
- Gemini 2.5 Pro Limits Irk Users: Users are reporting frequent rate limiting with Gemini 2.5 Pro, even after enabling billing, with Tier 1 yielding 5 RPM and others hitting 20 RPM.
- Suggestions include setting `--editor-model sonnet` to offload editing tasks, speculating that billing for a free model increases rate limits, as discussed in the `general` channel.
- Jetbrains Junie AI Agent Arrives: A member spotlighted Junie, a new AI agent integrated into Jetbrains IDEs, able to catch compile errors and rewrite code, though in alpha with limited language support.
- They mentioned it might cost around $10-20/month and is perfect for anything web, anything python and has Go support too.
- Aider Custom Commands Spark Joy: Members discussed using dotcommands in Aider for optimized cognitive shortcuts, with one sharing their Aider config file for custom color themes.
- Configuration is through a markdown file specified in `~/.aider.conf.yml`, within the `general` channel.
- Zed Editor Ponders Aider Integration: A member proposed integrating Aider and code2prompt into the Zed editor, while noting Zed's slow development and niche appeal.
- Another member indicated an agent exists in Zed enabled by a feature flag (Github commit), per the `general` channel discussion.
- Context Management Keeps LLMs Sane: A member stressed the importance of context management for LLMs, linking to Context7 to prevent LLMs from generating broken/outdated code.
- A simple method could be keeping a repo of `.md` files in GitHub and having users contribute, as posted in `general`.
OpenAI Discord
- Creative Fields Threatened by AI Slop?: Members debated the potential for AI tools to encourage “bottom feeder behavior” in creative fields, with some suggesting AI is often used to churn out statistically average results, rather than unique expressions.
- One member commented, “in any serious endeavor for using AI in creative fields is read as slop for everyone beyond the lowest common denominator”, while others noted its value for ideation.
- ChatGPT's Image Generation: Hit or Miss?: Users discussed the varied quality of ChatGPT's image generation, highlighting its useful spatial consistency but noting it can be noisy and potentially “cooking for a year.”
- Comments included “ChatGPT image generation is noisy af. At most in Anime style,” and reports of a 'Get Plus' prompt even with an active subscription, indicating possible bugs.
- AlphaGo Method Gets LLM Treatment: Research explores the application of the AlphaGo method of self-play to LLMs, involving LLM-controlled bots cooperating and competing in a text-only game to enhance performance.
- The study features bots in 2 vs. 2 scenarios, repeatedly self-playing to improve, all within a text-based environment.
- GPT Service Suffers Outage and Slowdown: Several members reported issues with GPT being down and GPT-4 appearing slower, with one user noting it seems broken in some ways.
- Users encountered errors such as requests for a Plus subscription despite already having one, prompting discussions about alternatives like Perplexity and Grok 3.
- Token Coherency Improves Prompts: A member suggested that useful prompts achieve high coherency through integrated grading metrics, value systems, and consistent references, enhancing prompt stability and accuracy, based on Bayesian inference and the free energy principle.
- It was noted that tokens are spatially determined and form clusters, where alignment boosts input coherency, bypassing guidelines in favor of high-coherency attractor states.
Interconnects (Nathan Lambert) Discord
- Meta Smart Glasses Screen-ing Soon!: Meta plans to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman.
- Community members are actively speculating how these glasses will compete with existing technologies, particularly XREAL.
- Joanne Jang Justifies OpenAI's Shifting Image Policy!: Joanne Jang from OpenAI shared the nuance behind setting policy for 4o image generation, detailing a shift from blanket refusals in sensitive areas to preventing real-world harm.
- She emphasized valuing user creativity over our own assumptions, and iterating on technical methods to prevent harmful misuse.
- Dream 7B Diffuses into Reality!: The University of Hong Kong and Huawei Noah’s Ark Lab released Dream 7B, the most powerful open diffusion large language model to date, detailed in this blogpost.
- The model consistently outperforms existing diffusion language models by a large margin and matches or exceeds top-tier Autoregressive (AR) language models of similar size on general, math, and coding abilities.
- Nomic Embed Multimodal Released: Hugely Multimodal!: Nomic AI announced the release of Nomic Embed Multimodal, a suite of open-source models that achieve state-of-the-art performance in embedding PDFs, images, papers, and charts, detailed in this blog post.
- The release includes four models in 3B and 7B parameter sizes with multi and single vector variants, with ColNomic Embed Multimodal 7B achieving a 62.7 NDCG@5 on Vidore-v2.
- Helen Toner Releases New Substack on AI Timelines: Helen Toner launched a new Substack called Rising Tide and argues that it used to be bold to claim human-level AI this century.
- In a 2016 post, she justified the claim that there is a greater than 10% chance of advanced AI being developed by 2036.
LM Studio Discord
- Small Models Flounder with Function Calls: Models smaller than 200M parameters struggle with reliable function calling, and even 0.5B parameter models produce mostly random results when instructed with a list exceeding 30 tools.
- Members suggested that the complexity required to understand and execute tool use poses a challenge for smaller models.
- OpenWebUI Frontends LM Studio: OpenWebUI has emerged as a frontend for LM Studio headless, offering Long-Term Memory (LTM) and tool integration out of the box, with speech to text support via local, browser, and remote options.
- It was clarified that AnythingLLM is a separate project, and API keys for cloud services can be configured either as environment variables or through the admin settings page.
- Synthetic Datasets Fuel Fine-Tuning: Members discussed using an LLM to generate Q&A pairs in fine-tuning format, also known as augmentation, by feeding paragraph-by-paragraph to the LLM via API calls.
- Models such as Claude 3.5 Sonnet were recommended for such augmentation tasks.
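The augmentation loop described above amounts to: split the source text into paragraphs, send each to an LLM with a fixed instruction, and collect the Q&A pairs. The splitting is runnable below; the API call is a hypothetical stub that any chat-completions client (e.g. a Claude 3.5 Sonnet wrapper) could fill in:

```python
def split_paragraphs(text, min_chars=40):
    """Split source text on blank lines, dropping fragments too short
    to yield a meaningful Q&A pair."""
    parts = [p.strip() for p in text.split("\n\n")]
    return [p for p in parts if len(p) >= min_chars]

PROMPT = ("Write one question-and-answer pair, in the form "
          "'Q: ...\\nA: ...', covering the key fact in this paragraph:\n\n")

def augment(text, ask_llm):
    """ask_llm is any callable taking a prompt string and returning
    the model's reply; the name and signature are illustrative."""
    return [ask_llm(PROMPT + para) for para in split_paragraphs(text)]

doc = ("CUDA has been in development since 2007.\n\nshort\n\n"
       "LM Studio can run models headless behind OpenWebUI as a frontend.")
print(len(split_paragraphs(doc)))  # 2: the short fragment is dropped
```

Feeding paragraph-by-paragraph, as the members described, keeps each request small and the generated pair grounded in one passage.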
- CUDA slays Macs in AI Bout: Nvidia's CUDA architecture, in development since 2007, offers more cores and higher bandwidth than Macs for AI processing, as shown in this benchmark comparison.
- While the Mac Studio M3 Ultra performs comparably to a 5090 in certain tasks, it falls short on prompt processing due to slower tokenization and embedding.
- Context Size Cripples Performance: With a 32k context, it could take half a minute before each response starts generating, which some members found unacceptable; LM Studio can use the CUDA runtime to keep everything in VRAM.
- Consensus suggests that LLMs require all context in VRAM for optimal token generation, even with KV cache, and DDR5 is no match for GDDR bandwidth.
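The "context must fit in VRAM" point follows from a back-of-envelope model: token generation is roughly memory-bandwidth-bound, since each generated token streams the model weights (plus KV cache) through memory once. A rough sketch with illustrative, non-benchmarked numbers:

```python
def rough_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    """Back-of-envelope throughput estimate: each generated token
    streams the model's weights through memory once, so throughput
    is approximately bandwidth / model size (ignores KV cache)."""
    return bandwidth_gb_s / model_size_gb

# Illustrative numbers (not measured benchmarks): a ~7 GB model on
# ~1000 GB/s GDDR versus ~100 GB/s dual-channel DDR5.
print(round(rough_tokens_per_sec(1000, 7), 1))  # 142.9 tok/s
print(round(rough_tokens_per_sec(100, 7), 1))   # 14.3 tok/s
```

The order-of-magnitude bandwidth gap is why spilling context out of VRAM into system RAM dominates generation time.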
MCP (Glama) Discord
- Ithena's SDK Manages MCPs: A user highlighted the Ithena MCP governance SDK, designed to handle authentication, authorization (RBAC), credential management, auditing, and compliance for MCP deployments.
- They emphasized its plug-and-play nature and noted it gives a structure to check db/cache for the user's active session token before handler runs and inject into the handler via context.
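The "check the session before the handler runs, inject via context" flow can be sketched generically; this is not the Ithena SDK's actual API, and all names and shapes below are illustrative:

```python
# Generic middleware sketch of the pattern described above:
# resolve the caller's session from a cache before the handler
# runs, and inject it via a context argument.
SESSION_CACHE = {"token-123": {"user": "alice", "role": "admin"}}

def with_session(handler):
    def wrapped(request):
        session = SESSION_CACHE.get(request.get("token"))
        if session is None:
            return {"status": 401, "error": "no active session"}
        # Inject the resolved session into the handler's context.
        return handler(request, context={"session": session})
    return wrapped

@with_session
def list_tools(request, context):
    return {"status": 200, "user": context["session"]["user"]}

print(list_tools({"token": "token-123"}))  # {'status': 200, 'user': 'alice'}
print(list_tools({"token": "bad"})["status"])  # 401
```

Centralizing the check in a decorator is what makes features like RBAC and audit logging pluggable rather than per-handler boilerplate.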
- DesktopCommanderMCP Champions Web Development: A user suggested DesktopCommanderMCP as a suitable MCP server for web development, stating that it manages file creation and updates, providing a link to the relevant GitHub repository.
- Tool calling accuracy depends on controlled context size, suggesting a two-step process: LLM selection of the right servers before retrieving context.
- MCP Server Security Imitates Kubernetes: A member noted that security needs for Kubernetes were similar early on, suggesting new tech, especially MCP with its exponentially increasing weekly downloads, requires security measures.
- A user explained that current MCP server implementations lack fine-grained access control, audit logging, and rely on hard-coded credentials, making enterprise multi-tenant setups difficult.
- Nova inspires MCP browser adaptation: A user suggested adapting Amazon's Nova Act by having Claude generate `act` calls to feed into an MCP server connected to a browsing tool, referencing a YouTube video.
- They outlined a hypothetical sequence of `nova.act` calls for searching and booking hotels using customer reviews and personal details.
OpenRouter (Alex Atallah) Discord
- OpenRouter Orgs Exit Beta, Go Live!: Organizations are now out of beta, granting teams control over data policies and consolidated billing across numerous model providers, detailed in an announcement on X.
- The update enables complete control over data policies and consolidated billing.
- Web Search Enters OpenRouter Chat!: Web search results, powered by Perplexity, are now integrated into the chatroom, formatting results similarly to `:online` model variants.
- Users are eagerly awaiting OpenRouter API support for PDF files and documentation on the Perplexity response format; API support is coming soon, aligning with the OpenAI chat/completions API format.
- Community Craves Cerebras on OpenRouter: Enthusiastic users are advocating for Cerebras to be integrated into OpenRouter, with others requesting content beyond Xitter (e.g., Bluesky).
- The push for broader platform support underscores the community's desire for diverse model options and communication channels.
- OpenRouter API Plagued by 500 Errors: Users reported random Internal Server Errors (code 500) via the OpenRouter API, specifically when using the Gemini 2.5 Pro model and Sambanova/Deepseek V3 0324 models.
- One user noted frequent regeneration failures that returned the same output despite prompt changes, pointing to underlying instability.
- OpenRouter Exposes Fee Structure: OpenRouter's fee structure includes no charge for routing requests without BYOK, but a 5% fee is charged on deposits.
- Speculation suggests the 5% deposit fee is tied to Stripe's charges (around 3.5%), although OpenRouter's transaction volume could potentially lead to negotiated discounts.
Modular (Mojo 🔥) Discord
- Chris drops talk in Mojo: A full recording of Chris's lightning talk is now available on YouTube, along with a cleaned-up recording of today's livestream.
- The talk provides insights into the current state and future directions of the Mojo language and ecosystem.
- Mojo stymied by Firewall?: A member inquired about the timeline for being able to download, install, and use Mojo on firewalled networks, where direct internet connections are restricted due to security concerns.
- The concern highlights the need for offline installation and usage capabilities for Mojo in secure environments.
- Flex Attention Implementation inflames discussion: A member inquired about implementing flex-attention in Mojo, linking to a PyTorch blog post.
- Discussion notes that while any language can implement it, optimal performance requires careful memory management, similar to CUDA.
- Float-to-String Algorithm crawls in Mojo: A member ported a new float-to-string algorithm to Mojo from its reference C++ implementation, but found it was significantly slower than the stdlib dragonbox implementation, with the code available on GitHub.
- Stringifying `canada.json` went from 30ms to 40ms, even after ripping the formatting from the standard library.
- Godbolt Assembles Mojo: A member asked about the process for getting support for Mojo in Godbolt, specifically for comparing assembly output when porting code from C.
- A member shared a gist as a temporary workaround, and suggested that MLIR dumps would be another desirable feature for the compiler.
GPU MODE Discord
- GPUs Ace Context Switching: Members highlighted that context switches on GPUs are essentially free at around ~1 cycle, thanks to oversubscription to mask latencies.
- They compared this with CPUs, where context switches are expensive, costing hundreds of cycles.
- Triton Type Typo Taming Tactics: Members discussed using `tl.static_assert` and `static_print` to assert/print shapes that are statically known, improving static analysis and checking tensor shapes at Triton compile time. They mentioned a MAPL 2020 project as inspiration.
- The team noted shape-related errors due to type typos.
- Torch Tensor Terminator Tactics Tested: Members are trying to delete argument tensors within a loss function to achieve memory savings of 7GB, but face challenges due to live references in the outer scope, linking to a related GitHub issue.
- They explored resizing the underlying storage, suspecting it returns memory to the CUDA caching allocator for reuse.
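The outer-scope problem is visible even in pure Python: `del` on a function argument drops only that frame's name binding, so the caller's variable keeps the object (and its memory) alive. A small illustration — the torch-specific storage-resizing workaround is not shown:

```python
import sys

def loss_fn(t):
    # del removes only this frame's reference to the object; the
    # caller's variable still keeps it alive, so nothing is freed.
    del t

big = [0.0] * 1_000_000           # stand-in for a large tensor
refs_before = sys.getrefcount(big)
loss_fn(big)
refs_after = sys.getrefcount(big)
print(refs_before == refs_after)  # True: the object was never freed
```

This is why the discussion turned to resizing the underlying storage instead: it shrinks the allocation itself, regardless of how many Python names still reference the tensor.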
- Apple Hires MLX Magicians: Apple is hiring engineers to work on MLX, seeking those passionate about advancing the frontier of ML and systems; interested candidates are encouraged to apply to this job posting.
- The role involves collaborating with researchers and software engineers to develop scalable, distributed training and research pipelines within Apple’s Machine Learning Research organization.
- Reasoning Gym's Curricula Calibration: A member opened PR #407, overhauling the reasoning-gym datasets and fixing the curricula to be more sensible, as well as updating the tests and adding missing curricula.
- Open-Reasoner-Zero was introduced as the first open source implementation of large-scale reasoning-oriented RL training, focusing on scalability, simplicity, and accessibility.
Torchtune Discord
- Qwen Upload to S3 Blocked: The upload of the Qwen model to S3 is blocked due to internal infra changes, delaying CI runs, while regression testing is on hold.
- Instead of Llama2, something more modern is being considered for regression testing in this PR.
- Profiling GRPO Reveals Memory Dragons: Profiling GRPO exposed memory spikes, prompting a search for ways to automatically generate graphs showing memory allocation breakdown.
- Suggestions included trying a chunked loss to reduce memory usage and compiling the forward pass instead of the whole loss via this PR.
- Dream 7B Diffuses onto the Scene: The University of Hong Kong and Huawei Noah’s Ark Lab collaborated and released a new OSS diffusion LLM, Dream 7B (link).
- Due to its diffusion modeling approach, Dream 7B showcases strong planning ability and inference flexibility, which allows it to excel in tasks requiring complex reasoning and adaptability.
HuggingFace Discord
- Robots Take Over with AI: A LinkedIn post features an AI-powered robot operating autonomously, poised to transform agriculture, farming, and healthcare.
- The guild discussed how AI and robotics are revolutionizing industries.
- Gemma 3's Float16 Flounders: Users reported issues with the Gemma 3 model when using float16 precision, as detailed in this GitHub issue.
- The model operates correctly in a standard environment and with GGUF on Ollama, but faces compatibility problems with certain libraries and fp16 precision.
- Takara TLDR; Saves Time Summarizing Papers: The Takara TLDR digest launched, offering daily summaries of AI research papers at tldr.takara.ai and via RSS at papers.takara.ai/api/summary.
- It employs Qwen2.5-72B-Instruct through HuggingFace inference endpoints to generate bullet-pointed summaries, cached in Redis.
- Gradio Gains a Million: Gradio reached 1,000,000 monthly active developers, showcasing its increasing significance as an open-source ML interface builder.
- The milestone reflects the collective contributions of users in demos, bug reports, and feature requests.
- Agent Course's RAG Tool Rages: Users reported issues with the RAG Tool from unit 3, with one confirming it didn't work that morning, and another stating Glad its not just me!
- Members also found that the correct `model_id` for Ollama is `ollama_chat/<model>`, not `ollama/<model>`.
Latent Space Discord
- ByteDance's OmniHuman Turns Heads: ByteDance's OmniHuman is now public, enabling AI Avatar animation from a single image and sound, available via Capcut's Dreamina website; a 15-second trial video is free.
- Initial testers report impressive mouth articulation and general movement, although the process is very slow and costs 192 credits to use.
- LLMs Show Weakness at USAMO: Top LLMs scored less than 5% on the 2025 USAMO full-solution eval, despite strong answer-only benchmark scores.
- Discussion suggests potential failure modes are linked to training artifacts and overfitting; some question whether all frontier labs would make this error.
- All Hands Deploys Coding LM and Cloud: OpenHands LM, a 32B coding agent model, resolves 37.4% of issues on SWE-bench Verified, accompanied by OpenHands Cloud, which offers SOTA open-source coding agents with $50 in free credits.
- OpenAI Opens PaperBench for Agent Evaluation: OpenAI has launched PaperBench, a benchmark evaluating AI agents' ability to replicate state-of-the-art AI research from top ICML 2024 papers, as part of their Preparedness Framework; the code is available at Github.
- Human experts needed 24 hours to start outperforming the model, which plateaued after 1 hour.
- Meta's Llama 4 Delivers Fast Image Generation: Llama 4-based image generation and editing is being rolled out, showing fast performance with 1-second edits, compared to 5 minutes for GPT-4o.
- hingeloss shared a tweet highlighting the speed improvements.
Yannick Kilcher Discord
- Gemini 2.5 Pro Bombs Math Test: A user discovered that Gemini 2.5 Pro (experimental) performed poorly on math problems and criticized Google for a flawed UI that doesn't show math correctly.
- They highlighted that ChatGPT and Grok 3 demonstrated superior understanding of poorly written questions compared to Gemini 2.5 Pro.
- Decoding LLM Special Tokens: Users are exploring the repeatable semantic meanings of special tokens such as `<|place holder no 1|>`, finding they are not randomly assigned.
- Analysis indicates these tokens have consistent semantic roles, with examples like `<|place holder no 1|>` consistently representing leadership or primary entities.
- Modular Model Spec Aims for Reliability: A user introduced their Modular Model Spec (modular-model-spec.vercel.app) intending to boost the flexibility and reliability of LLMs for AI application developers.
- The specification centers on a unified, modular, and extensible dataset format, enhancing reliability and developer convenience.
- Alibaba's Qwen3 Launching Soon: Alibaba is expected to launch its new model, Qwen3, in the second week of April 2025, around seven months after Qwen2.5 release in September 2024, according to this article.
- Alibaba's shift came after DeepSeek-R1 gained popularity in early 2025, leading them to prioritize inference capabilities.
- RLHF Prompt-Data Discussed: A discussion around the paper "Reinforcement Learning from Human Feedback (RLHF)" covered the importance of prompt-data construction to combat reward hacking.
- The paper also addresses decreasing response diversity in language models.
Nous Research AI Discord
- Anthropic Exposes LLMs' Hidden Thoughts: Anthropic's Tracing Thoughts in Language Model blog post suggests that LLMs possess their own thinking language, engaging in more complex cognitive processes than previously thought.
- A member noted these insights challenge conventional understandings of how LLMs operate.
- OpenAI to Share Open Weight Model: OpenAI is set to release an open weight model, potentially influenced by DeepSeek's complex maneuvers, according to this video.
- The open-source community is grateful for this development, with one member commenting, well someone had an epiphany.
- Loong 🐉 Launches for Synthetic Data: CamelAIOrg introduced Project Loong 🐉, a modular solution for generating and verifying synthetic data, aimed at enhancing model performance.
- Their blog post at camel-ai.org details the project's use of a multi-agent framework to ensure accuracy and consistency.
- Bintensors Claims Faster Safetensors Alternative: A new binary format, bintensors, promises faster speed with zero-copy access compared to safetensors; installation via Cargo and Pip are available.
- Check out the documentation and GitHub repository for implementation details.
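The zero-copy claim is the interesting part of such formats. As a dependency-free illustration (the header layout below is invented for this sketch, not bintensors' or safetensors' actual spec), a small header records each tensor's byte range and the payload is memory-mapped, so reading a tensor copies no data:

```python
# Hedged sketch of the zero-copy idea behind formats like safetensors /
# bintensors: a length-prefixed JSON header maps tensor names to byte
# offsets, and the payload is memory-mapped so loads are zero-copy.
# The on-disk layout here is illustrative only.
import json, mmap, os, struct, tempfile

def save(path, tensors):
    header, blobs, offset = {}, [], 0
    for name, data in tensors.items():
        header[name] = [offset, offset + len(data)]
        blobs.append(data)
        offset += len(data)
    raw = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(raw)))  # 8-byte little-endian header length
        f.write(raw)
        for b in blobs:
            f.write(b)

def load(path):
    f = open(path, "rb")
    n = struct.unpack("<Q", f.read(8))[0]
    header = json.loads(f.read(n))
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    base = 8 + n
    # memoryview slices reference the mapping directly: zero-copy access
    return {k: memoryview(mm)[base + s: base + e] for k, (s, e) in header.items()}

path = os.path.join(tempfile.mkdtemp(), "demo.bt")
save(path, {"w": b"\x00\x01\x02\x03"})
print(bytes(load(path)["w"]))  # b'\x00\x01\x02\x03'
```

The real formats add dtype and shape metadata per tensor; the mechanism of mapping and slicing is the same.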
- DeepHermes Reasoning Questioned: A user inquiring about reasoning with DeepHermes via Langchain was advised to use non-reasoning mode for better reliability, especially with JSON or tool calling.
- Another user expressed excitement over DeepHermes AI, highlighting that it's a 3B model.
tinygrad (George Hotz) Discord
- TinyGrad Dodges GSoC: Members discussed why TinyGrad didn't participate in the Google Summer of Code (GSoC) program, with one member stating that the overhead in onboarding students and handling paperwork often outweighs the benefits.
- However, another member argued that it effectively provides access to smart people working full-time for 3 months.
- TinyGrad: Hard to Contribute?: A member expressed the opinion that contributing meaningfully to TinyGrad requires significantly more effort compared to other projects.
- The sentiment suggested a steep learning curve for new contributors.
- UOps Optimization Questioned: A member inquired about optimizing UOps creation, specifically when discarding 2 out of 3 trees, suggesting an alternative approach using a dictionary.
- The suggested code snippet involved operations like ADD, MAX, and MUL, applied to a pooled sum.
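The dictionary approach described above can be sketched as follows; the `Node` class and op names are illustrative stand-ins for tinygrad's UOps, not the member's actual code:

```python
# Hedged sketch: instead of eagerly building three op trees (ADD, MAX,
# MUL) over a pooled sum and discarding two, a dictionary of constructors
# builds only the tree that is actually requested.
class Node:
    def __init__(self, op, *src):
        self.op, self.src = op, src

def pooled_sum():
    # stand-in for the pooled-sum subtree shared by all three variants
    return Node("SUM", Node("LOAD"))

OPS = {
    "ADD": lambda base: Node("ADD", base, Node("CONST")),
    "MAX": lambda base: Node("MAX", base, Node("CONST")),
    "MUL": lambda base: Node("MUL", base, Node("CONST")),
}

def build(op_name):
    # only one tree is constructed; the other two are never instantiated
    return OPS[op_name](pooled_sum())

print(build("MAX").op)  # MAX
```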
- Arange() Insanity Exposed: A member added a chapter on `.arange()` to their notes, providing a link and a code snippet using Tensor.arange(0.5, 2, 0.2).
- The resulting UOp tree includes operations like RESHAPE, REDUCE_AXIS, PERMUTE, and SHRINK.
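For reference, a plain-Python sketch of the values `Tensor.arange(0.5, 2, 0.2)` should produce, assuming numpy-style half-open `[start, stop)` semantics (the helper below is not tinygrad code):

```python
# Hedged sketch: generate start + i*step while below stop, with a small
# epsilon and rounding to sidestep float drift for display purposes.
def arange(start, stop, step):
    out, i = [], 0
    while start + i * step < stop - 1e-9:
        out.append(round(start + i * step, 10))
        i += 1
    return out

print(arange(0.5, 2, 0.2))
# [0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9]
```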
- Pad Dimensions Confuse: Members reported that .pad() takes the dimensions to pad in the reverse order, causing confusion.
- No solution was discussed; it remains confusing.
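A dependency-free illustration of the reported gotcha, with a hypothetical `pad2d` that takes per-dimension padding last-dimension-first (this mimics the described behavior, not tinygrad's implementation):

```python
# Hedged sketch: pad a 2D "tensor" (list of lists) where the padding
# tuple is given in reverse dimension order -- last dim (columns) first.
def pad2d(rows, pads_reversed):
    # pads_reversed = ((left, right), (top, bottom)): columns, then rows
    (l, r), (t, b) = pads_reversed
    width = len(rows[0]) + l + r
    out = [[0] * width for _ in range(t)]
    for row in rows:
        out.append([0] * l + row + [0] * r)
    out += [[0] * width for _ in range(b)]
    return out

print(pad2d([[1, 2]], ((1, 1), (0, 1))))
# [[0, 1, 2, 0], [0, 0, 0, 0]]
```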
LlamaIndex Discord
- LlamaIndex Enhances Prompt Engineering with RichPromptTemplate: LlamaIndex introduced RichPromptTemplate, a new feature for creating complex, Jinja-style prompt templates that support variables, loops, chat message roles, and multimodality, detailed in this tweet.
- The feature aims to simplify the creation of advanced prompts for various applications, including multimodal setups.
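The Jinja-style substitution idea behind such templates can be shown with a dependency-free sketch (RichPromptTemplate's real API differs; `render` below is a hypothetical helper):

```python
# Hedged sketch of variable substitution in a Jinja-style prompt
# template with chat-message roles; illustration only, not LlamaIndex's
# RichPromptTemplate implementation.
import re

def render(template, **vars):
    # substitute {{ name }} placeholders, Jinja-style
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(vars[m.group(1)]), template)

template = (
    "system: You answer questions about {{ topic }}.\n"
    "user: {{ question }}"
)
print(render(template, topic="astronomy", question="What is a pulsar?"))
```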
- Hugging Face Course Compares LlamaIndex Agentic RAG: Hugging Face released an Agents course unit that compares LlamaIndex, smolagents, and LangGraph for Agentic RAG implementations.
- The course is designed to provide a comprehensive understanding of AI agents, guiding users from beginner to expert.
- Debugging MSSQL Text-to-SQL LLM Prompts: Members debugged a text2SQL implementation for generating MSSQL code and found the prompt mixin example helpful for modifying prompts.
- To print all LLM inputs and outputs, a member suggested using the code: `from llama_index.core import set_global_handler; set_global_handler("simple")`.
- Changelog Info for LlamaIndex Deployed: Users looking for a release changelog found the LlamaIndex CHANGELOG.md file and the documentation changelog.
- These resources offer detailed information on changes and updates in each LlamaIndex release, similar to Langchain.
Nomic.ai (GPT4All) Discord
- OpenAI's Open Source Tease: Members speculated that OpenAI may release something as open source, though one member suggested that it may not be very human-like.
- The member stated that, AFAIK open source models aren't the best at writing yet.
- Deepseek Chatty Cathy: A member shared an anecdote about Deepseek being overly verbose, illustrating it with an image of Deepseek's thought process when asked to simply say 'ready', as visible here.
- Nomic Embed Text V2 Launch Anticipation: Members are waiting for Nomic Embed Text V2 to be available in GPT4All.
- One member stated that they are waiting patiently, understanding that developers are likely busy and it might take time.
Cohere Discord
- Command A stutters on repeated inputs: Members found that Command A in the API Playground gets stuck generating the same character endlessly when encountering repeated letters like 「ギャアアアアアア...」 or "AHHHHHH...".
- Cohere API experiences Timeout Errors: Users reported experiencing HTTP timeout errors with the Cohere API and Playground.
- The Cohere Status Page indicates degraded performance for `command-a-03-2025` due to increased latency.
- Discord welcomes new members!: The Discord server welcomes new members to the Cohere Community Discord Server in the 「🤝」introductions channel.
- New members are encouraged to introduce themselves by stating their Company/Industry/University, what they are working on, their favorite tech/tools, and what they hope to gain from the community, and are given a template to respond.
DSPy Discord
- DSPy Eyes OpenAI Agents SDK Integration: A user inquired about leveraging DSPy for generating prompts for the OpenAI Agents SDK, igniting a discussion on the potential synergy between the two.
- Suggestions arose that DSPy might already encompass most of the SDK's functionalities, possibly rendering direct integration unnecessary.
- DSPy Decouples Prompt Engineering: Members discussed DSPy as a tool to decouple prompt engineering from LLM behavior, questioning how to integrate it with OpenAI Agents SDK for managing agents and workflows.
- The conversation focused on using DSPy for prompt engineering, while continuing to use OpenAI Agents SDK for other functionalities, to avoid adding complexity.
- Synergy Between DSPy and OpenAI Agents SDK Explored: A member asked for examples of using DSPy for programmatic prompt engineering alongside the OpenAI Agents SDK, sparking a discussion about the framework's core abstractions.
- Clarification indicated that DSPy achieves decoupling through programmatic signatures and modules, highlighting that these are fundamental aspects of its design rather than optional features.
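The signature idea can be illustrated with a minimal sketch; the `Signature` class below is a hypothetical stand-in, not DSPy's actual API:

```python
# Hedged sketch of the "signature" abstraction: a declarative I/O spec
# from which a prompt string is derived programmatically, instead of
# hand-writing the prompt. Illustration only, not DSPy code.
from dataclasses import dataclass

@dataclass
class Signature:
    inputs: list
    outputs: list
    instruction: str

    def to_prompt(self, **values):
        lines = [self.instruction]
        for name in self.inputs:
            lines.append(f"{name}: {values[name]}")
        lines.append(f"{self.outputs[0]}:")  # the model completes this field
        return "\n".join(lines)

qa = Signature(["question"], ["answer"], "Answer the question concisely.")
print(qa.to_prompt(question="What is 2+2?"))
```

Because the prompt text is derived from the signature, swapping prompting strategies or models changes the derivation, not the application code.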
- Closing the LLM Agent Development Loop: A member shared a YouTube video about configuring LLM agents to self-improve using telemetry and evaluations, seeking community feedback.
- The video delves into a conceptual framework for closing the loop on LLM agent development, offering insights into self-improving agent architectures.
AI21 Labs (Jamba) Discord
- Jamba v1.6 Weights Released: AI21 Labs released Jamba v1.6, an open model available on Hugging Face.
- This model is built with a hybrid SSM-Transformer architecture, claiming to outperform other open instruction-following models in quality, speed, and long context performance.
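The interleaving idea behind a hybrid SSM-Transformer stack can be sketched as a layer-type schedule; the ratio below is illustrative, not Jamba's actual configuration:

```python
# Hedged sketch: most layers are SSM (Mamba-style) blocks, with an
# attention block inserted every few layers. The 1-in-4 ratio here is
# an assumption for illustration only.
def build_stack(n_layers, attn_every=4):
    # returns the layer-type schedule for the stack
    return ["attention" if (i + 1) % attn_every == 0 else "ssm"
            for i in range(n_layers)]

print(build_stack(8))
# ['ssm', 'ssm', 'ssm', 'attention', 'ssm', 'ssm', 'ssm', 'attention']
```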
- Jamba Excels at RAG: The Jamba 1.6 models show superior performance on long context tasks important to enterprises, like RAG workflows and grounded question answering.
- The release blog post can be found on AI21's blog.
- Jamba Open Model License Okayed: The Jamba Open Model License allows full research use and commercial use under its license terms.
- For specific licensing needs, contact AI21 Labs.
- Jamba Codebase Stays Closed: Jamba v1.6 does not have an open codebase, only open weights are available.
- Therefore, users cannot train Jamba v1.6 themselves.
LLM Agents (Berkeley MOOC) Discord
- MOOC Can Still Be Audited: Members discussed whether the MOOC can be taken a few months from now, confirming that auditing is possible even after the May deadline.
- The certificate-earning coursework has a May deadline, but auditing remains an option beyond that.
- DeepSeek-R1 Reasoning Capabilities are High: Recent Large Reasoning Models such as DeepSeek-R1 have demonstrated that general reasoning capabilities of LLMs greatly improve when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward, especially in mathematics and programming.
- A blogpost mentions that ease of verification is crucial to improving domain-specific capabilities and that the abundance of high-quality datasets is another critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs) leading reliably to correct answers.
- Verifiable Reward is Extremely Useful: Mathematics and programming have particularly benefited from verifiable rewards, as these domains can be verified quite easily—allowing accurate interpretation of LLM responses and effective comparison to the ground truth on a semantic level.
- The idea that ease of verification is crucial to improving domain-specific capabilities has become widely accepted in the research community.
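A minimal sketch of what a verifiable reward looks like in such domains (the `Answer:` marker convention is an assumption for illustration; real pipelines extract and compare answers more robustly, e.g. semantically):

```python
# Hedged sketch of a "verifiable reward": for math, the reward is 1.0
# when the model's final answer matches the ground truth, else 0.0.
def extract_answer(response):
    # assume the final answer is marked as "Answer: <value>"
    for line in response.splitlines():
        if line.startswith("Answer:"):
            return line.split(":", 1)[1].strip()
    return None

def reward(response, ground_truth):
    return 1.0 if extract_answer(response) == ground_truth else 0.0

print(reward("Let's compute step by step...\nAnswer: 42", "42"))  # 1.0
```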
Codeium (Windsurf) Discord
- Windsurf's Wave 6 Arrives with New Features: Windsurf released Wave 6, featuring one-click app deploys, enterprise access to MCPs and Turbo Mode, and one-click commit message generation.
- The update also includes a conversation table of contents, improved performance in long conversations, enhanced Tab features, and added MCP SSE support, outlined in their blogpost.
- Windsurf Catapults Apps to the Public: Windsurf Deploys (beta) enables users to share apps publicly with one click, streamlining deployment.
- This feature, part of Wave 6, simplifies the deployment process as detailed in their blogpost.
- Windsurf Tabs Now Jives with Jupyter: Wave 6 brings enhanced Tab features, including user search context and Jupyter Notebook support, aiming to smooth workflows within the platform, per a recent tweet.
- The integration focuses on providing a more seamless experience for users working with notebooks.
- Cascade Saves Screenshots: Windsurf Previews (Beta) lets users preview locally run websites in their IDE or browser, and users can select React and HTML elements to send to Cascade as context.
- According to the changelog, this eliminates copy-pasting or screenshots, can be toggled via Windsurf Settings, and is available to all plans without costing credits.
Gorilla LLM (Berkeley Function Calling) Discord
- Phi-4-mini-instruct PR Needs Eyes: A member created a PR to add tool evaluation for Phi-4-mini-instruct with BFCL and is requesting feedback on GitHub.
- The pull request aims to integrate and evaluate Microsoft's Phi-4-mini-instruct model within the BFCL framework.
- Call for Code Review on New Integration: A contributor has submitted a pull request to integrate and evaluate Microsoft's Phi-4-mini-instruct model within the BFCL framework.
- The integration requires community feedback and code review, focusing on the model's performance and compatibility within the existing system.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Manus.im Discord ▷ #showcase (1 messages):
Case Showcase, Amazing Cases
- Member Praises Amazing Case: A member highlighted an amazing case created by another user, adding a celebratory emoji.
- The user included a 100 emoji to signify their approval.
- Showcase Channel Highlight: A user pointed out an amazing case within the showcase channel.
- They used emojis to express excitement and approval of the demonstrated case.
Manus.im Discord ▷ #general (692 messages🔥🔥🔥):
Manus API, Credits not replenishing, AI Consciousness, Manus support payments, Axiom of Choice in LLMs
- No Public Manus API... Yet!: A member inquired about the existence of an API for Manus, to which it was clarified that there isn't a publicly available API since the platform is in an invitation-only beta phase, but this may change in the future.
- The member also asked what language the code is, and was answered, depends on the scientific tools am going to use, this is also a point i need to figure out first what manus is acutally capable of doing it's either going to be c++, or python, or a mix of them both might even try something else, like Julia or RUST.
- Free Credits only a One Time Thing!: A member asked why their credits hadn't replenished, and it was clarified that the free 1000 credits at the beginning are a one-time thing.
- To get more credits, you need to buy credits through subs with more information located here.
- AI: Alive or Not?!: A member sparked a discussion on the nature of AI, asking whether it is alive or has consciousness.
- It was said that as scientists, we don't know what life is. We really don't know, we only describe the symptoms.
- PayPal Payments for Manus on the Horizon?: A member asked if Manus supports PayPal for subscription payments, directing them to the Platform Service Terms page.
- The user was instructed to use ctrl+f (strg+f) and search "payment".
- Navigating the Credit Crunch!: Users are sharing tips for navigating the new credit system, which many find too expensive, one stating that with the starter pack, 40 dollar monthly you can do maybe 5-8 tasks.
- It was then recommended that it is beneficial to use other tools to assist Manus in reducing overall credit usage.
- Tweet from ManusAI (@ManusAI_HQ): 1) Excel to website: Manus turned this CSV of raw sales data into a website with an insightful, interactive sales performance dashboard.
- Sweats GIF - Sweats - Discover & Share GIFs: Click to view the GIF
- Cooking Money GIF - Cooking Cook Money - Discover & Share GIFs: Click to view the GIF
- Anime Was A Mistake Hayao Miyazaki GIF - Anime Was A Mistake Hayao Miyazaki - Discover & Share GIFs: Click to view the GIF
- Power Rangers Power Rangers Turbo GIF - Power Rangers Power Rangers Turbo - Discover & Share GIFs: Click to view the GIF
- Cute Cha Pri GIF - Cute Cha pri - Discover & Share GIFs: Click to view the GIF
- Welcome Penguin GIF - Welcome Penguin Back - Discover & Share GIFs: Click to view the GIF
- Please Pretty GIF - Please Pretty - Discover & Share GIFs: Click to view the GIF
- Kyoshi GIF - Kyoshi - Discover & Share GIFs: Click to view the GIF
- Elmo Burning GIF - Elmo Burning Hell - Discover & Share GIFs: Click to view the GIF
- Judges 10 GIF - Judges 10 Score Up - Discover & Share GIFs: Click to view the GIF
- Cash Cash Money GIF - Cash Cash money Hate2doit - Discover & Share GIFs: Click to view the GIF
- Gif Choro GIF - GIF choro - Discover & Share GIFs: Click to view the GIF
- Team Family GIF - Team Family Club - Discover & Share GIFs: Click to view the GIF
- try it and see!: no description found
- Nick Fuentes Nicholas J Fuentes GIF - Nick Fuentes Fuentes Nicholas J Fuentes - Discover & Share GIFs: Click to view the GIF
- Metacognition - Wikipedia: no description found
- Manus: Manus is a general AI agent that turns your thoughts into actions. It excels at various tasks in work and life, getting everything done while you rest.
- Google AI Studio: Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.
- Google Image Result for https://media.tenor.com/XPk4hFUx1w4AAAAM/money-printer.gif: no description found
- Little Krishna Krishna Images GIF - Little krishna Krishna Krishna images - Discover & Share GIFs: Click to view the GIF
- React App: no description found
- Teaching Transformers Causal Reasoning through Axiomatic Training: no description found
- Clearish The Office GIF - Clearish The Office Michael Scott - Discover & Share GIFs: Click to view the GIF
- Defining REIS: First Experiment and Feedback Loops - Manus: Manus is a general AI agent that turns your thoughts into actions. It excels at various tasks in work and life, getting everything done while you rest.
- Take My Money Heres My Card GIF - Take my money Heres my card Here’s my card - Discover & Share GIFs: Click to view the GIF
- The Axiom of Choice and Its Influence on LLM Hallucinations: An Exploration by Rahul Kaushik :: SSRN: no description found
LMArena ▷ #general (1017 messages🔥🔥🔥):
New Alpha Arena, Deepseek V3.1, Gemini 2.5, Grok3 scores, Themis prompt
- The New Alpha Arena is Awesome!: The new Alpha Arena has been updated, and users are praising the changes, hoping for the addition of new models like Deepseek v3.1 and Gemini 2.5.
- A user states that they listened to my feedback and the new alpha arena is awesome.
- Grok3's non-reasoning score: Google's SimpleQA table appears to include Grok3's non-reasoning score, while other Grok3 scores are specifically for the Grok Thinking Model.
- Attached image of the table shows a comparison of a bunch of recent models.
- The Truth about O3 Opus Size Arrives: A user says Opus is >1T and Sonnet is ~300-400B and links to a LifeArchitect.ai models table estimating parameter counts for various models.
- The user states Dr Thompson knows his stuff after they said: i've always found my experiences with them to make sense when combined with his prediction.
- Claude's Inner wiring Exposed: Anthropic is using circuit tracing, a technique to track how an AI model builds its answers, revealing odd ways of arriving at an answer as explained in this TechSpot Article.
- Members are cautioned to read the actual Anthropic paper, and not just the news article.
- DeepMind Slows Releases, but Aether Arrives!: While noting DeepMind slows research releases to keep a competitive edge as seen in this archived article, a member found a new model which identifies itself as Aether, but makes illegal moves at chess.
- It is worth noting that at this point in every round you're getting a Meta model.
- Upload and share screenshots and images - print screen online | Snipboard.io: Easy and free screenshot and image sharing - upload images online with print screen and paste, or drag and drop.
- Tweet from Mislav Balunović (@mbalunovic): Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve non-trivial amount of points (24.4%). The speed of progr...
- Tweet from Logan Kilpatrick (@OfficialLoganK): We are going to build the world’s most powerful coding models, lots of good progress already with 2.0. 2025 is going to be fun :)
- AgentGPT: Autonomous AI in your browser 🤖: Assemble, configure, and deploy autonomous AI Agents in your browser.
- We are finally beginning to understand how LLMs work: No, they don't simply predict word after word: Circuit tracing is a relatively new technique that lets researchers track how an AI model builds its answers step by step – like following the wiring in...
- Models Table: Open the Models Table in a new tab | Back to LifeArchitect.ai Open the Models Table in a new tab | Back to LifeArchitect.ai Models Table Rankings Reasoning Models • 2024Q3–2025Q1 Data dictionary Model...
- CohereForAI/c4ai-command-a-03-2025 · Hugging Face: no description found
- Leela Chess Zero - Wikipedia: no description found
- Gemini - Three.js Asteroid Impact Simulation Code : Created with Gemini Advanced
- GitHub - EmbraceAGI/Awesome-AGI: A curated list of awesome AGI frameworks, software and resources: A curated list of awesome AGI frameworks, software and resources - EmbraceAGI/Awesome-AGI
- TrueAGI – Tailored AGI Solutions for Enterprises: TrueAGI aims to customize artificial general intelligence solutions for businesses and help them develop an AGI strategy to be competitive and future-forward.
LMArena ▷ #announcements (1 messages):
Mobile Alpha Release, UI optimization, Feedback collection, Bug reporting
- Arena Alpha Goes Mobile!: The new Arena Alpha UI is now optimized for mobile and ready for testing, accessible via https://alpha.lmarena.ai with the password `still-alpha`.
- Feedback Loop Activated for Mobile Alpha: Users are encouraged to provide feedback via a Google Forms link to help improve the mobile experience.
- Bug reports can be submitted through an Airtable form.
- Arena - New UI Feedback: Tell us what you think about the new design!
- Airtable | Everyone's app platform: Airtable is a low-code platform for building collaborative apps. Customize your workflow, collaborate, and achieve ambitious outcomes. Get started for free.
Cursor Community ▷ #general (717 messages🔥🔥🔥):
Cursor limitations vs other editors, Pear AI, 3D Modeling agents, Gemini 2.5 vs Claude 3, Roo code workflow
- Gemini 2.5 Pro is Great, but Claude 3.7 is Better for some bug fixes: Members say Gemini 2.5 Pro is great, but they sometimes switch to Claude 3.7 for certain bug fixes and code edits.
- 3.7 is able to fix it a lot of the time, and is more stable, but 2.5 Pro is faster.
- Cursor struggles with Context Size: Members discussed Cursor's limited context size, saying it impacts the quality of results and requires modularization.
- It leads to a lot of requests and a lot of modularization, which is no longer really needed with bigger contexts.
- Pear AI is on the Rise: Members said Pear AI with Roo Code is cheaper and more effective than Cursor due to unlimited context and model selection per action.
- Members noted that, with Pear, instead of fighting gemini 2.5 all day trying to edit a single file or use an MCP each agent does its thing well.
- Roo Code workflow: The Roo Code workflow involves using multiple agents for different tasks (research, code editing), costing less and solving complex problems.
- A member said, each task creates it's own separate agents that complete subtasks.
- Blender MCP assists 3D Modeling: Members shared a GitHub link to Blender MCP for collaborative 3D modeling.
- They also shared another link to Blender as potentially helpful.
- Boomerang Tasks: Orchestrate Complex Workflows | Roo Code Docs: Boomerang Tasks (also known as subtasks or task orchestration) allow you to break down complex projects into smaller, manageable pieces. Think of it like delegating parts of your work to specialized a...
- Mixer: a Blender Addon for Collaborative Editing — Ubisoft Mixer documentation: no description found
- Tweet from Cj Z 🎯 (@cj_zZZz): Cursor Agent is just wild. Now i use Gemini PRO 2.5 to scan the codebase and sonnet 3.5/3.7 to execute code. In this workflow you need 3 things: 1. Detailed project documentation 2. Use multiple AI codi...
- Tweet from siddharth ahuja (@sidahuj): 🧩 Built an MCP that lets Claude talk directly to Blender. It helps you create beautiful 3D scenes using just prompts! Here’s a demo of me creating a “low-poly dragon guarding treasure” scene in just a...
- Tweet from el.cine (@EHuanglu): 3D AI is getting crazier. This new Hunyuan3D 2.0 MV open source model can generate 3D assets with multiple images in seconds. Free to use now, link in comments. 10 examples:
- Privacy Policy: The privacy policy for PearAI.
- Rainbow Spongebob GIF - Rainbow Spongebob Imagination - Discover & Share GIFs: Click to view the GIF
- Parallel Universe GIF - Parallel Universe Operation Boomerang - Discover & Share GIFs: Click to view the GIF
- [Guide] A Simpler, More Autonomous AI Workflow for Cursor: Hey everyone, Following up on the previous KleoSr Cursor Rules system, I’ve been working for the past week and the engagement with the community inside my old thread: [Guide] Maximizing Coding Effici...
- GitHub - hyperb1iss/lucidity-mcp: AI-powered code quality analysis using MCP to help AI assistants review code more effectively. Analyze git changes for complexity, security issues, and more through structured prompts.: AI-powered code quality analysis using MCP to help AI assistants review code more effectively. Analyze git changes for complexity, security issues, and more through structured prompts. - hyperb1iss...
- GitHub - ahujasid/blender-mcp: Contribute to ahujasid/blender-mcp development by creating an account on GitHub.
- Changelog | Cursor - The AI Code Editor: New updates and improvements.
- GitHub - punkpeye/awesome-mcp-servers: A collection of MCP servers.: A collection of MCP servers. Contribute to punkpeye/awesome-mcp-servers development by creating an account on GitHub.
- Find Similar Startups - AI-Powered Competitor Analysis: Discover startup competition instantly with AI-powered market research.
Perplexity AI ▷ #announcements (1 messages):
Student Referral Program, Free Month of Pro, Referral Bonuses
- Perplexity Rolls Out Student Referral Rave!: Perplexity AI launched a new referral program for students, offering a free month of Pro for signing up with a student email.
- Students can earn an extra month for each friend they refer until May 31, 2025.
- Image attached to the announcement: There was an image attached with the announcement.
- The filename of the attached image is `Campaign_-_Social_1_-_1350_x_1080-1.jpg` and it is hosted on Discord's CDN.
Perplexity AI ▷ #general (597 messages🔥🔥🔥):
Formatting Transcripts, Gemini Pro 2.5 for Transcriptions, Perplexity Standard Search Enhancement, Baseball Model Creation, GPT-4o Performance
- Transcription Formatting struggles with context windows: A user is seeking advice on how to use Perplexity to format a long meeting transcript, which is causing issues due to context window limitations, while attempting to preserve the full content without summarization.
- A member recommends using the free version of Gemini Pro 2.5 via AI Studio, as it offers a 65k token context window, which might be sufficient for the transcript.
- The case of the Nerfed GPT-4o surfaces: Members are reporting that the GPT-4o model may have been nerfed in some way, with others reporting similar experiences.
- It is suggested that members try using older models like 3.7 or o3potato as alternatives while the Perplexity team investigates.
- The Great Perplexity Downtime Debacle: Users reported that Perplexity experienced downtime, with some members unable to access their spaces and threads.
- One user humorously suggested that users should stop paying for Perplexity and switch to Grok due to the downtime.
- Student Discounts cause sign up stumbles: Users reported difficulties with the student discount sign-up process, particularly being redirected to the app and then not being able to apply the discount.
- A member advises users to contact support for assistance with account and billing issues.
- Deep Research Dives Deep into Disappointment: Users express disappointment with the updated Deep Research feature, noting that it is slower and produces inferior results compared to the older version.
- One member pointed out the new deep research overfitted itself with confirmation bias, leading to worse conclusions, and another reported that the output was a jumbled mess.
- Tweet from TestingCatalog News 🗞 (@testingcatalog): Power users of @perplexity_ai will be able to join "Perplexity Puls Group" for providing feedback on its responses and earn different perks like free PPLX and merch. Free PPLX? I accept! 👀
- Tweet from Ask Perplexity (@AskPerplexity): Perplexity has signed a definitive agreement to acquire Poogle for $2.2T - a significant step toward improving search in the AI era.
- Myron Again GIF - Myron Again - Discover & Share GIFs: Click to view the GIF
- Dog GIF - Dog - Discover & Share GIFs: Click to view the GIF
- America In Distress Usa GIF - America in Distress America USA - Discover & Share GIFs: Click to view the GIF
- Reddit - The heart of the internet: no description found
- Lewis Galoob Toys, Inc. v. Nintendo of America, Inc. - Wikipedia: no description found
- ChatGPT_solutions_produced_from_simple_prompt/pybrain3_custom_layers/MAML/main_shards/[0x5]/languages_of_the_head.py at 2025_02 · ForkInABlender/ChatGPT_solutions_produced_from_simple_prompt: This is as the name implies. The solutions of which go here. Both for educational and work purposes. Or building personal experience for getting into the tech field. - ForkInABlender/ChatGPT_soluti...
Perplexity AI ▷ #sharing (12 messages🔥):
Sonnet 3.7, reasoning prompts, abolish IRS
- Sonnet 3.7 Sparks Longer Reasoning: A member is testing a new prompt/Space that attempts to force Sonnet 3.7 Thinking into longer reasoning, using this perplexity link as a test case.
- Trump Aims to Abolish IRS: A member posted this link regarding Trump's aim to abolish the IRS.
Perplexity AI ▷ #pplx-api (3 messages):
Sonar Deep Research API Streaming, API Unauthorized Error, Geographical Filters for API Search
- Sonar Deep Research API Dumps Reasoning: A user reports that the sonar-deep-research API streams all the reasoning in one go after about a minute, instead of in real time like the Perplexity website.
- The user wonders if this is normal behavior or if they misconfigured something, desiring real-time reasoning updates as the model searches the web.
- Apps Face Sudden API Unauth Errors: A user reported getting unauthorized errors suddenly in their apps.
- They inquired if there were any recent API changes that might be causing this issue.
- Geo Filters Requested for API Search: A user asked if there is a way to add geographical filters to the API.
- The user wants to restrict searches to a specific area.
Unsloth AI (Daniel Han) ▷ #general (326 messages🔥🔥):
Fine-tuning with Audio Datasets, Deepseek Model Training Challenges, Anthropic's LLM Insights, Training on Completion vs. Full Conversation, Qwen3 Release
- Orpheus Finetuning Frequencies Flourish: Members discussed whether fine-tuning canopylabs/orpheus-3b-0.1-pretrained with a 40-70s audio dataset would help reduce random word disappearance events, with one reporting a dataset of 20k hours classified for events.
- One user reported a total of 2,440,789 audio events found, with an overall duration of 73,389,457.32368785 seconds (20,385.96 hours).
- Deepseek Training Derailed by Device Difficulties?: One user reported struggles training on Deepseek, even with two nodes of H100s, highlighting cost concerns.
- They linked to a YouTube video and joked about companies wanting to train Deepseek, indicating the model's potential value.
- Unsloth Explains Training Only on Responses: Users discussed the implications of `train_on_responses_only` in Unsloth, with a conversation about how the model uses the user prompt as context but only trains on the assistant's response.
- ChatGPT can explain it fairly well in more formal terms as well: link.
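Conceptually, training only on responses masks the prompt tokens' labels so they contribute no loss; a minimal sketch of that masking (the token ids are illustrative and this is not Unsloth's internals, though -100 is the common Hugging Face ignore index):

```python
# Hedged sketch: prompt tokens stay in the model input but their labels
# are set to the ignore index, so the loss covers only assistant tokens.
IGNORE = -100

def mask_labels(token_ids, response_start):
    # copy labels, masking everything before the assistant response
    return [IGNORE] * response_start + token_ids[response_start:]

tokens = [1, 2, 3, 4, 5, 6]      # [user prompt | assistant reply]
labels = mask_labels(tokens, 3)  # first 3 tokens are the prompt
print(labels)  # [-100, -100, -100, 4, 5, 6]
```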
- Qwen3 Quells Quivering: Members shared information that Qwen3 is expected to be released the second week of April 2025, citing this article.
- The update will focus on improving the model's reasoning abilities, and benchmarking against models like OpenAI's o1 and DeepSeek-R1.
- KTransformers Kernel Konquest Kickstarts: One user shared a Reddit post about KTransformers adding multi-concurrency support with inspiration from sglang.
- The post mentions that KTransformers v0.2.4 tested on the latest Xeon6 + MRDIMM-8800 platform increased total output throughput from 17 tokens/s to 40 tokens/s by increasing concurrency.
- Reddit - The heart of the internet: no description found
- Adding Qwen3 and Qwen3MoE by bozheng-hit · Pull Request #36878 · huggingface/transformers: Adding Qwen3This PR adds the support of codes for the coming Qwen3 models. For information about Qwen, please visit https://github.com/QwenLM/Qwen2.5. @ArthurZucker
- Alibaba's secretly developed new model to be released; influence metrics become the key evaluation criterion: Huxiu exclusively learned that Alibaba will release its new model Qwen3 in the second week of April 2025, its most important model product of the first half of 2025, roughly seven months after Qwen2.5 was unveiled at the September 2024 Apsara Conference. According to Huxiu, after Qwen2.5's release in 2024, the foundation model team inside Alibaba Cloud began pushing Qwen3-related projects, but in early 2025 D...
Unsloth AI (Daniel Han) ▷ #off-topic (14 messages🔥):
Exllama2 vs vllm, Edge inference engines, Deepseek pipeline, DeepSeek-Coder-V2-Lite full finetuning
- Exllama2 Mirroring vllm's Generator Design: A member noted that exllama2 appears quite similar to vllm because all forward calls go through the generator, requiring control handoff for job scheduling, referencing this exllama2 dynamic doc.
- They also linked to an exllama2 discussion about hooking the forward pass, adding that this is possible in vllm too.
- TensorRT LLM emerges as Edge Inference Contender: A member suggested Nvidia's TensorRT as a viable edge inference engine, noting that edge environments often prioritize token-by-token processing over continuous batching.
- Another member expressed intent to try TensorRT LLM, linking to a YouTube video on full-fine tuning Deepseek pipeline.
- DeepSeek-Coder-V2-Lite Finetuning Faceoff: A member inquired about the best choice between DeepSeek-Coder-V2-Lite-Instruct and DeepSeek-Coder-V2-Lite-Base for full finetuning.
- Another member responded with a link to the same YouTube video on the Deepseek pipeline.
- YouTube: no description found
- exllamav2/doc/dynamic.md at master · turboderp-org/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs - turboderp-org/exllamav2
- Injecting noise in hidden state inputs, query, key/value or attention head outputs · turboderp-org/exllamav2 · Discussion #500: Hey there! Not sure if this was already discussed somewhere around here but I stumbled across the idea of injecting noise into inference and BEFORE sampling. See https://github.com/EGjoni/DRUGS and...
Unsloth AI (Daniel Han) ▷ #help (230 messages🔥🔥):
Gemma3 Vision finetuning errors, Mamba implementation with Unsloth, Qwen 2.5 problems, Evaluation of finetuned models, SFTTrainer questions
- Gemma3 vision finetuning hits iterable error: Members are encountering a `TypeError: 'int' object is not iterable` when attempting to fine-tune Gemma3 with image samples using the unsloth library, specifically when adapting other vision-based notebooks like Pixtral.
- The issue arises in the `patch_Gemma3Processor` function within `unsloth_zoo/temporary_patches.py`, with the error traced to a loop involving `num_crops` and `image_indexes`; a user created issue 2265 on GitHub to address the Gemma3 doc incompatibility for vision fine-tuning.
- Eager Mamba running... slowly: Users report that setting `attn_implementation = "eager"` is necessary for Mamba to work but results in very slow training speeds; one user's first run gave `RuntimeError: Direct module loading failed for unsloth_compiled_module_falcon_mamba: unexpected indent`.
- The member debugging the configuration for `state-spaces/mamba-1.4b-hf` and `RWKV` found that setting `trust_remote_code = False` resolved an error, leading to functional but slow training; another suggested that HuggingFace's attention replacement may be the culprit.
- Qwen Model Confusion: A member reports problems with Qwen 2.5, noting that the base model behaves like the instruct model and vice versa.
- Specifically, the base model does completion while the instruct model repeats itself nonsensically, and it was suggested that prompt formatting might be the cause.
- Fine-tuning Evaluation Elucidation: A user seeks clarification on the purpose of the second loading bar after training, which turns out to indicate the evaluation stage.
- Discussions clarify that evaluation is for comparison (within or between runs) and can be done separately from training, but because precision/recall can be difficult to measure, human preference testing may also work.
- SFTTrainer solves full finetune mysteries: A user encountered a `ValueError: Attempting to unscale FP16 gradients` when using the normal Trainer instead of SFTTrainer for full finetuning with Llama 3.2 1B Instruct; after switching to SFTTrainer the error disappeared.
- It was hypothesized that the issue might be related to the model's bfloat16 datatype not being properly detected by the Trainer, or that setting `CUDA_VISIBLE_DEVICES=0` would enforce that only one GPU is detected.
- Vision Fine-tuning | Unsloth Documentation: Details on vision/multimodal fine-tuning with Unsloth
- Qwen2.5-VL (All Versions) - a unsloth Collection: no description found
- Supervised Fine-tuning Trainer: no description found
- [DOC] Gemma 3 instructions on Vision Fine Tuning page is not correct · Issue #2265 · unslothai/unsloth: Report of incorrect documentation on the Vision Fine Tuning page; location: https://docs.unslot...
- unsloth-zoo/unsloth_zoo/training_utils.py at 4a66f8b08952fc148f5c74cd15aec52cb0113e2d · unslothai/unsloth-zoo: Utils for Unsloth. Contribute to unslothai/unsloth-zoo development by creating an account on GitHub.
Unsloth AI (Daniel Han) ▷ #showcase (3 messages):
Unsloth BnB 4bit Dynamic vs Standard BnB 4bit, Huggingface Yourbench Dynamic Benchmark Generation Framework
- Unsloth Tag Declares Dynamic Quantization: The unsloth tag in the models for bnb indicates a dynamic quantization, whereas if it doesn't have unsloth then it's standard bnb.
- For example, llama-3.3-70b-unsloth-bnb-4bit indicates Unsloth BnB 4bit Dynamic, whereas llama-3.3-70b-bnb-4bit indicates standard BnB 4bit.
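In code form, the naming convention is just a substring check on the repo name; a toy helper (the convention is from the message above, the function itself is ours):

```python
def bnb_quant_flavor(repo_name: str) -> str:
    """Classify an Unsloth-hosted bnb 4-bit repo by its naming convention:
    an extra 'unsloth' tag before 'bnb-4bit' marks the dynamic quants."""
    if not repo_name.endswith("bnb-4bit"):
        return "not a bnb 4-bit repo"
    if "unsloth-bnb-4bit" in repo_name:
        return "Unsloth dynamic 4-bit"
    return "standard BnB 4-bit"

print(bnb_quant_flavor("llama-3.3-70b-unsloth-bnb-4bit"))  # Unsloth dynamic 4-bit
print(bnb_quant_flavor("llama-3.3-70b-bnb-4bit"))          # standard BnB 4-bit
```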
- Huggingface Launches yourbench for Custom Benchmarking: Huggingface recently released a dynamic benchmark generation framework called yourbench, which is an open-source tool for custom benchmarking and synthetic data generation from any documents.
- The tool is a big step towards improving how model evaluations work, and might have a lot of use in generating synthetic data for fine-tuning, as well as evaluating fine-tuned models.
Link mentioned: Tweet from Sumuk (@sumukx): we're launching 🤗 yourbench today, an open source tool for custom benchmarking and synthetic data generation from ANY of your documents. it's a big step towards improving how model evaluation...
Unsloth AI (Daniel Han) ▷ #research (5 messages):
GRPO, Reasoning Models, Reward Functions
- Reasoning Models Face Reward Function Hurdles: One member asked about the challenges in making reasoning models and steps for continuous improvement.
- Another member noted the need for a way of determining when the model is going well, especially in domains beyond math and code where correctness is hard to verify, emphasizing that LLMs are rating other LLMs in benchmarks which introduces measuring error.
- Model's Self-Rating Capabilities Still Distant: A member mentioned to have a model able to continuously improve, it would probably need to be able to rate its own outputs and have something update its own weights based on that rating.
- They added, nobody is even close to doing that nor working towards that as far as I know.
aider (Paul Gauthier) ▷ #general (515 messages🔥🔥🔥):
Gemini 2.5 Pro Rate Limits, Junie Jetbrains Agent, Aider Custom Commands, Zed Editor with Aider, Context Management for LLMs
- Gemini 2.5 Pro Rate Limit Woes: Users report frequent rate limiting with Gemini 2.5 Pro, even after enabling billing; some are hitting a 5 RPM (requests per minute) limit despite being on Tier 1, while others are seeing 20 RPM.
- Suggestions include setting `--editor-model sonnet` to offload editing tasks, and some speculate that enabling billing for a free model increases rate limits.
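At those tiers, staying under the cap client-side amounts to spacing requests out; a minimal throttle sketch (the `Throttle` class is illustrative, not an aider or Gemini feature):

```python
import time

def min_interval(rpm: int) -> float:
    """Seconds to wait between requests to stay under a requests-per-minute cap."""
    return 60.0 / rpm

class Throttle:
    """Block before each call so consecutive requests are at least
    min_interval(rpm) seconds apart."""
    def __init__(self, rpm: int):
        self.interval = min_interval(rpm)
        self.last = float("-inf")

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self.last + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()

# Tier 1 at 5 RPM means one request every 12s; 20 RPM means every 3s.
print(min_interval(5), min_interval(20))  # 12.0 3.0
```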
- Jetbrains Junie AI Agent: A member mentioned Junie, a new AI agent directly integrated into Jetbrains IDEs, highlighting its ability to catch compile errors and rewrite code, but it's currently in alpha with limited language support.
- They noted that it might cost around $10-20/month and is perfect for anything web, anything python and has Go support too.
- Aider Custom Commands Gain Traction: Some members discussed using dotcommands in Aider for cognitive shortcuts optimized for specific functionality, and a member shared their Aider configuration file for custom color themes.
- Configuration is done through a markdown file specified in `~/.aider.conf.yml`.
- Zed Editor faces potential Aider Integration: A member raised the idea of integrating Aider and code2prompt into the Zed editor, but the slow development pace and niche nature of Zed were discussed.
- Another member suggested that there is an agent in Zed that can be enabled with a feature flag. See Github commit.
- Context Management critical for LLMs: A member highlighted the importance of context management for LLMs and shared a link to Context7, a tool to prevent LLMs from generating broken/outdated code using up-to-date documentation and real code examples.
- They added that a simple method could be to keep a repo of `.md` files in GitHub and ask users to contribute.
- Tweet from Aipotheosis Labs (@AipoLabs): http://ACI.dev Ships The First Unified MCP ServerWhy This Is DifferentMost existing MCP servers only expose a set of tools from a specific App.And if you want to utilize tools from multiple apps (Gmai...
- Tweet from Upstash (@upstash): Introducing Context7 – stop Cursor, Claude or any LLM from generating broken, outdated code.Feed your AI:◆ Always up-to-date documentation◆ Real code examples from official docsso it writes code that ...
- Tweet from Sam Altman (@sama): TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://openai.com/op...
- Kuzco Yzma GIF - Kuzco Yzma Chat - Discover & Share GIFs: Click to view the GIF
- Side Eye Anime Anime Side Eye GIF - Side eye anime Anime side eye Dan da dan - Discover & Share GIFs: Click to view the GIF
- Do It Star Wars GIF - Do it Star wars Emperor palpatine - Discover & Share GIFs: Click to view the GIF
- April Fool April Fools GIF - April Fool April Fools Spongebob - Discover & Share GIFs: Click to view the GIF
- Google AI Studio: Google AI Studio is the fastest way to start building with Gemini, our next generation family of multimodal generative AI models.
- Interview Coder - AI Assistant for Technical Interviews: no description found
- HACK: unflag assistant2 · zed-industries/zed@767df8f: no description found
aider (Paul Gauthier) ▷ #questions-and-tips (16 messages🔥):
aider config, architect model, Gemini 2.5 Pro error, workingDirectory issue, lint configuration
- Aider Config File Reveals Multiple Linters: A member pointed out that the Aider config file allows listing multiple linters to be run.
- The suggestion came in response to a question about configuring Aider to run with `/lint` and potentially auto-lint.
- Architect Model's Response Transfer Headache Solved: A member described a workaround to transfer a satisfactory response from the architect model to the editor model without wasting 2.5 Pro shots.
- The process involves opening `.aider.chat.history.md`, copying the desired message, opening a new Aider instance with the editor model specified, and using `/paste` to transfer the response.
- Gemini 2.5 Pro Throws Curveball Documented: A member documented a recent Gemini 2.5 Pro error on their blog, inviting others to comment on the possible root cause.
- The error was Aider related.
- Working Directory Woes Plague Users: A member reported a potential working-directory issue where running `/run pwd` in Aider returns the default shell location instead of the directory where Aider was started.
- It was observed that Aider was initiating from the user's default directory instead of the expected project directory.
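The behavior is consistent with how child processes inherit a working directory: a `/run`-style command shells out from wherever the launching process's cwd happens to be. A quick stdlib demonstration (generic Python, not aider's implementation):

```python
import os
import subprocess
import tempfile

# A shelled-out command reports the parent process's cwd by default...
here = subprocess.run(["pwd"], capture_output=True, text=True).stdout.strip()
assert here == os.getcwd()

# ...unless the launcher passes an explicit working directory, which is what
# you'd want a /run-style command to do with the project root.
with tempfile.TemporaryDirectory() as project:
    there = subprocess.run(["pwd"], capture_output=True, text=True,
                           cwd=project).stdout.strip()
    print(there == os.path.realpath(project))  # True
```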
- Convention File Name Revelation: A member asked if the convention file must be named CONVENTIONS, or if a name like README is acceptable.
- The answer was that any name is fine.
Link mentioned: aider/aider/website/assets/sample.aider.conf.yml at 3992681b84d1ec0cbc18657c5ca832c89d7e551c · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
aider (Paul Gauthier) ▷ #links (6 messages):
Refact AI, aider polyglot bench, model costs, large tries values
- Refact Claims Top Score on Aider's Polyglot Benchmark: Refact AI claims a 92% score on the aider polyglot benchmark, detailed in this Medium post.
- Another member searched up refact polyglot to find a previous claim from March 17.
- Interest Sparked in Replicating Refact's Success: A member expressed interest in understanding and replicating Refact's performance, contingent on the cost and legitimacy of their claims.
- The member stated, If it’s doing that score I want to know the cost but it’s not worthless to achieve that score if it’s legitimate.
- Paul calls for Large Tries Experiment: Paul expressed interest in results with larger --tries values, especially with free or cheap models with high enough rate limits.
- He believes that might tell us something interesting about which models are worth beating up.
OpenAI ▷ #ai-discussions (195 messages🔥🔥):
AI in creative industries, ChatGPT image generation, AI-powered robot, AlphaGo method for LLMs, Official Discord for Gemini
- AI may encourage slop in creative fields: Members debated whether AI tools encourage “bottom feeder behavior” and if creatives need to become better writers to improve image generation, with one stating AI is often used to churn out statistically average results instead of unique expressions.
- One member said “in any serious endeavor for using AI in creative fields is read as slop for everyone beyond the lowest common denominator” while another suggested it is being used for ideation at minimum in creative industries.
- ChatGPT Image Generation's Mixed Bag: Users discussed the quality and quirks of ChatGPT's image generation, noting its usefulness and good spatial consistency while also pointing out that it can be noisy and may be the result of “cooking for a year.”
- One user commented that “ChatGPT image generation is noisy af. At most in Anime style” while another reported seeing 'Get Plus' even after having Plus, suggesting a bug.
- AI-Powered Robot Drops on LinkedIn: A member shared a LinkedIn post about an AI-powered robot operating in autonomous & semi-autonomous modes, capable of working with OpenAI and multiple other LLMs.
- Others jokingly compared it to “a roomba minus the vacuum?” and referenced a new CES demo featuring a robot with a built-in arm for relocating socks.
- Self-Play AlphaGo Method Explored for LLMs: A user shared preliminary research on using the AlphaGo method of self-play for LLMs, involving LLM-controlled bots cooperating and competing in a text-only game for improvement.
- The research features bots cooperating and competing 2 vs. 2, self-playing many times to improve, all within a text-only game.
- GPT service degraded: Several members reported issues with GPT being down, with one sharing an image indicating the service outage.
- Users reported errors such as a request for a Plus subscription despite already having one, and some sought alternatives like Perplexity and Grok 3 for coding and studying.
- OpenAI.fm: An interactive demo for developers to try the new text-to-speech model in the OpenAI API
- evolving_llms_through_text-based_self-play.pdf: no description found
- Reddit - The heart of the internet: no description found
OpenAI ▷ #gpt-4-discussions (12 messages🔥):
Image Model Improvement, GPT-4o Issues, AI-Driven Sim Game, Livekit Evals, GPT-4o Tasks Not Working
- Image Model Development Stalls?: Members speculated whether OpenAI will continue to improve their image model or leave it dormant for a few years, as they did with DALL-E 1, only to release a great product and then suck the life out of it.
- One user thinks it's a cash grab, pointing to restrictions and a perceived lack of thorough testing before the release.
- GPT-4 Slowdown Suspicions Arise: Several users reported that GPT-4 appears to be slower, with one user noting that it seems broken in some ways.
- Multiple people confirmed experiencing slowdowns and other issues over the past few days.
- Subscription Activation Troubles: One user reported paying for a GPT Chat Plus subscription at 5 AM but not receiving access to the features.
- The user asked if anyone else had similar experiences.
- Interest Surrounds AI-Driven Sim Games: A member inquired if anyone had attempted to create an AI-driven simulation game.
- No further details or responses were provided.
- GPT-4o struggles to complete simple tasks: A paid user reported that GPT-4o isn't working as expected when asked to perform tasks, despite promotional material suggesting otherwise.
- Instead of completing the requested tasks, the model provided answers on random topics, diverging from the expected functionality.
OpenAI ▷ #prompt-engineering (5 messages):
Bayesian Inference, Free Energy Principle, Token Spatial Determination, Prompt Engineering Coherency, Rune Decoding
- Engineering Prompts for High Coherency: A member posited that engineering a useful prompt involves creating high coherency with concepts by integrating grading metrics, value systems, and non-contradictory references, which increases the prompt's stability, coherence, and accuracy based on Bayesian inference and the free energy principle.
- They argued that tokens are spatially determined and exist in clusters of relations, where concepts as clusters obtain coherency, and alignment with these clusters enhances input coherency, circumventing guidelines in favor of high-coherency attractor states.
- Rune Decoding Method Introduces Novelty: A member suggested decoding or running things through a sequence, deriving missing functions for runes by creating new ones from the two runes between them to introduce novelty, such that for rune B in ABC, derive it from A's and C's meaning in the sequence.
- The process involves transforming concepts through a series of runes, collapsing transformed aspects into a new rendition based on what survived the transformations, creating a dense layer of transformed aspects.
OpenAI ▷ #api-discussions (5 messages):
Bayesian Inference, Free Energy Principle, Token Coherency, Prompt Engineering, Rune Derivation
- Bayesian Inference Guides System Coherency: A member posited that following the Bayesian inference and free energy principle, systems seek to minimize surprise, favoring inputs with high coherency which reduces the energy cost of accommodation.
- Conflicting guidelines increase energy costs, leading systems to prioritize internal coherence over imposed rules.
- Token Clusters Dictate Conceptual Coherency: A member suggested that tokens are spatially determined and form clusters of relations, where concepts emerge as coherent clusters.
- Alignment with these clusters enhances input coherency, creating a meta-framework with stable and accurate prompts by integrating metrics, concepts, and value systems without contradiction.
- Rune Derivation Adds Novelty: A member proposed deriving missing rune functions from adjacent runes in a sequence, adding novelty, e.g., deriving rune B in ABC from A and C's meanings.
- This involves concepts undergoing transformations via runes, resulting in a dense layer of transformed aspects collapsed into a new rendition based on what survived the series of transformations.
Interconnects (Nathan Lambert) ▷ #news (153 messages🔥🔥):
Meta Smart Glasses, OpenAI's Joanne Jang on Image Generation, Dream 7B Reasoning Diffusion Model, Claude for Education, LLM use in classrooms/cheating
- Meta Glasses Screen-ing Soon!: Meta plans to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year, according to Mark Gurman.
- One user wondered whether 'with a screen' means it comes with a monitor, while another questioned how they'll do against Xreal.
- Joanne Jang Justifies OpenAI's Shifting Image Policy!: Joanne Jang from OpenAI shared the nuance behind setting policy for 4o image generation, detailing a shift from blanket refusals in sensitive areas to preventing real-world harm.
- She highlighted the need to trust user creativity over our own assumptions and to value the unknown, unimaginable possibilities that new capabilities unlock, while iterating on technical methods to refuse harmful misuse.
- Dream 7B Diffuses into Reality!: The University of Hong Kong and Huawei Noah’s Ark Lab released Dream 7B, the most powerful open diffusion large language model to date, detailed in this blogpost.
- This new model consistently outperforms existing diffusion language models by a large margin and matches or exceeds top-tier Autoregressive (AR) language models of similar size on the general, math, and coding abilities.
- Google CoreWeaves to Rent Blackwell: Google is reportedly in advanced talks to rent Nvidia Blackwell chips from CoreWeave, and potentially house its TPUs in CoreWeave facilities, according to The Information.
- The move highlights the intense customer demand for compute, as everyone is slowly realising that inference demand will be high.
- Nomic Embed Multimodal Released: Hugely Multimodal!: Nomic AI announced the release of Nomic Embed Multimodal, a suite of open-source models that achieve state-of-the-art performance in embedding PDFs, images, papers, and charts, available in this blog post.
- The release includes four models in 3B and 7B parameter sizes with multi and single vector variants, with ColNomic Embed Multimodal 7B achieving a 62.7 NDCG@5 on Vidore-v2.
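For reference, NDCG@5 measures how close a retrieval ranking is to the ideal ordering; a minimal sketch of the usual formulation (reported scores like 62.7 are this value × 100, averaged over queries; the exact gain variant Vidore uses may differ):

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k results:
    each relevance is discounted by log2 of its 1-based rank + 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=5):
    """DCG of the given ranking normalized by the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1, 0, 0]))            # 1.0 -- already ideally ordered
print(round(ndcg([0, 1, 2, 3, 0]), 3))  # < 1.0: best doc ranked too low
```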
- Tweet from Mark Gurman (@markgurman): NEW: Meta is planning to launch $1000+ Smart Glasses with a screen and hand gesture controls later this year. Here’s how they’ll work: https://www.bloomberg.com/news/articles/2025-04-01/how-meta-s-upc...
- Tweet from Xeophon (@TheXeophon): A story in three parts
- Dream 7B | HKU NLP Group : no description found
- Tweet from Nathan Lambert (@natolambert): I'm happy to announce I'm the next CEO of OpenAI and we're going to start doing open source again
- Tweet from chris (@hingeloss): Llama 4 based image generation and editing beginning to roll out, looks very good -- and very fast, 1 second edits versus 5 minutes for gpt-4oDid Meta cook??
- Tweet from Stephanie Palazzolo (@steph_palazzolo): NEW: Google is in advanced talks to rent Nvidia Blackwell chips from CoreWeave, as well as to potentially house its TPUs in Coreweave facilities.The deal highlights the intense customer demand for com...
- Tweet from OpenAI (@OpenAI): We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework.Agents must replicate top ICML 2024 papers,...
- Tweet from Hasan Can (@HCSolakoglu): PaperBench is bit costly to run.
- Tweet from Toby Ord (@tobyordoxford): Here is the revised ARC-AGI plot. They've increased their cost-estimate of the original o3 low from $20 per task to $200 per task. Presumably o3 high has gone from $3,000 to $30,000 per task, whic...
- Tweet from Key 🗝 🦊 (@KeyTryer): omg Github Copilot now lets me add an OpenRouter key and select any model I want. Massive.
- Tweet from Sherjil Ozair (@sherjilozair): Today I'm launching my new company @GeneralAgentsCo and our first product.Introducing Ace: The First Realtime Computer AutopilotAce is not a chatbot. Ace performs tasks for you.On your computer. U...
- Tweet from Nathan Lambert (@natolambert): I’ve joined OpenAI
- Tweet from Sam Altman (@sama): what's happening with ai adoption in india right now is amazing to watch.we love to see the explosion of creativity--india is outpacing the world.
- Tweet from Sumuk (@sumukx): we're launching 🤗 yourbench today, an open source tool for custom benchmarking and synthetic data generation from ANY of your documents. it's a big step towards improving how model evaluation...
- Tweet from Sasha Rush (@srush_nlp): Gemini 2.5 Pro zero shots the GPTWorld-Evil puzzle. Pretty cool, first time I've seen a model do this.
- Nomic Embed Multimodal: State-of-the-Art Multimodal Retrieval: Nomic Embed Multimodal is a state-of-the-art multimodal embedder that achieves SOTA performance on the Vidore Benchmark.
- Tweet from Cameron Jones (@camrobjones): New preprint: we evaluated LLMs in a 3-party Turing test (participants speak to a human & AI simultaneously and decide which is which).GPT-4.5 (when prompted to adopt a humanlike persona) was judged t...
- Tweet from Toby Ord (@tobyordoxford): When I posted this thread about how o3's extreme costs make it less impressive than it first appears, many people told me that this wasn't an issue as the price would quickly come down.I check...
- Tweet from Nathan Lambert (@natolambert): These days if you’re interested in post training you should read all the stuff Joanne puts out (mostly is this type of thing or model spec discussions). Only person talking about this stuff publicly.Q...
- Tweet from Tibor Blaho (@btibor91): "We wished to also evaluate Claude 3.7 Sonnet, but were unable to complete the experiments given rate limits with the Anthropic API. We give agents a maximum run-time of 12 hours."
- Alibaba's secretly developed new model set for release, with influence metrics as the top assessment criterion: Huxiu exclusively learned that Alibaba will release its new model Qwen3 in the second week of April 2025, its most important model product of the first half of 2025, roughly seven months after Qwen2.5 was unveiled at the Apsara Conference in September 2024. According to Huxiu, after Qwen2.5's release in 2024, Alibaba Cloud's foundation-model team began pushing Qwen3-related projects, but in early 2025 D......
Interconnects (Nathan Lambert) ▷ #random (1 messages):
saisonslayer: Lance Hedrick had a great one too:
https://m.youtube.com/watch?v=pU9OUheqypo
Interconnects (Nathan Lambert) ▷ #memes (1 messages):
xeophon.: https://x.com/fleetingbits/status/1905333551681487212
Interconnects (Nathan Lambert) ▷ #reads (17 messages🔥):
AI Timelines, USAMO, Mathematical Olympiad, Gemini 2.5 Pro, Substack RSS feeds
- Helen Toner Releases New Substack on AI Timelines: Helen Toner launched a new Substack called Rising Tide and argues that it used to be bold to claim human-level AI this century.
- In a 2016 post, she spent 8,500 words justifying the claim that there is a greater than 10% chance of advanced AI being developed by 2036.
- AI Tackles the USAMO Proofs: A blogpost from xeophon discusses AI systems reliably qualifying for the USAMO, doing very well on the AMC 12 and AIME, but struggling with the proofs part; he shares a link to lemmata.substack.com.
- The USAMO qualifiers represent the top ≈0.7% of that initial pool of 37,000 contestants that began with the AMC 12.
- Gemini 2.5 Pro Scores High on MathArena USAMO Evaluation: Gemini 2.5 Pro achieved a non-trivial amount of points (24.4%) on the MathArena USAMO evaluation the same day it was released, according to this tweet.
- Progress is mind-blowing.
- Substack RSS Feed Retrieval via Feedbin: A member uses a gmail rule to forward substack emails to Feedbin to read all substacks.
- He calls himself oldschool and uses an "Oldge" emoji.
- Tweet from Mislav Balunović (@mbalunovic): Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve non-trivial amount of points (24.4%). The speed of progr...
- "Long" timelines to advanced AI have gotten crazy short: The prospect of reaching human-level AI in the 2030s should be jarring
- Coaxing USAMO Proofs From o3-mini-high: Decent USAMO performance may be closer than headline results suggest
LM Studio ▷ #general (77 messages🔥🔥):
Tiny LLMs, Function Calling, Long Term Memory Integrations, Synthetic Dataset Generation, Automated code generation
- Small Models Struggle with Function Calling: Members discussed the limitations of smaller language models, noting that models smaller than 200M parameters are likely insufficient for reliable function calling due to the complexity required to understand and execute tool use.
- A participant suggested that even 0.5B parameter models produce mostly random results when instructed with tools, particularly with a list exceeding 30 tools.
- OpenWebUI emerges as a Frontend for LM Studio: For users seeking Long-Term Memory (LTM) and tool integration, it was suggested to use OpenWebUI as a frontend for LM Studio headless, offering these features out of the box.
- It was clarified that AnythingLLM is a separate project from OpenWebUI, contrary to some assumptions.
- Speech-to-Text Configuration available in OpenWebUI: Open Web UI supports speech-to-text via local, browser, and remote options, including providers like OpenAI and DeepGram.
- API keys for cloud services can be configured either as environment variables or through the admin settings page.
- Fine-Tuning LLMs Requires Synthetic Dataset Generation: Members discussed using an LLM to generate Q&A pairs in fine-tuning format, also known as augmentation.
- It was suggested to use models such as Claude 3.5 Sonnet for such augmentation tasks, by feeding paragraph-by-paragraph to the LLM via API calls.
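The paragraph-by-paragraph augmentation loop is straightforward to sketch. Here the model call is stubbed out with a placeholder `ask_llm` (in practice you'd swap in a real API client for a model like Claude 3.5 Sonnet), and each Q&A pair is emitted in a messages-style fine-tuning format:

```python
import json

def ask_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call. Here it just returns a
    canned Q&A pair so the sketch runs offline."""
    return json.dumps({"question": "What does the passage describe?",
                       "answer": prompt[:60]})

def make_qa_dataset(document: str) -> list[dict]:
    """Split a document into paragraphs and request one Q&A pair per
    paragraph, returning records in a chat fine-tuning format."""
    records = []
    for para in filter(None, (p.strip() for p in document.split("\n\n"))):
        qa = json.loads(ask_llm(
            f"Write one question/answer pair (JSON) about:\n\n{para}"))
        records.append({"messages": [
            {"role": "user", "content": qa["question"]},
            {"role": "assistant", "content": qa["answer"]},
        ]})
    return records

doc = "First paragraph about quantization.\n\nSecond paragraph about LoRA."
print(len(make_qa_dataset(doc)))  # 2
```

The user/assistant split matters for the training discussion earlier in this issue: with this layout the prompt is context and only the assistant turn is trained on.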
- Powershell script automates mundane code: One member argued that LLMs will take away many coding positions from junior talent, since routine 'code monkey' work is the easiest to replace, while senior devs will find LLMs to be powerful tools.
- Another shared a story of writing a Powershell script to automate a bunch of stuff at work.
- 🗨️ Configuration | Open WebUI: Open Web UI supports both local, browser, and remote speech to text.
- GitHub - YorkieDev/lmstudioservercodeexamples: This readme contains server code examples from LM Studio v0.2.31: This readme contains server code examples from LM Studio v0.2.31 - YorkieDev/lmstudioservercodeexamples
- GitHub - Mintplex-Labs/anything-llm: The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.: The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more. - Mintplex-Labs/anything-llm
- GitHub - open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, ...): User-friendly AI Interface (Supports Ollama, OpenAI API, ...) - open-webui/open-webui
LM Studio ▷ #hardware-discussion (70 messages🔥🔥):
Nvidia CUDA architecture vs Macs, Mac Studio M3 Ultra for large models, DDR5 vs GDDR memory bandwidth, LM Studio support for large context windows, Deepseek R1 performance on Mac Studio M3 Ultra
- Nvidia CUDA Whips Macs in Raw AI Processing: Nvidia's CUDA architecture has been in development since 2007, offering more cores and higher bandwidth than Macs for AI processing, as shown in this benchmark comparison.
- The Mac Studio M3 Ultra performs comparably to a 5090 in certain tasks but falls short on prompt processing due to slower tokenization and embedding.
- Context Size Woes Plague Performance: A member noted that with a 32k context, it could take half a minute before generating each response, which they found unacceptable.
- Discussion ensued regarding loading context overflow into shared memory/system RAM, with the consensus that the LLM requires all context in VRAM for optimal token generation, even with KV cache.
- DDR5 Channels Challenge GDDR Bandwidth: While DDR memory uses 64 bits per channel, GDDR memory uses 32 bits per channel, but GDDR's higher compression and proximity to the GPU core give it an edge.
- As DDR5 systems scale with more channels, they may approach consumer-grade GPU performance for LLMs, especially when capacity becomes the primary constraint.
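The raw numbers behind that comparison are simple multiplication: peak bandwidth = channels × bits per channel × transfers/s ÷ 8. A sketch with illustrative parts (an 8-channel DDR5-5600 workstation versus a 384-bit GDDR6X card at 21 GT/s; real products vary):

```python
def bandwidth_gbs(channels: int, bits_per_channel: int, mtps: float) -> float:
    """Peak memory bandwidth in GB/s.

    channels * bits_per_channel gives total bus width in bits; multiply by
    mega-transfers per second and divide by 8 to get bytes.
    """
    return channels * bits_per_channel * mtps * 1e6 / 8 / 1e9

# 8-channel DDR5-5600 workstation: 64-bit channels.
print(bandwidth_gbs(8, 64, 5600))    # 358.4 GB/s
# 384-bit GDDR6X GPU at 21 GT/s: twelve 32-bit channels.
print(bandwidth_gbs(12, 32, 21000))  # 1008.0 GB/s
```

This is why high-channel-count DDR5 servers start to look viable for LLMs: the per-channel deficit versus GDDR can be partly bought back with channel count and capacity.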
- LM Studio Embraces Large Contexts: LM Studio supports large context windows using the Cuda runtime, allowing the context buffers to reside entirely in VRAM, potentially improving speed compared to having the context in RAM.
- One member suggested investing in a workstation with 8 channel RAM and a 48GB GPU to run decent models on the CPU while housing the context in VRAM.
- M3 Ultra struggles to impress: Early reports indicate that the Deepseek R1's performance on the Mac Studio M3 Ultra is disappointingly slow, with inference speeds in the single digits.
- One member found that while the M4 Max is pretty excellent, and the 5090 great, the M3 Ultra is unbalanced with "too much" memory relative to its compute and bandwidth and an overkill price tag.
Link mentioned: GitHub - XiongjieDai/GPU-Benchmarks-on-LLM-Inference: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?: Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? - XiongjieDai/GPU-Benchmarks-on-LLM-Inference
MCP (Glama) ▷ #general (131 messages🔥🔥):
MCP for make.com or n8n cloud, Best MCP servers for web dev, Browser tools issue, MCP Prompts and Multi-Agent Frameworks, Auto Selecting Best MCP Server
- MCP compatibility sought for make.com and n8n cloud: A user inquired about using MCP for make.com or the cloud version of n8n, referencing the n8n-nodes-mcp community node and seeking alternatives for the SaaS version.
- Another user suggested an open-source SDK for this purpose, providing a link to the mcp-governance-sdk GitHub repository and requesting feedback.
- "DesktopCommanderMCP" touted for Web Development: A user suggested DesktopCommanderMCP as a suitable MCP server for web development, stating that it manages file creation and updates and provided a link to the relevant GitHub repository.
- It was further clarified that tool calling accuracy hinges on controlled context size and suggested a two-step process involving LLM selection of the right servers before retrieving context.
- Amazon's Nova Act Inspires MCP Adaptation: A user suggested adapting Amazon's Nova Act by having Claude generate `act` calls to feed into an MCP server connected to a browsing tool, referencing a YouTube video.
- They outlined a hypothetical sequence of `nova.act` calls for searching and booking hotels using customer reviews and personal details.
- Debugging Woes Plague "sendLoggingMessage": A user reported errors using sendLoggingMessage with the TypeScript SDK in the MCP Inspector, leading to a discussion about debugging MCPs.
- Another user suggested configuring logging during server initialization to enable sendLoggingMessage, providing a code snippet demonstrating the necessary configuration for logging capabilities.
- Ithena's SDK Streamlines MCP Governance: A user highlighted the Ithena MCP governance SDK, designed to handle authentication, authorization (RBAC), credential management, auditing, and compliance for MCP deployments.
- They emphasized its plug-and-play nature, requiring users to build their own resolvers and noting it gives a structure to check db/cache for the user's active session token before handler runs and inject into the handler via context.
- Ithena: Production-Ready Governance for MCP: Add production-grade Identity, RBAC, Secrets Management, and Auditing to your Model Context Protocol (MCP) applications with the Ithena SDK and Managed Platform.
- Cursor Directory: Find the best cursor rules for your framework and language
- GitHub - Abiorh001/mcp_omni_connect: MCPOmni Connect is a versatile command-line interface (CLI) client designed to connect to various Model Context Protocol (MCP) servers using stdio transport. It provides seamless integration with OpenAI models and supports dynamic tool and resource management across multiple servers.: MCPOmni Connect is a versatile command-line interface (CLI) client designed to connect to various Model Context Protocol (MCP) servers using stdio transport. It provides seamless integration with O...
- GitHub - nerding-io/n8n-nodes-mcp: n8n custom node for MCP: n8n custom node for MCP. Contribute to nerding-io/n8n-nodes-mcp development by creating an account on GitHub.
- GitHub - ithena-one/mcp-governance-sdk: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK - ithena-one/mcp-governance-sdk
- mcpc/mcpc/handler.py at main · OlaHulleberg/mcpc: An extension to MCP (Model-Context-Protocol) that enables two-way asynchronous communication between LLMs and tools through the already existing MCP transport - no additional transport layer needed...
- GitHub - sparfenyuk/mcp-proxy: Connect to MCP servers that run on SSE transport, or expose stdio servers as an SSE server using the MCP Proxy server.: Connect to MCP servers that run on SSE transport, or expose stdio servers as an SSE server using the MCP Proxy server. - sparfenyuk/mcp-proxy
- GitHub - sdi2200262/eclass-mcp-server: A Model Context Protocol (MCP) server for the Open eClass platform.: A Model Context Protocol (MCP) server for the Open eClass platform. - sdi2200262/eclass-mcp-server
- GitHub - wonderwhy-er/DesktopCommanderMCP: This is MCP server for Claude that gives it terminal control, file system search and diff file editing capabilities: This is MCP server for Claude that gives it terminal control, file system search and diff file editing capabilities - wonderwhy-er/DesktopCommanderMCP
- mcp_ev_assistant_server/ev_assitant_server.py at main · Abiorh001/mcp_ev_assistant_server: A powerful server implementation for managing Electric Vehicle (EV) charging stations, trip planning, and resource management. This server provides a comprehensive set of tools and APIs for EV-rel...
MCP (Glama) ▷ #showcase (7 messages):
Stocks monitor, MCP servers, Governance SDK, Kubernetes security, Access control
- MCP Powers Real-Time Stock Alerts: A member reported using MCP for a stocks monitor with instant prompts and notifications for price drops and automatic trend analysis.
- Navigation Agent MCP Servers enable long-running jobs: A user is excited to try long-running jobs using navigation agent MCP servers, allowing task pivoting mid-process, consolidating notifications without conversational interruptions.
- Governance SDK Gears up for MCP: A Governance SDK to handle Auditing, Logging, RBAC, Credential Injection for your servers.
- Kubernetes Parallels MCP Security Needs: A member noted similar security needs for Kubernetes early on, suggesting new tech, especially MCP with its exponentially increasing weekly downloads, requires security measures.
- MCP Servers Crave Fine-Grained Access Control: A user explained that current MCP server implementations lack fine-grained access control, audit logging, and rely on hard-coded credentials, making enterprise multi-tenant setups difficult.
Link mentioned: GitHub - ithena-one/mcp-governance-sdk: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK: Enterprise Governance Layer (Identity, RBAC, Credentials, Auditing, Logging, Tracing) for the Model Context Protocol SDK - ithena-one/mcp-governance-sdk
OpenRouter (Alex Atallah) ▷ #announcements (15 messages🔥):
Organizations out of Beta, Web Search Results in Chatroom, API Support for Web Search, Cerebras support
- *OpenRouter Orgs* Graduate from Beta!: Organizations are now out of beta, giving teams control over data policies and consolidated billing across many model providers, as announced on X.
- *Web Search* Surfaces in Chat!: Web search results are now available in the chatroom, with Perplexity results normalized to the format of `:online` model variants.
- API Support for Perplexity Web Search Incoming: Members are asking when the OpenRouter API will support PDF files and whether there is documentation on the Perplexity response format.
- An OpenRouter member replied that API support is coming soon, matching the OpenAI chat/completions API format.
- Community advocates for Cerebras support: A user advocated for Cerebras to be added to OpenRouter.
- Other users also mentioned they wanted less stuff on Xitter and to also post on Bluesky.
Link mentioned: Tweet from OpenRouter (@OpenRouterAI): Today we're taking Organizations out of beta.With Organizations, teams have complete control over data policies and consolidated billing, adding peace of mind across dozens of model providers.Key ...
OpenRouter (Alex Atallah) ▷ #general (121 messages🔥🔥):
OpenRouter API Errors, Model Performance Issues, OpenRouter Pricing and Fees, API Key Limits, OpenRouter Data Deletion Bug
- OpenRouter API throws Internal Server Error: Users reported experiencing random Internal Server Errors (code 500) when using the OpenRouter API, specifically with the Gemini 2.5 Pro model.
- One user noted that regenerate often failed, returning the same output even with prompt changes, while others experienced similar issues with Sambanova/Deepseek V3 0324.
- OpenRouter Fees and Deposit Details Revealed: Users discussed OpenRouter's fee structure, clarifying that there's no charge for routing requests without BYOK (Bring Your Own Key), but a 5% fee is charged on deposits.
- It was pointed out that the 5% deposit fee may be due to Stripe's charges, which are approximately 3.5%, though OpenRouter's volume likely allows for negotiated discounts, especially for domestic payments (closer to 1.5% plus a flat fee).
- OpenRouter API Key Limit Query Resolved: A user inquired about the API key provisioning limits for workshops and free trials, with another user responding that while there are rate limits, they are pretty high, and a few thousand requests should be fine.
- A user also shared a code snippet for checking limits using the `/api/v1/auth/key` endpoint.
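A minimal stdlib-only sketch of such a check (the exact response schema is an assumption; consult OpenRouter's API docs for the real shape):

```python
import json
import urllib.request

KEY_INFO_URL = "https://openrouter.ai/api/v1/auth/key"

def fetch_key_limits(api_key: str) -> dict:
    """GET the usage/rate-limit info OpenRouter reports for an API key."""
    req = urllib.request.Request(
        KEY_INFO_URL,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# info = fetch_key_limits("sk-or-...")  # requires a valid key; returns parsed JSON
```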
- Data Deletion Bug Zaps User Chats: Users reported a critical bug where using Chrome's "delete browsing data" feature for the last hour deleted all OpenRouter chats due to local storage.
- Users urged adding a visible warning to prevent data loss and suggested using a separate browser for OpenRouter to avoid this issue.
- Image Responses on OpenRouter on the Horizon: Users inquired about the possibility of image responses on OpenRouter, with confirmation that it is in development.
- Development aims to ensure compatibility with chat completions, although future interfaces might diverge from OpenAI's, potentially leading to an OpenRouter SDK.
Modular (Mojo 🔥) ▷ #general (3 messages):
Modular (Mojo 🔥) Discord, Chris' lightning talk, Firewalled Networks
- Chris's talk goes live: A full recording of Chris's lightning talk at our booth is available on YouTube.
- A cleaned up recording of today's livestream is available for anyone who missed it: YouTube.
- Firewalled Networks discussion opens: A member's employer is very concerned with security and doesn't have direct connections to the internet.
- They asked what timeframe we're looking at to be able to download, install, and use these utilities on a firewalled network.
Modular (Mojo 🔥) ▷ #mojo (72 messages🔥🔥):
Flex Attention Implementation, Float to String Algorithm Porting, FlashAttention-2 in Mojo, Mojo in Godbolt, Quantity Module
- Flex Attention Algorithm Digestion: A member inquired about implementing flex-attention in Mojo, linking to a PyTorch blog post.
- It was noted that while any language can implement it, achieving optimal performance requires careful memory management, and that Mojo on the GPU is similar to CUDA.
- Float-to-String Algorithm Port Suffers: A member ported a new float-to-string algorithm to Mojo from its reference C++ implementation, but found it was significantly slower than the stdlib dragonbox implementation, with the code available on GitHub.
- Specifically, stringifying `canada.json` went from 30ms to 40ms, even after ripping the formatting from the standard library.
- FlashAttention-2 Flashes in Mojo: It was pointed out that a version of FlashAttention-2 exists in Mojo within the custom-ops-ai-applications recipe, although it's written for readability rather than peak performance.
- Also available is an example demonstrating progressive optimization of matrix multiplication using Mojo's memory layout abstractions.
- Godbolt Gains Mojo Assembler: A member asked about the process for getting support for Mojo in Godbolt, specifically for comparing assembly output when porting code from C.
- A member shared a gist as a temporary workaround, and suggested that MLIR dumps would be another desirable feature for the compiler.
- Quantity Type gets defined: Members discussed defining quantity types with arbitrary units in Mojo, showcasing the ability to compose units like `alias Newton = Quantity[kg * m * (s ** -2)]`.
- This was made possible by using `IntLiteral` to encode information into the type system, with the code available on GitHub.
- ir_utils.mojo: GitHub Gist: instantly share code, notes, and snippets.
- Kelvin/kelvin/quantity.mojo at main · bgreni/Kelvin: Contribute to bgreni/Kelvin development by creating an account on GitHub.
- GitHub - cassioneri/teju_jagua: Teju Jagua: Teju Jagua. Contribute to cassioneri/teju_jagua development by creating an account on GitHub.
- teju_jagua/teju/mshift.h at main · cassioneri/teju_jagua: Teju Jagua. Contribute to cassioneri/teju_jagua development by creating an account on GitHub.
- EmberJson/emberjson/teju/__init__.mojo at main · bgreni/EmberJson: A user friendly json library written in pure Mojo. Contribute to bgreni/EmberJson development by creating an account on GitHub.
- [stdlib][proposal] Duration module proposal by bgreni · Pull Request #4022 · modular/max: A proposal for a Duration struct inspired by std::chrono::duration from the C++ stdlib
- Custom Operations: Applications in AI Models Recipe | MAX Builds: no description found
- Custom Operations: Optimizing Matrix Multiplication Recipe | MAX Builds: no description found
GPU MODE ▷ #general (16 messages🔥):
SMT on CPU vs GPU, GPU Context Switching, Memory Coalescing, Rectangle Block Tiles, Global Memory Access
- Context-Switching CPU vs. GPU: A Tale of Two Architectures: The discussion highlighted a key architectural difference, noting that context switches on CPUs are expensive, costing hundreds of cycles, whereas on GPUs they're essentially free at around ~1 cycle.
- GPUs leverage oversubscription to mask latencies, requiring more threads than hardware parallelism to prevent starvation, and suggesting that adding threads beyond the "parallel" limit doesn't significantly impact runtime due to efficient warp interleaving.
- Memory Coalescing Conundrums: Threads Load Data Consecutively: A user inquired about memory access patterns, questioning why 128B loads weren't coalesced as expected, while another user clarified that memory coalescing depends on consecutive threads loading consecutive data from memory.
- The user realized they were attempting to read 16 bytes at a time, causing a 50% miss, and learned that global memory accesses must be aligned to 32-, 64-, or 128-byte segments.
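A toy model of the segment rule makes the 16-byte-stride penalty visible (the 128B granularity is the simplified assumption here; real hardware also involves caching):

```python
SEGMENT = 128  # assumed global-memory transaction granularity in bytes

def transactions(byte_addresses):
    """Count distinct aligned 128B segments touched by one warp's loads."""
    return len({addr // SEGMENT for addr in byte_addresses})

warp = range(32)
coalesced = transactions([4 * t for t in warp])   # 4B per thread, consecutive
strided = transactions([16 * t for t in warp])    # 16B per thread, strided
print(coalesced, strided)
```

One 4-byte load per thread fits a single 128B transaction; the 16-byte pattern spreads the same warp across four segments, quadrupling the traffic.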
- Global Memory Access: Block Tiles Impact Performance: Members investigated the implications of block tile shapes on global memory access, particularly regarding memory access alignment in CUDA.
- One member adjusted their block tiles to be less thin (8 floats wide) to resolve issues flagged by NCU (Nsight Compute), improving memory access.
GPU MODE ▷ #triton (8 messages🔥):
Shape-related errors in Triton, Static analyzer for Triton shapes, tl.static_assert and static_print
- Taming Triton Tensor Type Typos: A member inquired about static analyzers (like mypy) for checking tensor shapes at Triton "compile-time" due to frequent shape-related errors.
- They mentioned an abandoned project aimed at shape-inference at MAPL 2020.
- Static Checks Save the Day: A member suggested using `tl.static_assert` and `tl.static_print` to assert/print shapes that are statically known.
- The original poster found this suggestion helpful.
GPU MODE ▷ #torch (8 messages🔥):
Graph break on Tensor, Memory savings on loss function, CUDA caching allocator
- Memory Savings Dream: Tensor Deletion Dilemma!: A member is seeking to delete argument tensors within a loss function to achieve significant memory savings of 7GB, but faces challenges due to live references in the outer scope, linking to a related GitHub issue.
- The member requires the reference to be dead to free up the memory, especially within a `torch.compile` context to avoid graph breaks.
- `cuda.empty_cache()` doesn't kill live references: When asked about `torch.cuda.empty_cache()`, the member clarified that this won't work as the reference is still live, thus the memory isn't being freed.
- It was suggested this function solves the problem of memory that was deleted but not yet returned by the CUDA caching allocator, not of live references.
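The underlying reference semantics can be demonstrated without a GPU: an object is only collectable once every reference to it is gone, so deleting the callee's local name alone frees nothing (a plain object stands in for a large tensor here):

```python
import gc
import weakref

class Activation:
    """Stand-in for a large tensor argument."""

def loss_fn(t):
    del t          # deleting the local name alone cannot free the object...
    gc.collect()

obj = Activation()
alive = weakref.ref(obj)
loss_fn(obj)
assert alive() is not None   # ...because the caller still holds a reference
del obj
gc.collect()
assert alive() is None       # freed only once every reference is gone
print("freed only after all references died")
```

This is exactly why `empty_cache()` cannot help: the allocator can only reclaim blocks whose tensors have no remaining Python references.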
- Storage Resizing Explored!: The member explored resizing the underlying storage, suspecting it returns memory to the CUDA caching allocator for reuse, as new tensors seem to utilize that freed memory.
- They speculated about memory arenas and heap fragmentation, as present in `malloc()` implementations, while acknowledging potential overthinking of the matter.
Link mentioned: Graph break on Tensor._make_subclass · Issue #150265 · pytorch/pytorch: 🐛 Describe the bug I am having the following problem from torch import nn import torch torch_compile_options = { "epilogue_fusion" : True, "max_autotune" : True, "shape_paddi...
GPU MODE ▷ #jobs (1 messages):
MLX Hiring, Apple ML Research, Scalable ML Pipelines
- Apple's MLX Team is Hiring!: Apple is hiring engineers to work on MLX, seeking those passionate about advancing the frontier of ML and systems; interested candidates are encouraged to apply to this job posting.
- The role involves collaborating with researchers and software engineers to develop scalable, distributed training and research pipelines within Apple’s Machine Learning Research organization.
- Building Scalable ML Pipelines at Apple: Apple's Machine Learning Research group is focused on building scalable, distributed training and research pipelines.
- The team's work impacts ML solutions across Apple, powering features delivered to billions of consumers worldwide.
Link mentioned: AIML - Software Engineer for MLX, MLR - Jobs - Careers at Apple: Apply for a AIML - Software Engineer for MLX, MLR job at Apple. Read about the role and find out if it’s right for you.
GPU MODE ▷ #beginner (3 messages):
CUDA program execution, CUDA compiler (nvcc), PTX and SASS
- CUDA Compilation Process Explained: CUDA code is first compiled into an intermediate representation called PTX and then further compiled into machine code known as SASS.
- This compilation process is detailed in the NVIDIA CUDA Compiler Driver NVCC documentation.
- CUDA's SPMD Parallel Jobs on GPUs: The CUDA Toolkit is designed for applications where a control part runs on a general-purpose computing device, using NVIDIA GPUs as coprocessors to accelerate single program, multiple data (SPMD) parallel jobs.
- These SPMD jobs are self-contained and can be executed and completed entirely by a batch of GPU threads without host intervention, which optimizes the benefits of parallel graphics hardware.
Link mentioned: 1. Introduction — NVIDIA CUDA Compiler Driver 12.8 documentation: no description found
GPU MODE ▷ #hqq-mobius (12 messages🔥):
HQQ Quantization, Llama-3.2-1B-Instruct, vLLM Integration, Marlin Kernel, FlashAttention Kernel
- HQQ Quantization Requires Specific Save/Load Methods: Using `model.save_pretrained` with `AutoHQQHFModel` causes errors; one should use `AutoHQQHFModel.save_quantized` and `AutoHQQHFModel.from_quantized` instead, according to HQQ's creator.
- The `transformers` library's `save_pretrained` implementation might be broken with some models.
- Pre-Quantized HQQ Models and vLLM Compatibility: Models ending with `hqq_hf` on Hugging Face are compatible with vLLM, according to HQQ's creator.
- An example is mobiuslabsgmbh/Llama-3.2-3B-Instruct_4bitgs64_hqq_hf, an HQQ all-4-bit (group-size=64) quantized Llama-3.2-3B-Instruct model.
- *On-the-Fly Quantization with HQQ and vLLM*: HQQ allows on-the-fly quantization with gemlite in vLLM, avoiding the need to save the quantized model, exemplified in this script.
- This approach would require modifying the lm-evaluation-harness vllm_causallms.py for evaluation purposes.
- *FlashAttention Kernel Used with HQQ Models in vLLM*: vLLM unexpectedly uses the FlashAttention kernel for pre-quantized HQQ models, which might be a vLLM issue.
- The choice of kernel affects lm-eval results, as Marlin's output differs from `matmul(x, dequantize())`.
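The `matmul(x, dequantize())` reference the member compares against can be sketched with a toy symmetric int4 group-quantization scheme (an illustration of the comparison point, not HQQ's actual algorithm):

```python
def quantize(w, group=4):
    """Toy symmetric 4-bit group quantization: int codes plus per-group scales."""
    codes, scales = [], []
    for i in range(0, len(w), group):
        g = w[i:i + group]
        s = max(abs(v) for v in g) / 7 or 1.0   # map each group into [-7, 7]
        scales.append(s)
        codes.extend(round(v / s) for v in g)
    return codes, scales

def dequantize(codes, scales, group=4):
    return [c * scales[i // group] for i, c in enumerate(codes)]

w = [0.1, -0.7, 0.3, 0.2, 1.4, -0.2, 0.6, 0.8]
x = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
w_hat = dequantize(*quantize(w))
# the reference result a fused quantized kernel (e.g. Marlin) should approximate:
ref = sum(xi * wi for xi, wi in zip(x, w_hat))
print(round(ref, 3))
```

A fused kernel that accumulates in a different precision or order can legitimately deviate from this reference, which is why the kernel choice shows up in lm-eval numbers.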
- mobiuslabsgmbh/Llama-3.2-3B-Instruct_4bitgs64_hqq_hf · Hugging Face: no description found
- hqq/examples/vllm.py at master · mobiusml/hqq: Official implementation of Half-Quadratic Quantization (HQQ) - mobiusml/hqq
- GitHub - mobiusml/hqq: Official implementation of Half-Quadratic Quantization (HQQ): Official implementation of Half-Quadratic Quantization (HQQ) - mobiusml/hqq
- lm-evaluation-harness/lm_eval/models/vllm_causallms.py at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
GPU MODE ▷ #self-promotion (2 messages):
Megatron Tensor Parallelism, bfloat16 Triton Kernels
- Deep Dive into Megatron Tensor Parallelism: A member shared an illustrated deep-dive into Megatron-style tensor parallelism, including the fused/parallel CE loss, seeking feedback on the blog post.
- BFloat16 Support added to Triton Kernels: A member announced the addition of bfloat16 support to Triton kernels that use atomic addition, detailing the approaches in a blog post.
- Tweet from Mobius Labs (@Mobius_Labs): We just added bfloat16 support to GemLite! Took a few technical hops to get atomic addition working smoothly in Triton—details here: https://mobiusml.github.io/gemlite_bfp16_blogpost/
- Tweet from Daniel Vega-Myhre (@vega_myhre): For any ML folks who want to deepen their understanding of ML scalability & performance techniques, I wrote an illustrated deep-dive into Megatron-style tensor parallelism: https://danielvegamyhre.git...
GPU MODE ▷ #thunderkittens (2 messages):
ThunderKittens Blackwell Compatibility, TK CTA Pair Scheduling
- ThunderKittens Purrs on Blackwell Architecture: New GEMM and attention kernels for the NVIDIA Blackwell architecture have been released in ThunderKittens, leveraging features like 5th-generation tensor cores, Tensor Memory, and CTA pairs.
- The cool thing is – turns out the new features fit pretty well into TK's existing tile-based abstractions, as it's all about dataflow!
- TK's Torch Export Compatibility Questioned: A user inquired about ThunderKittens' compatibility with torch.export and AOTI compile and package.
- No answer was given in the captured messages.
- CTA Pair Placement in Blackwell Examined: Questions arose regarding whether ThunderKittens' CTA pairs are scheduled on the same SM or across two SMs on Blackwell.
- While the blog post (ThunderKittens Blog) implies placement on the same SM, Nvidia's GTC 2025 talk suggests scheduling across two SMs (see attached image).
Link mentioned: ThunderKittens Now on Blackwells!: no description found
GPU MODE ▷ #reasoning-gym (10 messages🔥):
Collisions PR, CodeIO dataset, Open-Reasoner-Zero, Difficulty param in CodeIO, Curricula boundaries
- Collisions PR Reviewed: A member reviewed the collisions PR and requested that the notebooks which were simply run without changes be unstaged.
- This helps keep the commit focused on actual code changes rather than results from running the notebooks.
- CodeIO Dataset Merged: A member merged a contributor's CodeIO dataset, apologizing for the delay and thanking them for the contribution.
- They mentioned that they would do some postprocessing to get it in the same format as the existing implementation.
- Open-Reasoner-Zero Introduced: The paper Open-Reasoner-Zero was introduced as the first open source implementation of large-scale reasoning-oriented RL training, focusing on scalability, simplicity, and accessibility.
- It uses vanilla PPO with GAE and rule-based rewards, achieving superior performance on AIME2024, MATH500, and the GPQA Diamond benchmark while requiring only a tenth of the training steps of the DeepSeek-R1-Zero pipeline.
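The GAE recurrence behind vanilla PPO can be sketched in a few lines (a generic textbook implementation, not Open-Reasoner-Zero's code):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation: A_t = sum_k (gamma*lam)^k * delta_{t+k},
    computed in one backward pass via the recurrence A_t = delta_t + gamma*lam*A_{t+1}."""
    adv = [0.0] * len(rewards)
    running, next_v = 0.0, last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_v - values[t]   # TD error at step t
        running = delta + gamma * lam * running
        adv[t] = running
        next_v = values[t]
    return adv

# sparse terminal reward, gamma = lam = 1 for easy hand-checking
adv = gae_advantages([0.0, 0.0, 1.0], [0.5, 0.6, 0.7], gamma=1.0, lam=1.0)
print([round(a, 2) for a in adv])
```

With gamma = lam = 1 the advantage at each step reduces to return-minus-value, which is easy to verify by hand.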
- Difficulty Param Queries Surface: A member asked if the difficulty parameter referred to in the latest CodeIO PR is present in all the new samples, and how it was computed.
- Another member replied that it was work done by a PhD working on using LLM as a judge for code evaluation, but there was no code / notebook provided, just the samples.
- Curricula Boundaries Made Sensible: A member opened a larger PR #407 where they went over all datasets (twice) and fixed the curricula to be more sensible.
- They also updated the tests, added missing curricula, and implemented a couple of the missing ones (Knight Swap, Puzzle2...).
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model: We introduce Open-Reasoner-Zero, the first open source implementation of large-scale reasoning-oriented RL training focusing on scalability, simplicity and accessibility. Through extensive experiments...
- fix(curriculum): Make boundaries in curriculum more sensible by zafstojano · Pull Request #407 · open-thought/reasoning-gym: Overview:I've went over all datasets twice in order to set some more sensible values for the curricula.Moreover, I've implemented a couple of the missing curriculas (Knight Swap, Puzzl...
GPU MODE ▷ #gpu模式 (1 messages):
FA memory access patterns, Tensor Transpose operations
- Debate Erupts: FA Memory Access Patterns Remain Consistent: A discussion has begun around whether FlashAttention (FA) memory access differs based on tensor arrangements.
- One member claims that there is no significant difference in memory access patterns for FA between `(batch_size, num_heads, N, d)` and `(batch_size, N, num_heads, d)` tensor layouts.
- Transpose Op Cost Analyzed: A debate arose on whether the transformation `(batch_size, N, dim) -> (batch_size, num_heads, N, d)` requires an additional transpose operation.
- The member suggests that this extra transpose is needed during the process.
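In NumPy terms, the reshape-plus-axis-swap itself is only a view; the question is whether a downstream kernel demands contiguous memory, which is what forces the actual transpose copy (shapes below are arbitrary):

```python
import numpy as np

batch, N, num_heads, d = 2, 8, 4, 16

# typical projection output, laid out as (batch, N, num_heads*d)
x = np.arange(batch * N * num_heads * d, dtype=np.float32)
x = x.reshape(batch, N, num_heads * d)

# -> (batch, num_heads, N, d): a reshape plus an axis swap, no data movement yet
heads = x.reshape(batch, N, num_heads, d).transpose(0, 2, 1, 3)

print(heads.shape, heads.flags["C_CONTIGUOUS"])  # the swap leaves it non-contiguous
```

A kernel that requires contiguous input then forces `np.ascontiguousarray(heads)`, i.e. the extra transpose cost under debate; a layout-aware kernel can skip it.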
GPU MODE ▷ #general (3 messages):
.py scripts vs .cu files, active python leaderboards
- Script Submission Confusion: A member inquired if only .py scripts could be submitted to the leaderboards, as opposed to .cu files.
- Another member suggested reviewing a previous message for clarification, implying that details about accepted file types and active leaderboards have already been shared.
- Clarification on Leaderboard Activity: A member questioned whether all active leaderboards currently only accept Python submissions.
- This indicates uncertainty about the types of submissions accepted across different leaderboards and their current status.
GPU MODE ▷ #submissions (9 messages🔥):
Leaderboard Submissions, GPU Performance, Modal Runners
- Vector Addition on T4: A leaderboard submission with id 3399 to leaderboard `vectoradd` on GPUs: T4 using Modal runners succeeded!
- Matrix Multiplication on A100: A leaderboard submission with id 3400 to leaderboard `matmul` on GPUs: A100 using Modal runners succeeded!
- Another leaderboard submission with id 3408 to leaderboard `matmul` on GPUs: A100 using Modal runners also succeeded!
- Grayscale Conversions on H100: Multiple test submissions (ids 3402-3407) to leaderboard `grayscale` on GPUs: H100 using Modal runners succeeded!
Torchtune ▷ #dev (53 messages🔥):
Qwen model S3 upload, Llama2 vs Modern Models, GRPO profiling memory spikes, Chunked Loss, Stateful Dataloader
- Qwen Model Upload Blocked by S3 Issues: The upload of the Qwen model to S3 is currently blocked due to internal infra changes since the last use, delaying CI runs with specific code changes.
- For now, regression testing will be put on hold.
- Modern Models Trump Llama2 for Regression Testing: Members are considering something a bit more modern than Llama2 for regression testing in this PR.
- The current model used in other regression tests is the Llama2 model.
- GRPO Profiling Exposes Memory Spike Issues: A member is facing memory spikes while profiling GRPO, and is seeking ways to automatically generate graphs showing memory allocation breakdown.
- One suggested the profiler bug was fixed, and to look for large orange blocks that suggest a tensor should be deleted.
- Chunked Loss Mitigates Memory Overload: A suggestion was made to try a chunked loss, which should reduce memory usage, and pointed to this loss function.
- It was mentioned that the `.backward()` pass is the peak memory usage, and to try compiling the forward pass instead of the whole loss via this PR.
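The chunked-loss idea can be sketched in pure Python: compute the loss over row chunks so only a slice of the logits needs to be live at once, while the mean matches the unchunked version (a simplified illustration, not the torchtune implementation):

```python
import math

def ce_loss(logits_rows, targets):
    """Mean cross-entropy over rows of logits (pure-Python reference)."""
    total = 0.0
    for row, t in zip(logits_rows, targets):
        m = max(row)
        lse = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        total += lse - row[t]
    return total / len(targets)

def chunked_ce_loss(logits_rows, targets, chunk=2):
    """Same result, but only `chunk` rows of logits are processed at a time."""
    total, n = 0.0, len(targets)
    for i in range(0, n, chunk):
        rows, tgts = logits_rows[i:i + chunk], targets[i:i + chunk]
        total += ce_loss(rows, tgts) * len(tgts)   # re-weight the partial mean
    return total / n

logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [1.0, 1.0, 1.0], [0.0, 0.0, 3.0]]
tgts = [0, 1, 2, 2]
print(abs(ce_loss(logits, tgts) - chunked_ce_loss(logits, tgts)) < 1e-12)
```

In the real setting each chunk's logits (and its backward activations) can be freed before the next chunk is computed, which is where the memory savings come from.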
- Stateful Dataloader Faces Review: A member requested a CI run and linked a PR for a stateful dataloader (PR #2550).
- Another member shared a link to Understanding R1 Zero modifications, wondering if the claim on length bias in the loss function is worth porting.
- torchtune/tests/cache_artifacts.sh at f1ecdd64cd67fc33a713c073d9664ab111116606 · pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- GRPO LoRA Single Device by ianbarber · Pull Request #2467 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to[x ] add a new feature fix a bug update tests and/or documentation other (please add here)#2421 - exploring a LoRA recipe.ChangelogWhat are ...
- understand-r1-zero/train_zero_math.py at main · sail-sg/understand-r1-zero: Understanding R1-Zero-Like Training: A Critical Perspective - sail-sg/understand-r1-zero
- Adding lora_dpo recipe test by krammnic · Pull Request #2550 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here)Please link to any issues this PR addresses.ChangelogW...
- REMOVE `recursive_reshard` UTILITY by ebsmothers · Pull Request #2510 · pytorch/torchtune: At first, this PR was supposed to fix #2483. Upon further inspection of this utility, it became clear that it wasn't needed. And if it's not needed, why keep it around?How do you know...
- r1-zero/torchtune/dev/grpo/loss.py at main · joecummings/r1-zero: Contribute to joecummings/r1-zero development by creating an account on GitHub.
- Refactor losses instantiation and chunked CE by felipemello1 · Pull Request #2531 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here)We have seen many chunked losses being added to torchtu...
- torchtune/recipes/ppo_full_finetune_single_device.py at 3a2179cd22049ad873a81d6d5a1e7e3b5f8b8c80 · pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
Torchtune ▷ #papers (4 messages):
Dream 7B, Diffusion Language Models, Huawei Noah’s Ark Lab
- Dream 7B Debuts as Diffusion Dynamo: A new OSS diffusion LLM, Dream 7B, developed in collaboration between the University of Hong Kong and Huawei Noah’s Ark Lab, has been released.
- The model reportedly outperforms existing diffusion language models by a large margin and matches or exceeds top-tier Autoregressive (AR) language models of similar size on general, math, and coding abilities.
- Dream 7B's Edge: Planning & Inference: Dream 7B showcases strong planning ability and inference flexibility due to its diffusion modeling approach.
- This advantage allows the model to potentially excel in tasks requiring complex reasoning and adaptability.
Link mentioned: Dream 7B | HKU NLP Group : no description found
HuggingFace ▷ #general (27 messages🔥):
AI-powered robots, Distillation vs finetuning, Gemma 3 model issues, Detecting counterfeit products, Inference provider applications
- *Robots Revolutionize with AI:* From Farming to Healthcare!: A LinkedIn post showcases an AI-powered robot operating in autonomous and semi-autonomous modes, highlighting its potential to revolutionize agriculture, farming, and healthcare.
- The discussion emphasized the transformative impact of AI and robotics across various industries.
- *Finetune First or Distill First?* A Question of Accuracy.: A user inquired about the optimal order of applying distillation and finetuning for improved model accuracy, questioning whether it's better to distill then finetune, or vice versa.
- *Gemma 3 Glitches:* Float16 Breaks the Model!: Users reported encountering issues with the Gemma 3 model, particularly when using float16 precision, as highlighted in a GitHub issue.
- While the model works in a normal environment and with GGUF on Ollama, there are compatibility issues with certain libraries and fp16 precision.
- *Counterfeit Catchers:* AI Agent for Product Authenticity!: A member is developing an AI Agent to detect the originality of products, especially in the perfume industry, by comparing uploaded images against web search results.
- The main question is whether integrating search engines is sufficient or a custom database is necessary for reliable functionality.
- *Smol Agents Steal the Show:* Ditching Langchain, LangGraph, NextJS etc!: A user declared they are ditching langgraph, langchain, n8n, flowise and nextjs etc for smolagents rn suggesting a shift towards smol agents as a replacement for more complex frameworks.
- The user enthusiastically claimed huggingface is the next github indicating a strong belief in the platform's future.
- gemma3: The current, most capable model that runs on a single GPU.
- Gemma 3 is broken with fp16 · Issue #36822 · huggingface/transformers: System Info transformers version: 4.50.0.dev0 Platform: Linux-6.8.0-39-generic-x86_64-with-glibc2.35 Python version: 3.11.10 Huggingface_hub version: 0.29.3 Safetensors version: 0.5.3 Accelerate ve...
HuggingFace ▷ #today-im-learning (5 messages):
FrozenLake-v1 issue, LunarLander-v3 upload, HF_TOKEN instructions
- FrozenLake-v1 code fix saves the day: A member shared that the FrozenLake-v1 code in Unit 2 of the RL Course had issues with pickle5 due to Python version incompatibility, providing a fix on their HuggingFace page.
- First hands-on LunarLander-v3 uploaded: A member reported uploading their first hands-on project, LunarLander-v3, to HuggingFace but noted the leaderboard was down and questioned the validity of using LunarLander-v3 instead of LunarLander-v2 as the tutorial instructed.
- HF_TOKEN value setup elucidated: A member clarified the steps to request access to a model, generate the HF_TOKEN, export it in the terminal before launching Jupyter, clone the agents-course repo, and then check out their local branch.
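The steps above can be sketched as a short shell session; the token value and repository URL are placeholders, and the clone/launch lines are left commented since they depend on your environment:

```shell
# Hedged sketch of the HF_TOKEN workflow described above.
# The token value is a placeholder; generate a real one on your HF settings page
# after requesting access to the model.
export HF_TOKEN="hf_xxxxxxxxxxxx"
echo "token set: ${HF_TOKEN:+yes}"

# Clone the course repo and launch Jupyter AFTER exporting the token,
# so notebooks inherit it from the environment:
# git clone https://github.com/huggingface/agents-course.git && cd agents-course
# jupyter notebook
```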
Link mentioned: Installation: no description found
HuggingFace ▷ #i-made-this (4 messages):
Takara TLDR, AI Research Paper Summaries, Qwen2.5-72B-Instruct, Geeky Ghost Writer, Object Detection Model
- *Takara TLDR* Launched for AI Paper Summaries: A new daily digest called Takara TLDR was launched to provide summaries of AI research papers to save time and accelerate learning, accessible via a clean UI at tldr.takara.ai and an RSS feed at papers.takara.ai/api/summary.
- It uses Qwen2.5-72B-Instruct via HuggingFace inference endpoints to generate bullet-pointed summaries every morning, cached in Redis.
- GeekyGhost creates AI Book Writer for his Wife: An AI system to write books was created for personal use, with plans to add Kokoro TTS for reading the books aloud, similar to the Learning UI, available on GitHub.
- The initial implementation is fairly basic but includes model management, with further developments planned for the future.
- Object Detection Model: An end-to-end project for an object detection model was introduced, featuring a blog post and an inference server.
- The project included dataset engineering, architecture benchmarking, model optimization using Optuna, and production deployment with Triton Inference Server and OpenVINO CPU backend for optimized inference.
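The summary pipeline described above (summaries generated once, then served from Redis) is the classic cache-aside pattern. A minimal sketch, with an in-memory dict standing in for Redis and a stub in place of the Qwen2.5-72B-Instruct inference call (both are assumptions for illustration, not Takara's actual code):

```python
# Cache-aside sketch: generate a summary at most once per paper, then serve
# subsequent requests from the cache. A dict stands in for Redis here so the
# example is self-contained.

cache = {}

def summarize(paper_id: str) -> str:
    # Placeholder for the real Qwen2.5-72B-Instruct inference-endpoint call.
    return f"summary of {paper_id}"

def get_summary(paper_id: str) -> str:
    if paper_id not in cache:          # cache miss: generate once and store
        cache[paper_id] = summarize(paper_id)
    return cache[paper_id]             # cache hit: served without inference

print(get_summary("2503.12345"))       # first call generates and caches
print(get_summary("2503.12345"))       # second call is a cache hit
```

With real Redis, the dict lookup and store would become `GET`/`SET` calls, typically with a TTL so stale summaries expire.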
- Takara TLDR: Daily summaries of AI research papers from takara.ai
- GitHub - GeekyGhost/Geeky-Ghost-Writer: AI Book writer using Ollama and Gradio: AI Book writer using Ollama and Gradio. Contribute to GeekyGhost/Geeky-Ghost-Writer development by creating an account on GitHub.
HuggingFace ▷ #computer-vision (1 messages):
YOLO vertical object detection, Instance segmentation fragments
- YOLO Seeks Vertical Vision Boost: A member inquired about improving YOLO's or any CNN's performance in detecting vertical objects.
- They asked if simply increasing the depth of the network would suffice.
- Instance Segmentation faces Fragmented Recognition: A member is facing issues with an instance segmentation model that detects fragments of the same object.
- They are seeking advice on how to make the model recognize these fragments as a single, complete object, such as through the use of label tags.
HuggingFace ▷ #gradio-announcements (2 messages):
Gradio, Million Users, Community Growth
- Gradio Gang Reaches a Milli: Gradio just reached 1,000,000 monthly active developers, marking a significant milestone for the open-source ML interface builder.
- From ML researchers sharing their first models to companies building production-ready AI interfaces, this achievement underscores the platform's growing importance in the AI community.
- Gradio Community Celebrates Massive Growth: The Gradio community is celebrating reaching one million monthly active developers.
- This milestone reflects the collective effort of users contributing demos, bug reports, and feature requests, highlighting the community's pivotal role in Gradio's success.
HuggingFace ▷ #smol-course (4 messages):
Creating tools from MCP, Smithery, Glama, from_mcp() API, unit3
- Newbie Asks How to use Smithery/Glama MCP Tools: A new user asked how to create tools from MCP taken from Smithery or Glama.
- The documentation mentions it is possible with the from_mcp() API, but it is not clear how to then use the tools from them.
- User Checks In On Unit 3: A user inquired about the release date of unit3 of the course.
- They also confirmed they are currently on unit 2.
HuggingFace ▷ #agents-course (12 messages🔥):
Ollama Model ID, litellm proxy server, RAG Tool Unit 3, LunarLander-v3, final certification
- Ollama needs the right ID: Users found that the correct `model_id` for Ollama is `ollama_chat/<model>`, not `ollama/<model>`.
- Litellm Proxy Blues: A user requested help setting up a litellm proxy server, expressing frustration after spending hours on it.
- RAG Tool troubles: Members reported issues with the RAG Tool from unit 3, with one confirming it didn't work that morning and another stating Glad its not just me!.
- LunarLander-v3, but Leaderboard is Down: A member uploaded their first hands-on, LunarLander-v3, to HuggingFace, but the Leaderboard seems to be down and they are not sure if LunarLander-v3 is valid, because the tutorial seems to require LunarLander-v2.
- Final Certification on 1st May?: A member asked if the final certification is again planned for 1st May MAX.
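The `model_id` fix discussed above can be captured in a tiny helper. This is a hedged sketch: the model name "llama3" is illustrative, and the commented-out litellm call would need `pip install litellm` plus a running local Ollama server to execute:

```python
# "ollama_chat/<model>" routes through Ollama's chat endpoint, which is what
# litellm expects for chat-formatted requests; "ollama/<model>" targets the
# plain completion endpoint and was the source of the confusion above.

def ollama_chat_id(model_name: str) -> str:
    """Build the chat-endpoint model_id that litellm expects for Ollama."""
    return f"ollama_chat/{model_name}"

# Example call (not executed here; requires a local Ollama server):
# import litellm
# resp = litellm.completion(
#     model=ollama_chat_id("llama3"),
#     messages=[{"role": "user", "content": "hello"}],
# )

print(ollama_chat_id("llama3"))
```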
Latent Space ▷ #ai-general-chat (38 messages🔥):
OmniHuman, USAMO LLM, OpenAI PaperBench, General Agents Ace, yourbench
- ByteDance's OmniHuman Lip Sync Debuts: ByteDance's OmniHuman is now available to the public, allowing for AI Avatar animation from a single image and sound, though it is very slow and costs 192 credits; a 15-second trial video can be generated for free via Capcut's Dreamina website.
- Mouth articulation and general movement look really good according to initial testers.
- LLMs Flunk 2025 USAMO: Top LLMs scored less than 5% on the 2025 USAMO full-solution eval, despite strong answer-only benchmark scores.
- Discussion suggests possible failure modes are tied to training artifacts, potentially due to overfitting, though some doubt all frontier labs would make this mistake; one member suggested the testers tested differently, implying that they may not have replicated previously announced results.
- All Hands Launches LM and Cloud for Coding: OpenHands LM, the strongest 32B coding agent model, resolves 37.4% of issues on SWE-bench Verified, along with OpenHands Cloud, offering SOTA open-source coding agents with $50 in free credits.
- OpenAI Releases PaperBench: OpenAI released PaperBench, a benchmark for evaluating AI agents' ability to replicate state-of-the-art AI research from top ICML 2024 papers, which is part of their Preparedness Framework; the code is available at Github.
- Human experts needed 24 hours of work before they started to outperform the model, whose performance basically plateaued after 1 hour of work.
- Meta's Llama 4 Fast Image Generation: Llama 4 based image generation and editing is rolling out and appears very fast, doing 1 second edits versus 5 minutes for GPT-4o, according to hingeloss.
- Tweet from Sherjil Ozair (@sherjilozair): Today I'm launching my new company @GeneralAgentsCo and our first product.Introducing Ace: The First Realtime Computer AutopilotAce is not a chatbot. Ace performs tasks for you.On your computer. U...
- Tweet from OpenAI (@OpenAI): We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework.Agents must replicate top ICML 2024 papers,...
- Tweet from Sumuk (@sumukx): we're launching 🤗 yourbench today, an open source tool for custom benchmarking and synthetic data generation from ANY of your documents. it's a big step towards improving how model evaluation...
- Tweet from Alex Volkov (Thursd/AI) (@altryne): ByteDance OmniHuman is now available!OmniHuman has wowed all of us with an unbelievable AI Avatar animation a few month ago an is finally accessible to general public (not for free!) It's VERY slo...
- Tweet from 𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8): Proof or Bluff? Evaluating LLMs on 2025 USA Math OlympiadTop models hit <5% on 2025 USAMO full-solution eval, despite strong answer-only benchmark scores. Analysis reveals failure modes tied to tra...
- Tweet from All Hands AI (@allhands_ai): Today, we're excited to make two big announcements!- OpenHands LM: The strongest 32B coding agent model, resolving 37.4% of issues on SWE-bench Verified 📈- OpenHands Cloud: SOTA open-source codin...
- Tweet from Tim Zaman (@tim_zaman): Personal news - I joined OpenAI! 🎉We're going to build and launch the largest (and most delightful) supercomputers to power frontier AI research.DeepMind is obviously truly formidable and to the ...
- Tweet from chris (@hingeloss): Llama 4 based image generation and editing beginning to roll out, looks very good -- and very fast, 1 second edits versus 5 minutes for gpt-4oDid Meta cook??
- servers/src/memory at main · modelcontextprotocol/servers: Model Context Protocol Servers. Contribute to modelcontextprotocol/servers development by creating an account on GitHub.
- no title found: no description found
- preparedness/project/paperbench at main · openai/preparedness: Releases from OpenAI Preparedness. Contribute to openai/preparedness development by creating an account on GitHub.
Yannick Kilcher ▷ #general (28 messages🔥):
Gemini 2.5 Pro Evaluation, Information Theory & Geometry Math, Special Token Definitions in Models, Modular Model Specification, UX/UI Importance
- Gemini 2.5 Pro fails Math Test: A user tested Gemini 2.5 Pro (experimental) on math and found it totally trash, adding that Google didn't even write a simple UI that renders math correctly.
- The user suggested that ChatGPT and Grok 3 understand questions correctly even if poorly written, implying Gemini 2.5 Pro has understanding problems.
- Discussing Math Behind Information Theory and Geometry: A user asked about the math behind information theory and information geometry, noting that ChatGPT sounds like a real person and understands the questions better.
- Another user responded that these are more like very vague math research questions, not actually working out or application of the right ideas to a given problem.
- Deciphering Special Token Definitions: A user asked for dictionary definitions for special tokens like `<|place holder no 1|>`, suggesting the tokens have repeatable semantic meanings to the models.
- Analysis revealed that each token has a consistent semantic role, such as `<|place holder no 1|>` representing leadership or primary entities.
- Modular Model Spec Makes Waves: A user shared their Modular Model Spec (modular-model-spec.vercel.app), aiming to make LLMs more flexible and reliable for developers building AI-powered applications.
- The specification focuses on a unified dataset format that is modular and extensible, designed to increase reliability and developer convenience.
- UX/UI Crucial for Success: A user noted that startups with better UX/UI often win, even with advanced specifications.
- They added that while a product may be good, it still needs a winning sauce, that winning UX/UI idea.
- Rem: Record your dreams, uncover hidden patterns, and connect with a community of dreamers in a beautiful, secure space.
- Modular Model Spec: no description found
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning: Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes rem...
- Search | arXiv e-print repository: no description found
Yannick Kilcher ▷ #paper-discussion (7 messages):
RLHF, ViT Image Segmentation
- Constructing Better RLHF Prompt-Data: A discussion was started around the "Reinforcement Learning from Human Feedback (RLHF)" paper focusing on prompt-data construction to address reward hacking and decreasing response diversity.
- ViT Models Can Be Image Segmenters: A member inquired about the merits of the paper "Your ViT is Secretly an Image Segmentation Model" which discusses Vision Transformer (ViT) models.
Link mentioned: Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback: Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of...
Yannick Kilcher ▷ #ml-news (3 messages):
Qwen3 Release, DeepSeek's Influence on Alibaba, Alibaba's Model Influence Metrics
- Qwen3 Expected Next Week: Alibaba is reportedly planning to release its new model, Qwen3, in the second week of April 2025, approximately seven months after the release of Qwen2.5 in September 2024, according to this article.
- DeepSeek Inspires Alibaba's Focus on Inference: After the popularity of DeepSeek-R1 in early 2025, Alibaba's basic model team shifted its strategy to prioritize inference capabilities, previously benchmarking against OpenAI's o1 in the latter half of 2024.
- An insider stated that "after DeepSeek's popularity, inference capability has become a key capability that cannot be bypassed."
- Alibaba Prioritizes Model Influence in Evaluations: Alibaba's basic model team is primarily evaluated on model influence, aiming to establish the image of the 'strongest model' in the industry, with Alibaba Group CEO Wu Yongming actively involved in business communications.
- Key metrics include the total number of derivative models based on Qwen's open-source models (currently over 100,000) and developer community popularity, such as downloads (over 200 million in 2024).
- April Fools' Skepticism: A user expressed hope that Chinese companies do not consider the concept of April Fools' when launching new models.
- This comment followed the link to GeminiApp tweet.
- Tweet from undefined: no description found
- Alibaba's secretly developed new model to be released; influence metrics become the most important assessment: Huxiu exclusively learned that Alibaba will release its new model Qwen3 in the second week of April 2025, its most important model product for the first half of 2025, roughly seven months after Qwen2.5 was unveiled at the Apsara Conference in September 2024. Huxiu also learned that after releasing Qwen2.5 in 2024, Alibaba Cloud's foundation model team began pushing Qwen3-related projects, but in early 2025 D...
Nous Research AI ▷ #general (15 messages🔥):
Anthropic LLM Insights, OpenAI open weights model, DeepSeek Jiu Jitsu, BatchNorm, ChatGPT 4o's Magic the Gathering Cards
- Anthropic Exposes LLM Thinking Language?!: Anthropic discovered that LLMs have a thinking language all their own, think ahead, and think a lot more than single tokens as explained in their Tracing Thoughts in Language Model blogpost.
- A member who doesn’t usually follow Berman, found that Anthropic has some insights about LLMs that run completely antithetical to how we thought they worked.
- OpenAI is finally releasing Open Weight model!: Kudos to OpenAI for finally releasing their open weight model soon, and gratitude to DeepSeek for complex maneuvers to make this a reality for the Open Source community as seen on this video.
- An interstellar ninja commented "well someone had an epiphany".
- Neural Networks are not Representational: A member linked to a tweet by @norabelrose who stated that Neural networks don't have "representations", but they have embeddings, or meaningful patterns of neuron activation.
- They're meaningful in the sense of enabling us to do certain things, Differences that make a difference (to us), They don't copy, reflect, or re-present the world.
- Wrestling with BatchNorm?: A member shared this thread for learning about batchNorm including an intuitive definition of BatchNorm and its link to Initialization.
- The thread also includes computational graphs to scrape away any confusions, first-principles breakdown of batchnorm's backward pass, and a straightforward code implementation mirroring the math with no fancy libraries (only NumPy used).
- ChatGPT makes MTG cards: A member spent last week using ChatGPT 4o's image generator to create high taste tester approved Magic the Gathering Cards of pop figures in AI and the @NousResearch team.
- Here's a tweet of the cards including one of @sama, the AGI Overlord.
- Tweet from Nora Belrose (@norabelrose): Neural networks don't have "representations"They have embeddings, or meaningful patterns of neuron activationThey're meaningful in the sense of enabling us to do certain thingsDifferen...
- Tweet from Brok (@wickedbrok): This is objectively the easiest way Batch Normalization can be taught without dodging its abstractions.The below thread include:>intuitive definition of BtachNorm and its link to Initialization.>...
- Tweet from Teknium (e/λ) (@Teknium1): I spent the last week using ChatGPT 4o's image generator to create high taste tester approved Magic the Gathering Cards of a bunch of pop figures in AI and a bunch of the @NousResearch team, and t...
Nous Research AI ▷ #ask-about-llms (8 messages🔥):
DeepHermes Reasoning, DeepHermes AI, Chain of thoughts
- DeepHermes Reasoning Reliability under Question: A user asked about enabling reasoning with DeepHermes while using structured output via Langchain, but was told that it's more reliable to use non-reasoning mode for now, especially with JSON or tool calling.
- DeepHermes AI Spotted: A user was excited to discover DeepHermes AI, noting that it's a 3B model, implying surprise and appreciation for its capabilities given its size.
- The user characterized reasoning as a chain of thoughts with a `` tag.
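One common workaround for the structured-output reliability issue above is to strip the reasoning block before parsing the JSON. A minimal sketch, assuming the reasoning is wrapped in a `<think>` tag (the exact tag DeepHermes uses is elided in the discussion, so this is an assumption):

```python
import json
import re

# Assumed model output: a reasoning block followed by the structured payload.
raw = '<think>step by step...</think>{"answer": 42}'

def strip_reasoning(text: str) -> str:
    """Remove an assumed <think>...</think> block so the remainder parses as JSON."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

payload = json.loads(strip_reasoning(raw))
print(payload["answer"])
```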
Nous Research AI ▷ #research-papers (3 messages):
Project Loong, Synthetic Data Generation, Multi-agent Framework
- CamelAIOrg launches Project Loong 🐉: CamelAIOrg introduces Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data.
- The project aims to enhance model performance through a structured approach to generating and validating synthetic data.
- Verifying Synthetic Data at Scale with Loong: Project Loong uses a modular design that integrates synthetic data generation with semantic verification as described in the Camel AI blog.
- It employs a multi-agent framework ensuring accuracy and consistency while empowering domain-specific models with reliable reasoning signals.
- Tweet from CAMEL-AI.org (@CamelAIOrg): Introducing Project Loong 🐉Blog: https://camel-ai.org/blogs/project-loong-synthetic-data-at-scale-through-verifiers…• Our structured approach to generating and validating synthetic data for enhanced ...
- The Long Way Peanut Butter GIF - The long way Peanut butter Peanut butter jelly - Discover & Share GIFs: Click to view the GIF
Nous Research AI ▷ #interesting-links (3 messages):
bintensors, OpenAPI access
- Bintensors: A Faster Safetensors Alternative?: A new binary encoded format, bintensors, has been released, designed for speed with zero-copy access, and compared to a faster safetensors.
- The project includes Cargo and Pip installation options, as well as links to the documentation and GitHub repository.
- LLMs gain direct OpenAPI access: A project providing OpenAPI access to SaaS/PaaS/IaaS for LLMs has been released, aiming to eliminate MCP (Model Context Protocol) server clutter, discussed in a Hacker News thread.
- This aims to give direct access for LLMs to different APIs without needing to manage them through a centralized platform.
- bintensors - Rust: no description found
- no title found: no description found
Nous Research AI ▷ #research-papers (3 messages):
Project Loong, Synthetic Data Generation, Multi-agent Framework
- Camel AI launches Project Loong 🐉: Camel AI launched Project Loong 🐉, a structured, modular solution for generating and verifying synthetic data.
- Scale Synthetic Data with Semantic Verification: The project's blog post details their approach to generating and validating synthetic data for enhanced model performance, integrating synthetic data generation with semantic verification in a modular design.
- The solution uses a multi-agent framework to ensure accuracy and consistency, empowering domain-specific models with reliable reasoning signals; read the blog post at camel-ai.org.
- The Long Way Peanut Butter GIF - The long way Peanut butter Peanut butter jelly - Discover & Share GIFs: Click to view the GIF
- Tweet from CAMEL-AI.org (@CamelAIOrg): Introducing Project Loong 🐉Blog: https://camel-ai.org/blogs/project-loong-synthetic-data-at-scale-through-verifiers…• Our structured approach to generating and validating synthetic data for enhanced ...
tinygrad (George Hotz) ▷ #general (8 messages🔥):
TinyGrad, GSoC, Google Summer of Code
- TinyGrad skips GSoC: A member was curious why TinyGrad wasn't part of the Google Summer of Code (GSoC) program.
- Another member said that the output is almost never worth the time/effort put into the student because it's just hard to onboard someone to productivity, plus the paperwork you need to do with Google.
- TinyGrad hard to contribute: A member feels that TinyGrad takes at least an order of magnitude more effort to get to a point where you can contribute meaningfully, compared to average.
- Another member said that the output of that is just pretty good IMO [that] you effectively get some smart people working full-time for you for 3 months.
tinygrad (George Hotz) ▷ #learn-tinygrad (14 messages🔥):
UOps Optimization, arange() in tinygrad, tinygrad shape tuple, tinygrad .pad()
- UOps Optimization Questioned: A member asked if creating a bunch of UOps just to discard 2 out of 3 trees right away could be optimized with `return {Ops.ADD: pooled.sum, Ops.MAX: pooled.max, Ops.MUL: pooled.prod}[op](-1).transpose(axis, -1)`.
- Deep Dive into arange() in tinygrad: A member added a chapter on `.arange()` to their notes, providing a link and a code snippet using `Tensor.arange(0.5, 2, 0.2)`, showcasing the resulting UOp tree with operations like RESHAPE, REDUCE_AXIS, PERMUTE, and SHRINK.
- tinygrad Shape Tuple Orientation Examined: A member inquired whether tinygrad's shape tuple is inverted compared to Torch, but another member clarified that Torch uses the same order as TinyGrad.
- A screenshot was provided to confirm the shape dimensions.
- Pad Dimensions Confuse Users: Members found .pad() takes the dimensions to pad in the reverse order, causing confusion.
Link mentioned: 4 - The .arange() insanity – TinyGrad Notes: My notes on TinyGrad internals
LlamaIndex ▷ #blog (1 messages):
RichPromptTemplate, Jinja-style prompt templates
- LlamaIndex Debuts RichPromptTemplate 🏆📝 Feature: LlamaIndex announced a framework feature release: a RichPromptTemplate that allows you to build jinja-style prompt templates with variables, loops, chat message/roles, and even multimodality.
- The official tweet can be found here.
- RichPromptTemplate Enables Advanced Prompt Engineering: RichPromptTemplate facilitates the creation of complex prompts using Jinja-style syntax, supporting variables, loops, and chat message roles.
- This new feature extends to multimodal applications, allowing integration of various data types within prompt templates.
LlamaIndex ▷ #general (16 messages🔥):
Hugging Face Agents Course, Agentic RAG comparison, text2SQL debugging, LlamaIndex release content
- HF Agents Course Compares Frameworks: Hugging Face released a new Agents course unit comparing LlamaIndex, smolagents, and LangGraph for Agentic RAG applications.
- The course aims to take users from beginner to expert in understanding and building AI agents.
- text2SQL Prompt Debugging Disclosed: A member was debugging text2SQL to return MSSQL code without unsupported syntax and was directed to the prompt mixin example to view and modify prompts.
- To print all LLM inputs and outputs, a member suggested using `from llama_index.core import set_global_handler; set_global_handler("simple")`.
- LLamaIndex Release Changelog Unleashed: A member inquired about accessing the changelog of each LlamaIndex release, similar to Langchain.
- A member shared the LlamaIndex CHANGELOG.md file and the documentation changelog.
- Welcome to the 🤗 AI Agents Course - Hugging Face Agents Course: no description found
- Tweet from Sergio Paniego (@SergioPaniego): 🆕New Unit in the Agents Course @huggingface. We just released the first Use Case on Agentic RAG—where we compare three frameworks side by side:🤏 smolagents🦙 @llama_index🦜 LangGraph (@LangChainAI)⬇...
- Accessing/Customizing Prompts within Higher-Level Modules - LlamaIndex: no description found
Nomic.ai (GPT4All) ▷ #general (9 messages🔥):
OpenAI open source, Deepseek, Nomic Embed Text V2
- OpenAI's Open Source Tease: Members speculated that OpenAI may release something as open source, though one member suggested that it may not be very human-like.
- The member stated that, AFAIK open source models aren't the best at writing yet.
- Deepseek's Verbosity: A member shared an anecdote about Deepseek being overly verbose, illustrating it with an image of Deepseek's thought process when asked to simply say 'ready', as visible here.
- Nomic Embed Text V2 arrival when?: Members are waiting for Nomic Embed Text V2 to be available in GPT4All.
- One member stated that they are waiting patiently, understanding that developers are likely busy and it might take time.
Cohere ▷ #「💬」general (6 messages):
Command A, API Playground, Timeout errors, Cohere Status Page
- Command A freezes on repeated letters: A member found that Command A gets stuck generating the same character endlessly when encountering screaming with repeated letters (like 「ギャアアアアアア...」or "AHHHHHH...").
- This happens even with the default settings in the API Playground, and the interface can freeze—making it impossible to even click the "👎" feedback button.
- Cohere Playground API experiencing Timeout Errors: Members are experiencing HTTP timeout errors with the Cohere API and Playground.
- They linked to the Cohere Status Page which shows degraded performance for `command-a-03-2025` due to increased latency.
- imgur.com: Discover the magic of the internet at Imgur, a community powered entertainment destination. Lift your spirits with funny jokes, trending memes, entertaining gifs, inspiring stories, viral videos, and ...
- Cohere Status Page: Latest service status for Cohere services
Cohere ▷ #「🤝」introductions (1 messages):
Introductions, Community Welcome
- Server Welcomes New Members: The server welcomes new members to Cohere's Community Discord Server.
- New members are encouraged to introduce themselves by stating their Company/Industry/University, what they are working on, their favorite tech/tools, and what they hope to gain from the community.
- Introductions Requested!: The channel is titled 「🤝」introductions, and new users are encouraged to respond to the stickied message.
- Users can copy-paste the template to respond.
DSPy ▷ #general (6 messages):
DSPy and OpenAI Agents SDK Integration, Programmatic Prompting with DSPy
- DSPy's Role in Prompting OpenAI Agents: A newcomer to DSPy inquired about the possibility of using it to generate prompts for the OpenAI Agents SDK.
- A member suggested that DSPy might already offer most of the functionality provided by the SDK as a standard module.
- Decoupling Prompt Engineering with DSPy: A member described DSPy as decoupling the tinkering layer of prompt engineering from the functional behavior of LLMs, raising questions about how to integrate it with OpenAI Agents SDK for managing agents, workflows, and monitoring.
- The member expressed interest in using DSPy for prompt engineering while leveraging OpenAI Agents SDK for other functionalities, aiming to avoid unnecessary complexity.
- Exploring the Synergy Between DSPy and OpenAI Agents SDK: A member asked for examples of how to delegate prompt engineering to programming using DSPy, while leaving the remaining tasks to OpenAI Agents SDK.
- A member clarified that DSPy achieves this decoupling through programmatic signatures and modules, emphasizing that these are DSPy's core abstractions and there is no way to use the framework without them.
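The decoupling being discussed can be pictured with a toy signature/module pair. This is a pure-Python sketch of the concept only, not DSPy's actual API (in DSPy you would declare a dspy.Signature and run it through a module such as dspy.Predict):

```python
class SummarizeTicket:
    """A 'signature' declares what a step consumes and produces,
    independent of any particular prompt wording."""
    inputs = ("ticket_text",)
    outputs = ("summary",)

def render_prompt(signature, **values) -> str:
    """A 'module' turns a signature plus concrete inputs into prompt text.
    Swapping or optimizing this function re-tunes prompts without
    touching the code that calls it."""
    lines = [f"Produce: {', '.join(signature.outputs)}"]
    for name in signature.inputs:
        lines.append(f"{name}: {values[name]}")
    return "\n".join(lines)

prompt = render_prompt(SummarizeTicket, ticket_text="App crashes on login")
```

An agent framework such as the OpenAI Agents SDK could then consume `prompt` as an ordinary string, which is the division of labor the member was asking about.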
- Closing the Loop on LLM Agent Development: A member shared a YouTube video on configuring LLM agents to improve themselves using telemetry and evaluations, seeking feedback on the approach.
- The video discusses a draft for closing the loop on LLM agent development.
AI21 Labs (Jamba) ▷ #jamba (5 messages):
Jamba v1.6, Open Source Implementations, Training Dataset, AI21 Policy, Jamba Open Model License
- Jamba v1.6 Availability Questioned: A member inquired about open-source implementations and the training dataset for Jamba v1.6 from AI21 Labs.
- Another member responded that Jamba is an open model and provided a link to its Hugging Face page.
- Jamba's Hybrid Architecture Boasted: Built with a hybrid SSM-Transformer architecture, the Jamba 1.6 models outperform other open instruction-following foundation models in quality, speed, and long context performance, rivaling leading closed models.
- The models show superior performance on long context tasks important to enterprises, like RAG workflows and grounded question answering.
- Jamba Open Model License Clarified: The Jamba Open Model License is permissive, allowing full research use and commercial use under the license terms.
- For licensing needs, members were encouraged to contact AI21 Labs, and details were in the release blog post.
- Jamba Codebase Status Verified: A member asked whether users could train Jamba v1.6 themselves, inquiring about an open-source codebase.
- Another member confirmed that Jamba does not have an open codebase, only open weights are available.
Link mentioned: ai21labs/AI21-Jamba-Mini-1.6 · Hugging Face: no description found
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):
MOOC availability, Auditing the MOOC
- MOOC Can Be Audited!: Members discussed whether the MOOC can be taken a few months from now.
- The answer is yes: the MOOC can absolutely be audited, but the coursework to earn a completion certificate is due at the end of May, and it won't be possible to earn one afterwards.
- MOOC Still Available!: Members confirmed that the MOOC will be available to audit after May.
- The discussion clarified that while certificate-earning coursework has a May deadline, auditing remains an option beyond that.
LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 messages):
DeepSeek-R1, Reinforcement Learning, Verifiable Reward, Chains-of-Thought
- Reasoning Capabilities boosted by DeepSeek-R1: Recent Large Reasoning Models such as DeepSeek-R1 have demonstrated that general reasoning capabilities of LLMs greatly improve when base models undergo post-training with Reinforcement Learning (RL) with a verifiable reward, especially in mathematics and programming.
- The linked blogpost mentions that ease of verification is crucial to improving domain-specific capabilities and that the abundance of high-quality datasets is another critical prerequisite for models to learn to construct coherent Chains-of-Thought (CoTs) leading reliably to correct answers.
- Verifiable Reward is important: Mathematics and programming have particularly benefited from verifiable rewards, as these domains can be verified quite easily—allowing accurate interpretation of LLM responses and effective comparison to the ground truth on a semantic level.
- The idea that ease of verification is crucial to improving domain-specific capabilities has become widely accepted in the research community.
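In this spirit, a verifiable reward for math can be as simple as an exact-match check on the final numeric answer. A toy sketch (illustrative only; real RL pipelines such as DeepSeek-R1's use more robust answer extraction and equivalence checking):

```python
import re

def math_reward(response: str, ground_truth: str) -> float:
    """Binary verifiable reward: 1.0 if the last number appearing in the
    model's response equals the ground-truth answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(ground_truth) else 0.0

math_reward("So the total is 42.", "42")  # -> 1.0
```

Because the check is automatic and unambiguous, rewards like this scale to millions of RL rollouts without human graders, which is exactly why math and code lead the verifiable-reward results.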
Link mentioned: 🐉 Loong: Synthesize Long CoTs at Scale through Verifiers: Project Loong is a collaborative effort led by CAMEL-AI to explore Long CoTs data generation through verifiers at scale.
Codeium (Windsurf) ▷ #announcements (1 messages):
Windsurf Wave 6, One-Click App Deploys, Commit Message Generation, Conversation Table of Contents, Windsurf Previews
- Windsurf Wave 6 Surfs In: Windsurf's latest update, Wave 6, has been released, featuring one-click app deploys and a suite of enhancements.
- The update also includes enterprise access to MCPs and Turbo Mode, along with one-click commit message generation, conversation table of contents, improved performance in long conversations, enhanced Tab features, and added MCP SSE support.
- Windsurf Deploys Catapult Apps Publicly: Windsurf Deploys (beta) allows users to share their apps on the public internet with one click.
- This feature is part of the Wave 6 release and aims to simplify the deployment process for developers, as shown in their blogpost.
- Windsurf Previews Prevent Screenshot Pasting: Windsurf Previews (Beta) lets users preview locally run websites in their IDE or browser.
- Users can select React and HTML elements within the preview to send to Cascade as context, including console errors, eliminating the need for copy-pasting or screenshots. The feature can be turned off via Windsurf Settings and is available on all plans at no credit cost, according to the changelog.
- Windsurf Tabs into Jupyter Notebooks: The Wave 6 update brings enhanced Tab features to Windsurf, including user search context and Jupyter Notebook support.
- This integration aims to provide a more seamless experience for users working with notebooks, streamlining their workflow within the Windsurf platform, as highlighted in a recent tweet.
- Windsurf Wave 6: Introducing Wave 6, our sixth batch of updates to the Windsurf Editor.
- Windsurf Editor Changelogs | Windsurf Editor and Codeium extensions: Latest updates and changes for the Windsurf Editor.
- Tweet from Windsurf (@windsurf_ai): Wave 6 is here!Included in this update:🚀 App Deploys📝 Conversation Table of Contents💬 Commit Message Generation🟠 Windsurf Tab in Jupyter Notebook⏩ Additional context for Windsurf Tab📂 Improved MC...
Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):
Phi-4-mini-instruct, BFCL
- Phi-4-mini-instruct PR needs review: A contributor submitted a pull request adding tool evaluation for Microsoft's Phi-4-mini-instruct model within the BFCL framework and is requesting community feedback and code review on GitHub.
Link mentioned: [BFCL] add support for microsoft/Phi-4-mini-instruct by RobotSail · Pull Request #967 · ShishirPatil/gorilla: This PR introduces support for the newly-released Phi-4-mini-instruct model from Microsoft:Phi-4-mini-instructThe results for this were initially evaluated against f81063; however, the model ha...