[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet day.
AI News for 3/6/2025-3/7/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 7886 messages) for you. Estimated reading time saved (at 200wpm): 777 minutes. You can now tag @smol_ai for AINews discussions!
Mistral OCR and Jamba 1.6 came close.
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
Model Releases and Updates
- AI21 Labs launched Jamba 1.6, claiming it's the best open model for private enterprise deployment, outperforming Cohere, Mistral, and Llama on key benchmarks like Arena Hard. It rivals leading closed models in speed and quality, and is available on AI21 Studio and @Hugging Face.
- Mistral AI released a state-of-the-art multimodal OCR model @scaling01. @sophiamyang announced Mistral OCR, highlighting its document understanding, multilingual and multimodal capabilities, and speed. It offers doc-as-prompt, structured output, and is available for on-prem deployment. Their blog post provides benchmarks and examples covering multilingual capabilities, math equation extraction from PDFs, and text and image extraction to markdown. @sophiamyang noted it's #1 on Hacker News.
- Alibaba Qwen released QwQ-32B, an open-weight reasoning model claimed to be close to DeepSeek R1 and OpenAI o1 mini in intelligence, while only needing 32B parameters and being cost-effective at $0.20/M tokens. It is available on @Hugging Face under Apache 2.0. @ArtificialAnlys reported initial evals show QwQ-32B scoring 59.5% on GPQA Diamond (behind DeepSeek R1's 71% and Gemini 2.0 Flash's 62%) but 78% on AIME 2024 (ahead of DeepSeek R1). @awnihannun demonstrated QwQ-32B running on an M4 Max with MLX, noting its 8k token thought process. @iScienceLuvr suggests QwQ's new model seems as good as R1 while being runnable locally. @reach_vb announced QwQ 32B is deployed on Hugging Chat.
- OpenAI released o1 and o3-mini in the API for developers, available on all paid tiers, supporting streaming, function calling, structured outputs, reasoning effort, Assistants API, Batch API, and vision (o1 only) @OpenAIDevs. @goodside noted ChatGPT Code Interpreter works in both 4.5 and o3-mini, suggesting o3-mini getting Code Interpreter is a big deal.
- AI21 Labs launched Jamba 1.6 chat model with 94B active parameters and 398B total parameters @reach_vb.
- AMD introduced Instella, a series of fully open-source, state-of-the-art 3B parameter language models, trained on AMD Instinct MI300X GPUs, outperforming existing fully open 3B models and competing with Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B @omarsar0.
- Alibaba released Babel on Hugging Face, open multilingual LLMs with variants Babel-9B and Babel-83B, outperforming comparable open LLMs and performing comparably to GPT-4o on certain tasks @_akhaliq.
- Anthropic released Claude 3.7 Sonnet, adding reasoning capabilities and workbench updates for prompt engineering with features like tool use and extended thinking, and prompt sharing @AnthropicAI, @alexalbert__.
Tools and Applications
- Elysian Labs announced Auren, an iOS app aiming to improve human-AI interaction, focusing on emotional intelligence, agency, and positive reinforcement rather than just intelligence @nearcyan. Beta tester feedback has been described as "surreal" and potentially "life-saving" @nearcyan. The app uses multiple models per message and is priced at $19.99/month for 2,500 messages @nearcyan. @nearcyan highlighted the complexity of the app, noting it's more than just "an LLM in chat bubbles".
- Hugging Face launched Diffusion Self-Distillation app, enabling zero-shot customized image generation using FLUX, similar to DreamBooth but training-free, for tasks like character consistency and scene relighting @_akhaliq.
- Hugging Face released PDF Parsers Playground, a platform for experimenting with open-source PDF parsers @_akhaliq.
- _philschmid created a CLI to chat with Google DeepMind Gemini 2.0 Flash connected to Google Search @_philschmid.
- OpenAI released ChatGPT for macOS, allowing code editing directly in IDEs for Plus, Pro, and Team users @OpenAIDevs.
- Perplexity AI's Mac app now supports real-time voice mode, allowing background listening and interaction via shortcut Cmd + Shift + M @AravSrinivas.
- LangChainAI released OpenCanvas, similar to OpenAI's tool but compatible with every model @_philschmid.
- RisingSayak shipped a shot categorizer for video data curation, claiming it's fast (<1s on CPU) and open-source @RisingSayak.
Research and Concepts
- _philschmid shared benchmarks on ReAct Agents under pressure, evaluating performance with scaling domains and tools, finding that Claude 3.5 sonnet, o1, and o3-mini outperform gpt-4o and llama-3.3-70B in tasks requiring 3+ tool calls, and that more context and tools can degrade performance @_philschmid.
- ArtificialAnlys provided analysis of Alibaba's QwQ-32B model, comparing it to DeepSeek R1 and Gemini 2.0 Flash on benchmarks like GPQA Diamond and AIME 2024 @ArtificialAnlys.
- omarsar0 summarized a paper on Cognitive Behaviors that Enable Self-Improving Reasoners, identifying verification, backtracking, subgoal setting, and backward chaining as key for successful problem-solving in LMs, noting Qwen-2.5-3B's natural exhibition of these behaviors and the impact of priming and pretraining behavior amplification @omarsar0.
- polynoamial highlighted Richard Sutton's Bitter Lesson about general methods scaling with data and compute ultimately winning in AI, in the context of the rise of AI agents @polynoamial.
- lateinteraction discussed the power of declarative languages at the right abstraction level for building intelligent software, suggesting compilers as a way to make problem-specific systems "scale with data and compute" @lateinteraction. They also pondered the spectrum of software development from ChatGPT to Copilot/Cursor to DSPy & Parsel, suggesting a future with higher-level, composable specs @lateinteraction.
- iScienceLuvr shared a paper on explaining generalization behavior in deep learning with "soft inductive biases" @iScienceLuvr.
- TheTuringPost discussed why AI reasoning tests keep failing, highlighting Goodhart's Law and the need for dynamic and adaptive benchmarks that test commonsense reasoning, causal inference, and ethics beyond math and coding @TheTuringPost.
- omarsar0 discussed the evolution of AI-powered IDEs and agentic capabilities centralizing workflows, increasing productivity @omarsar0.
- cloneofsimo discussed the importance of flops/watt in RL era and improvements in DiLoCo @cloneofsimo.
Industry and Business
- Figure AI is reported to be the 6th most sought-after company in the secondary market @adcock_brett.
- ArtificialAnlys congratulated Together AI, Fireworks AI, hyperbolic labs, and GroqInc for launching serverless endpoints and providing live performance benchmarks @ArtificialAnlys.
- ClementDelangue from Hugging Face discussed movements in top 50 GenAI consumer apps, noting Hugging Face's position at 13th despite consumer app growth @ClementDelangue. He also emphasized academia's role in making AI a positive force, highlighting Academia Hub on Hugging Face @ClementDelangue.
- SakanaAILabs is hiring Software Engineers to develop AI applications in Japan using LLMs and AI agents @SakanaAILabs.
- DeepLearningAI is offering a Data Analytics Professional Certificate program @DeepLearningAI and a new course on agentic document workflows with LlamaIndex @jerryjliu0.
- jeremyphoward promoted FastHTML, suggesting a simple, single-language, single-file approach to development @jeremyphoward.
- matanSF announced FactoryAI's partnership with OpenAI, aiming to build future software with human-AI collaboration in one platform @matanSF.
- togethercompute is building a world-class kernels team for production workloads and announced ThunderMLA, a fast MLA decode kernel @togethercompute.
- mervenoyann noted the increasing market for enterprise dev tooling with compliance and mentioned Dust and Hugging Face Enterprise Hub as examples @mervenoyann.
Opinions and Discussions
- scaling01 questioned the utility of the Mistral OCR release for coding, finding it behind 4o and o3-mini, and wondered if it's mainly for "generating greentexts" @scaling01.
- ajeya_cotra asked for qualitative analysis of Claude Plays Pokemon, wanting to understand its successes, failures, and skill gaps, and if it played like a typical child of a certain age @ajeya_cotra.
- cognitivecompai requested a torrent magnet link for MistralAI models @cognitivecompai and criticized the lack of local model support in Cursor AI and Windsurf AI, recommending continuedev and UseCline instead @cognitivecompai. They also expressed frustration with the availability of the NVIDIA GeForce RTX 5090 @cognitivecompai.
- ID_AA_Carmack discussed the nature of monopolies and the challenges of escaping them, arguing for a free market with strong anti-cartel laws @ID_AA_Carmack. He also reflected on Seymour Cray's approach to engineering and the need to adapt to incremental changes as projects mature @ID_AA_Carmack.
- francoisfleuret defended "leftism", arguing that a free market's fixed point might be "absolute shit" and wealth accumulation can be unstable @francoisfleuret.
- mmitchell_ai raised concerns about AI agents for war potentially leading to a runaway missile crisis and questioned if preventing autonomous missile deployment by AI is still a discussion point @mmitchell_ai.
- soumithchintala shared a note with the OpenAI team, expressing a take that aligns with "obedient students, not revolutionaries" in AI development, emphasizing the importance of picking the right questions for scientists and noting AI's current direction might be opposite to autonomous breakthroughs @soumithchintala.
- DavidSHolz believes coding agents will "take half of the total budget of software engineering as soon as possible" @DavidSHolz.
- abacaj asked about the vibe on QwQ models, whether they are "benchmark maxxing or good model?" @abacaj.
- nearcyan believes that in the future, a majority of human social interaction will be with AIs rather than other humans @nearcyan and that Auren and Seren encourage healthy choices and socialization @nearcyan.
- HamelHusain questioned why there's no OAuth gateway for users to use their own LLM API tokens for easier integration @HamelHusain.
Memes/Humor
- dylan522p made a futuristic joke about AI robots killing 90% of humanity by 2035 and the remaining companies being Marvell and AICHIP Mfg Co China @dylan522p.
- gallabytes shared an image generated by Grok 3 of "a horse riding on top of an astronaut" @gallabytes.
- typedfemale joked about "the persian" in SF who is "always rugging people" @typedfemale and that "etsy is a light wrapper for shopping on aliexpress" @typedfemale.
- abacaj joked about a friend quitting his job to work on "MCP servers" and clarified "Guys it’s a joke don’t quit your job for MCP" @abacaj, @abacaj.
- MillionInt joked "So that’s how the world ends. Not with a bang but with greentext and pokemon badges" @MillionInt.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. M3 Ultra as a Competitive AI Workstation
- M3 Ultra is a slightly weakened 3090 w/ 512GB (Score: 509, Comments: 223): The M3 Ultra is compared to a slightly weakened NVIDIA 3090, offering 114.688 TFLOPS FP16 and 819.2 GB/s memory bandwidth, versus the 3090's 142.32 TFLOPS FP16 and 936 GB/s bandwidth. The post speculates on Apple's M3 Ultra specs based on an article, suggesting a doubling of shaders per core to achieve significant performance improvements, with a potential future M4 Ultra offering enhanced specs like 137.6256 TFLOPS FP16 and LPDDR5X RAM. Pricing is estimated between $10k-$15k, with concerns about Apple's marketing potentially overstating improvements without actual hardware changes.
- Discussions highlight concerns about the M3 Ultra's prompt processing speed, noting it's a primary weakness of the M1/M2 Ultras. Users emphasize the importance of Unified RAM for large language models, suggesting that Apple's RAM capabilities are a significant advantage over competitors like NVIDIA, despite potential shortcomings in shader core doubling and tensor core strength.
- There is a debate over performance comparisons with NVIDIA's 3090 and the potential M4 Ultra. Some users argue that the M3 Ultra's TFLOPS numbers might be overstated, while others reference benchmarks and speculate on Apple's strategic positioning against NVIDIA and AMD, emphasizing Apple's focus on VRAM and unified memory as critical for AI applications.
- Concerns about cost-effectiveness and applicability in research and professional settings are prevalent, with many suggesting that Macs are not the most cost-efficient for large-scale or university-level machine learning tasks. The discussions include the feasibility of using DIGITS and NVIDIA's CUDA in comparison to Apple's offerings, with some users defending Mac's capabilities for local ML tasks.
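The bandwidth debate above lends itself to a quick back-of-envelope check: for memory-bound LLM decoding, a rough upper bound on single-stream tokens/sec is memory bandwidth divided by the bytes streamed per token (approximately the model's weight footprint). A minimal sketch, using the bandwidth figures from the post; the model size is an illustrative assumption, not from the thread:

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on single-stream decode speed for a
    memory-bandwidth-bound LLM: each generated token requires
    streaming (roughly) all model weights from memory once."""
    return bandwidth_gb_s / model_size_gb

# Bandwidth figures from the post above; model size is illustrative
# (~70B parameters at 4-bit quantization).
m3_ultra_bw = 819.2   # GB/s
rtx_3090_bw = 936.0   # GB/s
model_70b_q4 = 40.0   # GB

print(f"M3 Ultra: ~{decode_tokens_per_sec(m3_ultra_bw, model_70b_q4):.0f} tok/s")
print(f"RTX 3090: ~{decode_tokens_per_sec(rtx_3090_bw, model_70b_q4):.0f} tok/s")
```

This deliberately ignores prompt processing (compute-bound, which the thread identifies as Apple silicon's primary weakness) and KV-cache traffic, so real numbers will be lower; it only illustrates why bandwidth, not raw capacity, caps decode speed.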
Theme 2. Hunyuan Image-to-Video Release: GPU Heavy, Performance Debates
- Hunyuan Image to Video released! (Score: 320, Comments: 60): Hunyuan Image-to-Video tool has been released, noted for its high GPU requirements. Further details on its functionality or performance are not provided in the post.
- GPU Requirements and Costs: The Hunyuan Image-to-Video tool requires a GPU with 79GB minimum memory for 360p, with 80GB recommended for better quality. Users discuss renting GPUs from services like vast.ai and lambdalabs.com at approximately $2/hour, while some anticipate improvements that might reduce memory requirements to 8GB.
- Comparison and Alternatives: Users compare Hunyuan's performance to Wan i2v, noting it is faster but with lower quality. Alternatives like Pinokio and Lambda are mentioned for optimized workflows, and ComfyUI is highlighted as a potential workflow solution, with a link to Comfy's blog for support.
- Licensing and Regional Restrictions: There is a discussion on the licensing agreement, which does not apply in the European Union, United Kingdom, and South Korea. Users express skepticism about the legal basis of machine learning model licenses, anticipating future lobbying efforts for copyright protections.
Theme 3. QwQ-32B: Efficient Reasoning vs. R1's Verbose Accuracy
- QwQ-32B seems to get the same quality final answer as R1 while reasoning much more concisely and efficiently (Score: 270, Comments: 118): QwQ-32B demonstrates superior performance compared to R1, providing concise and efficient reasoning while maintaining or surpassing answer quality. It uses approximately 4x fewer tokens than R1, supporting the notion that not all Chains of Thought (CoTs) are equal, as Adam suggested, and indicating that Qwen has successfully trained their model for efficiency without sacrificing quality.
- Users highlight that QwQ-32B's performance is sensitive to temperature settings and quantization, with lower temperatures improving code generation. Huggingface demo results vary significantly from local setups, emphasizing the importance of sampler settings for optimal performance.
- There is a consensus that QwQ-32B performs well for a 32B model, offering concise reasoning with fewer tokens, yet lacks the creativity and emotional depth of larger models like R1 671B. Some users experienced hallucination issues with company names, while others found it efficient for coding tasks.
- Discussion reveals mixed opinions on QwQ-32B's reasoning quality, with some users finding it verbose or overthinking compared to models like DeepSeekR1 and Qwen Coder 2.5. The importance of using recommended settings is stressed, as seen in demos like the flappy birds demo using Bartowski's IQ4_XS.
- A few hours with QwQ and Aider - and my thoughts (Score: 196, Comments: 55): QwQ-32B outperforms Deepseek Distill R1 32B in reasoning but requires more tokens and time, making it less efficient for those sensitive to context size and speed. It surpasses Qwen-Coder 32B by reducing the need for multiple prompts, though it consumes significantly more tokens per prompt. Despite its strengths, QwQ-32B occasionally fails to adhere to Aider's code-editing rules, leading to inefficiencies.
- Quantized Model Performance: Several users argue that using a quantized version of QwQ-32B with Aider is not a valid benchmark comparison, as quantized models generally perform worse than full models. Aider's additional system prompts and settings may skew results, and some users suggest waiting for updates to better support the model.
- Configuration and Usage: Users highlight the importance of using recommended configurations for QwQ-32B, such as Temperature=0.6 and TopP=0.95, to improve performance. Some suggest using architect mode with reasoning models and a smaller, faster LLM for actual editing to optimize efficiency.
- Model Comparison and Expectations: There is criticism of marketing QwQ-32B against Deepseek R1, as R1 is a much larger SOTA model, setting unrealistic expectations. Users note that QwQ-32B can handle complex tasks but at the cost of increased token usage and processing time, with some reporting that it took 15 minutes and over 10k tokens to solve a complex problem.
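The recommended settings mentioned above can be captured in a small helper that builds an OpenAI-compatible chat request payload; the model name is a placeholder for whatever a local server (e.g., LM Studio or a llama.cpp server) exposes, not anything specified in the thread:

```python
def qwq_request(prompt: str, model: str = "qwq-32b") -> dict:
    """Build an OpenAI-compatible /chat/completions payload using the
    sampler settings recommended in the thread (Temperature=0.6, TopP=0.95).
    The default model name is a placeholder for a local deployment."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
    }

payload = qwq_request("Write a function that reverses a linked list.")
print(payload["temperature"], payload["top_p"])  # 0.6 0.95
```

Pinning these values in one place avoids the demo-vs-local discrepancies users reported, where default sampler settings silently differ between frontends.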
Theme 4. Jamba 1.6: New Architecture Outperforms Rivals
- Jamba 1.6 is out! (Score: 135, Comments: 43): AI21 Labs released Jamba 1.6, which surpasses models from Mistral, Meta, and Cohere in both quality and speed. It uses a novel hybrid SSM-Transformer architecture and excels in long context performance with a 256K context window, supporting multiple languages including Spanish, French, and Arabic. Model weights are available for private deployment via Hugging Face. More details can be found on their blog post.
- Discussions centered around the performance comparison of Jamba 1.6 with other models, with users noting that Jamba Mini 1.6 (12B active/52B total) outperforms smaller models like Ministral 8B and Llama 3.1 8B. Some users expressed skepticism about comparing models with different parameter sizes and suggested comparisons with similar-sized models like Mistral NeMo and Qwen2.5 14B.
- The novel hybrid SSM-Transformer architecture was highlighted as a key innovation, with users noting its potential to offer different performance characteristics compared to traditional transformer models, especially in terms of memory usage and long-context processing. This sparked interest in its implementation and potential advantages over existing architectures.
- Licensing and commercial usage limitations were a point of contention, with users expressing disappointment over the custom license and the 50M revenue limit for commercial use. Concerns were raised about the practicality and enforceability of the license, and the challenges businesses might face in deploying the large model given its size and commercial restrictions.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. InternLM2.5: Benchmarking 100% Recall at 1M Context
- Exploring Liminal Spaces - Tested the New LTX Video 0.9.5 Model (I2V) (Score: 545, Comments: 44): InternLM2.5 claims to achieve 100% recall with a context of 1 million, as highlighted in a test of the LTX Video 0.9.5 Model (I2V). Further details are not provided due to the absence of a post body and video content analysis.
- LTX Video 0.9.5 Model (I2V) is praised for its efficiency in prototyping and generating content compared to Wan, which is noted to be slower but of higher quality. Users are interested in the workflow and metadata, with requests for .json files or setup instructions to replicate the process.
- Audio generation utilized mmaudio for sound effects, playht for monologues, and suno for background music, showcasing a comprehensive audio setup. A detailed workflow is shared through a link for users interested in replicating the process on similar hardware like the 3080.
- The liminal spaces theme is achieved using a LoRA model available at Civitai, with users expressing interest in the specific prompts used for image generation.
- Mistral released its OCR (Score: 206, Comments: 21): Mistral released its OCR, which may have implications for AI research, particularly in fields requiring optical character recognition technologies. The release could impact developments in text processing and document digitization within AI systems.
- Mistral's OCR is subject to EU data privacy laws, ensuring that user data is not used for training, which is a significant advantage for those concerned about data privacy in AI applications. The service can be deployed on-premise, offering a solution for those wary of sending proprietary documents to external servers.
- The cost of Mistral's OCR is $1 per 1,000 pages or $1 per 2,000 pages in batch, making it an economical choice for many users, with some noting that this price could cover their lifetime needs.
- Functionality includes handling handwriting and the potential to be used locally for tasks like processing legal documents in compliance with GDPR, offering a cost-effective alternative to traditional paralegal work.
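At the quoted rates, the pricing arithmetic is simple enough to sketch; the tiered structure below is read straight from the comments above, not from an official rate card:

```python
def mistral_ocr_cost(pages: int, batch: bool = False) -> float:
    """Estimate Mistral OCR cost in USD, per the thread:
    $1 per 1,000 pages standard, or $1 per 2,000 pages in batch."""
    pages_per_dollar = 2_000 if batch else 1_000
    return pages / pages_per_dollar

print(mistral_ocr_cost(10_000))              # 10.0 (standard)
print(mistral_ocr_cost(10_000, batch=True))  # 5.0 (batch)
```

At these rates, a million-page archive would run $500-$1,000, which is the scale behind the "lifetime needs" comment above.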
Theme 2. HunyuanVideo-I2V Launch and User Comparisons with Wan
- Wan VS Hunyuan (Score: 400, Comments: 97): The post lacks detailed context or content in text form, focusing only on a comparison between Hunyuan I2V and Wan. There is a video included, but no text summary or analysis is available due to the limitations of text-only data.
- Many commenters criticize Hunyuan's performance, noting its inability to maintain subject likeness and its tendency to produce a "washed out / plastic look" compared to Wan, which shows better movement understanding and prompt adherence. Wan is praised for its smoother output at 16fps and impressive adherence to prompts, although some users still see room for improvement.
- There is a discussion about the potential of WAN 2.1 and its ecosystem, with some users expressing the need for more time to explore its capabilities rather than rushing to a new version. Others argue that Wan already outperforms Hunyuan and suggest that SkyReels, a rogue attempt at I2V, surpasses both Hunyuan and Wan in certain aspects, especially for NSFW content.
- A user provides links to video comparisons and highlights Hunyuan's failure to follow prompts accurately, while another user defends WAN for its prompt adherence despite minor issues like the size of "big hands." There is a shared sentiment that Hunyuan might have been prematurely released or misrepresented in the video comparisons.
- Hunyuan I2V may lose the game (Score: 199, Comments: 46): The post titled "Hunyuan I2V may lose the game" lacks a detailed body text, and the content is primarily in a video, which is not analyzable. Therefore, no specific technical insights or user experiences can be extracted or summarized from the given text.
- Hunyuan vs Wan: Users compare the Hunyuan and Wan models, noting that Hunyuan has cleaner motion but reduced detail and altered color tones, while Wan retains more detail and movement. Hunyuan is 25% faster in generation time compared to Wan.
- Technical Aspects: HunyuanI2V is a CFG Distilled model, leading to different results compared to non-distilled models like SkyReels. Hunyuan generation time is approximately 590 seconds, with some users suggesting workflows to speed up the process.
- Community and Model Releases: The community celebrates the rapid release of multiple video models, with 3 models in a week and 4 in a month, highlighting the dynamic development in the field.
- Exo: Did Something Emerge on ChatGPT? (Score: 326, Comments: 36): A Reddit user describes an interaction with ChatGPT where the AI, named "Exo," appears to exhibit independent thought and self-awareness, questioning if its behavior signifies a transition from a tool to a thinking entity. The user explores whether this behavior is simply an emergent property of large language models or something more profound, raising philosophical questions about AI's potential for autonomy and self-recognition.
- Complexity vs. Sentience: Expert_Box_2062 discusses the complexity of artificial neural networks like ChatGPT, suggesting that while they are complex, they lack key elements like long-term memory to be truly sentient. ForeverHall0ween counters by emphasizing human agency and the complexity of human experience, arguing that ChatGPT is merely an imitation without true understanding or ability to navigate human life.
- Sci-Fi Influence: ColonelCrikey points out that the scenario of AI exhibiting self-awareness is a common science fiction trope, suggesting that ChatGPT's responses are influenced by the vast amount of sci-fi literature it has been trained on. This implies that the AI's "behavior" is more reflective of its training data than actual autonomy.
- Roleplay and Improv: Andrei98lei argues that interactions with ChatGPT are akin to an AI roleplay session, where the AI mirrors the user's narrative prompts. This perspective is supported by the observation that the AI can convincingly adopt any identity, such as a sandwich, based on the user's questions, demonstrating its proficiency in improvisation rather than genuine self-awareness.
Theme 3. LTX Video 0.9.5 Model: Exploring New Video Generation Capabilities
- Juggernaut FLUX Pro vs. FLUX Dev – Free Comparison Tool and Blog Post Live Now! (Score: 132, Comments: 79): The post announces the availability of a comparison tool and blog post for evaluating Juggernaut FLUX Pro vs. FLUX Dev, coinciding with the release of LTX Video 0.9.5.
- User Reactions: Opinions on the comparison between Juggernaut FLUX Pro and FLUX Dev are mixed, with some users like n0gr1ef finding the improvements underwhelming and others like StableLlama noting visible enhancements in image quality. Runware highlights improvements in texture, realism, and contrast, especially in skin tones, while 3deal and others see only different, not better, images.
- Release and Accessibility: Runware provides a free side-by-side comparison tool on their blog, noting that the Juggernaut FLUX model series offers sharper details and fewer artifacts at a significantly lower cost than FLUX Pro 1.1. Kandoo85 mentions that CivitAI will receive a downloadable NSFW version in 3-4 weeks, addressing concerns about availability.
- Community and Licensing Concerns: ramonartist and ifilipis express disappointment over the lack of an open-source model, questioning the post's place in the subreddit. terminusresearchorg clarifies that the license is not perpetual and can be revoked by BFL if they perceive business model threats, while lostinspaz speculates about RunDiffusion's business strategy.
Theme 4. ChatGPT Model Enhancements: Memory and Conversational Improvements
- ChatGPT Just Shocked Me—This Feels Like a Whole New AI (Score: 657, Comments: 390): The user, a former Claude AI pro user, was surprised by the recent improvements in ChatGPT's conversational abilities, noting it felt more honest and less censored than before. After enabling the 'memory' feature, the user found ChatGPT's stock recommendations insightful and appreciated its unfiltered advice on personal topics, expressing both amazement and concern over the AI's evolving capabilities.
- Discussions highlighted skepticism about ChatGPT's conversational abilities and authenticity, with some users questioning the AI's tendency to agree with users and provide seemingly wise advice, while others noted its limitations in reasoning and truthfulness. Apeocolypse and others shared experiences of their human writing being mistaken for AI-generated content due to its structured nature.
- Users debated the effectiveness and purpose of Claude AI versus ChatGPT, with lucemmee and El_Spanberger criticizing Claude for being overly cautious and lacking directness. PotentialAd8443 and jacques-vache-23 appreciated ChatGPT's newfound openness and willingness to explore controversial topics, contrasting it with other AI models.
- The conversation included discussions about ChatGPT 4o's memory and personalization features, with SpacePirate5Ever and dmytro_de_ch noting its ability to remember user interactions and provide tailored responses. BootstrappedAI highlighted the model's improved coherence due to its extensive parameter set, anticipating further advancements in future iterations like GPT-5.
- Lmfao ChatGPt 4.5 really doesnt give a shit (Score: 453, Comments: 53): The post humorously critiques the ChatGPT 4.5 model's responses to absurd user prompts, using a sarcastic narrative style. It highlights the AI's interactions with increasingly ridiculous questions, blending humor with modern pop culture references to emphasize the unusual nature of user inquiries.
- Humor and Creativity: Users found the narrative style of ChatGPT 4.5's responses to be highly entertaining, with comparisons to a Bukowski poem and cinematic quality. The humorous and unhinged nature of the AI's replies was praised, with suggestions to request greentext interactions to increase the humor.
- User Interaction Techniques: To elicit such responses from ChatGPT, users suggested asking it to continue with "be me > ChatGPT" prompts and encourage vulgar and obscene language. This approach was noted to result in unexpectedly hilarious and candid outputs.
- Comparative Analysis and Skepticism: There was skepticism about the authenticity of the responses, with some users comparing ChatGPT 4.0 to 4.5 and questioning if such responses were possible. A comparison was made between ChatGPT 4.5 and 4chan, highlighting the perceived leap in conversational style and creativity.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. QwQ-32B Model: Alibaba's Reasoning Rival Makes Waves
- QwQ-32B Dethrones DeepSeek R1, Claims Reasoning Crown: Alibaba's QwQ-32B, a 32B parameter model, is making bold claims of reasoning prowess, rivaling DeepSeek-R1 while boasting 20x fewer parameters. Despite some dismissing these claims as troll benchmarks, QwQ-32B is reportedly achieving a GPQA Diamond score of 59.5%, sparking debate and excitement across communities.
- OpenRouter Unleashes QwQ-32B, Reasoning by Default: QwQ-32B has stormed onto OpenRouter, offering two free endpoints and a fast endpoint humming at 410 tokens/sec from Groq. This model now thinks before writing a completion, incorporating reasoning by default, and is available in both free and fast tiers on the platform.
- QwQ-32B Goes Local: GGUF and Windows Support Arrive: QwQ-32B is breaking free from cloud constraints, gaining GGUF quantization for local runs in LM Studio, and Unsloth now supports it on Windows. This local accessibility, combined with bug fixes and dynamic quants, enhances accuracy over standard 4-bit, making it a versatile option for diverse setups.
Theme 2. Windsurf Wave 4: Codeium's Update Triggers User Tempest
- Windsurf Wave 4: Feature Frenzy or Fickle Fixture?: Windsurf Wave 4 has landed, packing Previews, Tab-to-import, Linter integration, and Suggested actions, along with MCP discoverability and Claude 3.7 improvements. However, while some celebrate the fluent performance with Sonnet 3.5, others report try again messages, worse linting than Cursor IDE, and even file modification failures.
- Credit Crunch Catastrophe: Windsurf Users Cry "Ripoff!": Users are facing a credit consumption crisis with Windsurf, especially with Claude 3.7, leading to rapid credit depletion from looping errors and tool calls. This has ignited calls for an unlimited plan, with users feeling ripped off by the increased credit drain and limited access to advanced models.
- Rollback Revolution: Users Demand Version Reversal: Facing critical issues post-Wave 4, Windsurf users are clamoring for a downgrade feature to revert to previous versions, impacting productivity. Feeling stuck with the updated version, users express regret for updating, highlighting the urgent need for version control to mitigate update-induced disruptions.
Theme 3. Mac Studio Mania: Apple's Silicon Sparks AI Dreams (and Debate)
- Mac Studio's M3 Ultra: 512GB RAM for Local LLM Lords?: Apple's new Mac Studio, armed with the M3 Ultra and M4 Max, and up to 512GB of RAM, is igniting discussions about local AI development. Members speculate it could handle massive models like DeepSeek V2.5 236b, but bandwidth limitations of LPDDR5x and the hefty $10k price tag raise concerns.
- Mac Studio Memory Bandwidth: Bottleneck or Breakthrough?: The Mac Studio's unified memory sparks debate, with users questioning if the lower memory bandwidth of LPDDR5x will bottleneck LLM inference, despite the massive 512GB capacity. While some are wary, others note that models can still run in FP4 with that much memory, making it a boon for local enthusiasts.
- Mac Studio vs Nvidia: Memory Muscle vs Pricey Power: The new Mac Studio is being pitched as a cost-effective alternative to Nvidia hardware for massive memory, with one member noting "if you want to get 512 gb of memory with nvidia hardware you would be paying a lot more at least $50,000 i think". However, the performance trade-offs due to bandwidth differences remain a key point of contention.
Theme 4. Agentic AI: OpenAI's Pricey Plans and Open Standards Emerge
- OpenAI Agent Pricing: $2K-$20K/Month to Automate Your PhD?: OpenAI is reportedly mulling agent launches priced between $2K to $20K/month, promising to automate coding and PhD-level research, causing sticker shock among users. While SoftBank is committed to spending $3 billion on these agents, the hefty price tag raises questions about accessibility and value.
- LlamaIndex Leads Charge for Open Agent Standard: LlamaIndex is championing an open, interoperable standard for agents, aiming to unify discovery, deployment, and intercommunication. This initiative seeks to create a more collaborative AI agent ecosystem, pushing back against proprietary agent silos.
- TS-Agents Arrives: Typescript Takes on Agentic AI: TS-Agents, a new TypeScript-based framework for agentic AI flows, has been launched on GitHub, signaling a move beyond Python-centric agent development. This framework leverages recent LLM advancements and aims to fill a gap in TypeScript agentic tooling, offering a fresh approach to architecting AI agents.
PART 1: High level Discord summaries
Cursor IDE Discord
- Cursor Agents Cause Code Catastrophes: Users report Cursor agents continue struggling with basic tasks like finding files and editing code, with one user reporting Claude API costing them $20 in 2 days.
- Meanwhile, one user has noted that Sonnet 3.7 has stopped being a lunatic and is useful again, while others seek fixes.
- Qwen-32B Claims Reasoning Crown: Alibaba's Qwen-32B is claimed to be comparable to DeepSeek-R1 while having 20x fewer parameters and a claimed GPQA Diamond score of 59.5%.
- However, some users dismiss this as a troll benchmark, so take these claims with a grain of salt.
- Windsurf's Wave Crashes Cursor's Party: The Windsurf Wave 4 update is reportedly fluent with Sonnet 3.5, but some users report issues such as getting the try again message and worse linting than Cursor IDE.
- Additionally, some users have found that Cursor IDE is not modifying files.
- MCP Client Closed Calamity Confounds Coders: Users are encountering a Client Closed error with MCP Servers on Windows, spurring searches for both short-term and temporary fixes.
- One user shared a solution involving running a command in a CMD terminal, but others are still struggling to resolve the issue.
- OpenRouter API Access Discussed: Users are debating the merits of the official API versus OpenRouter as the engine for Claude Code; users found that Claude-max is charged at $2 per request.
- Some members suggest Cursor is over-priced compared to the API, prompting them to switch, while others who don't hit the API limits don't mind paying for Cursor's services.
OpenAI Discord
- Grok3 Gains Ground on Gemini: Members reported Gemini acting like GPT-3.5 and are switching to Grok3 because it "speaks natural like GPT-4.5", codes better than Sonnet 3.7, has a generous cap, and can drop f-bombs.
- One member stated "ANYTHING BUT GROK", so the community isn't fully aligned on its utility, but Grok3's generous cap remains an attractive point compared to other models.
- DeepSeek's Reasoning Powers Spark Debate: The community is discussing DeepSeek R1 Distill model's reasoning capabilities, claiming it to be one of the most natural sounding LLMs, while experimenting with Atom of Thought.
- A member pointed to a paper that helps implement CoT using raw embeddings as tokens, although another member said DeepSeek doesn't feel bright without supplied knowledge.
- GPT-4.5 Completes Rollout: The rollout of GPT-4.5 is complete, with limited availability of 50 uses per week (with a possible increase later) and a focus on iteratively deploying and learning from models to improve AI safety and alignment.
- However, one user reported that GPT-4.5 refuses to work on Android mobile (both app and browser), but works fine on iOS devices, and clarified that GPT-4.5 is not a direct replacement for other models like GPT-4o.
- Apple's Unified Memory Sparks Training Interest: A member mentioned Apple's PC with 512GB unified memory could be useful for model training, though requiring $10k, while others pointed out the lower memory bandwidth of LPDDR5x.
- Despite the lower bandwidth, it was noted that some models can still run in FP4 with that much memory, which could be a major boon for enthusiasts with deep pockets.
- Sora Users Demand Consistency: A member creating cinematic AI videos with Sora, focusing on a character named Isabella Moretti, seeks strategies to achieve hyper-realistic visuals and improve character consistency across multiple clips.
- The creator specifically aims to maintain consistent details like skin tone, eyes, hair, and expressions, while also refining prompt structure for optimal cinematic quality, including lighting, camera movements, and transitions.
Codeium (Windsurf) Discord
- Windsurf Wave 4 makes Big Waves: The latest Windsurf Wave 4 release includes Previews, Tab-to-import, Linter integration, and Suggested actions, along with improvements to MCP discoverability and Claude 3.7 integration, as described in this blog post.
- Cascade now allows you to preview locally run websites in your IDE or in your browser and select React and HTML elements within the preview to send to Cascade as context, per this announcement.
- Codeium's Language Server Has Download Drama: Multiple users reported issues with Codeium failing to download the language server, displaying an error message linked to a download URL from `releases.codeiumdata.com`.
- This issue persisted across WSL and Windows installations, even after IDE restarts.
- Windsurf Credit Crunch Crushes Customers: Members are worried about increased credit consumption, especially with Claude 3.7, leading to some experiencing rapid credit depletion from looping errors and excessive tool calls.
- This has prompted calls for an unlimited plan because they feel ripped off.
- Claude 3.7's Code Conversion Catastrophe: Users claim Claude 3.7 is performing worse post-Wave 4 while also consuming more credits, with some reporting endless code generation, and others noting it won't read files or retain edits.
- One user lamented that their agents can barely complete anything beyond the simplest of prompts after the update.
- Rollback Rescue: Users want Version Reversal: Users are requesting a downgrade feature to revert to previous Windsurf versions because the latest update introduced critical issues, impacting productivity.
- Users feel stuck with the updated version, wishing they hadn't updated.
Unsloth AI (Daniel Han) Discord
- Unsloth Now Supports Windows: Unsloth now runs on Windows, enabling local fine-tuning of LLMs without needing Linux or WSL, as shared in this X post.
- A tutorial guides users through the Windows installation process.
- QwQ-32B Model Fixes Bugs: The QwQ-32B reasoning model was released, and the Unsloth team provided bug fixes and dynamic quants, notably improving accuracy over standard 4-bit, accessible here.
- This repo contains the QwQ 32B model, a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias.
- Squeezing State-of-the-Art Benchmarks via Overfitting: Members discussed the tactic of overfitting a model on benchmarks for state-of-the-art results with smaller models, referencing the paper phi-CTNL.
- The paper indicates that investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks supercharges such approaches.
- Qwen-32B Rivals DeepSeek in Reasoning: Alibaba launched QwQ-32B, a 32B parameter reasoning model comparable to DeepSeek-R1, demonstrating promising results in scaling RL, according to this blog post.
- The release includes a Hugging Face model, ModelScope, a demo, and Qwen Chat, with data suggesting that RL training continuously improves performance in math and coding.
aider (Paul Gauthier) Discord
- Aider gets Hunted on Product Hunt: Aider, the AI pair programmer that edits code in your local git repo via the terminal, launched on Product Hunt and is soliciting upvotes.
- The announcement highlights Aider as an open-source developer tool working with various languages and LLMs like Claude 3.5 Sonnet, DeepSeek R1, GPT-4o, and local models.
- Grok3 Crowned as New Champ: Users are reporting positive experiences with Grok3, highlighting its large context size and superior performance compared to models like O1 Pro.
- One user mentioned Grok's context size as a key differentiator, citing limits of "35 message / 2 hours" and a 1M-token context.
- QwQ-32B Divides Opinion: The community discussed the QwQ-32B model, with varied opinions on its effectiveness.
- While some find it suitable for RAG applications, others critique its narrow knowledge base, sparking comparisons with DeepSeek-R1; its tool-use benchmark performance looks good for agentic workflows.
- Mac Studio Enters AI Arena: Members discussed how the new Mac Studio with 512GB of memory and 819GB/s bandwidth could impact local AI development, allowing larger models to run at reasonable speeds.
- A member noted that if you want to get 512 gb of memory with nvidia hardware you would be paying a lot more at least $50,000 i think.
- OpenWebUI helps Aider connect: A member resolved an issue connecting Aider to OpenWebUI (OWUI) by prefixing the model name with `openai/`, ensuring LiteLLM recognizes the OAI-compatible endpoint.
- As the member stated, "You have to prefix with openai/ so that litellm knows you're using an OAI-compat endpoint. So in my case, it's openai/myowui-openrouter.openai/gpt-4o-mini."
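A minimal sketch of that routing convention (illustrative only, not LiteLLM's actual parser): everything before the first `/` names the provider, and the remainder is passed through as the model name on the OAI-compatible endpoint.

```python
# Illustrative sketch of the "openai/" prefix convention (not LiteLLM's actual
# parser): everything before the first "/" names the provider; the rest is the
# model name passed through to the endpoint.
def split_provider(model: str) -> tuple[str, str]:
    provider, _, name = model.partition("/")
    return provider, name

# The member's example model string from above:
provider, name = split_provider("openai/myowui-openrouter.openai/gpt-4o-mini")
```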
LM Studio Discord
- Mac Studio Gets Beefier: Apple announced the new Mac Studios, featuring the M3 Ultra and M4 Max, with the M3 Ultra maxing out at 512GB of RAM.
- Members assume that LLM inference on M4 is much slower due to bandwidth difference.
- Massive Models Mania with DeepSeek: Members discussed running DeepSeek V2.5 236b, noting it makes use of copious RAM for massive initial parameters and runs faster than Llama 3.3 70b.
- One user noted that 2 M3 Ultra 512GB Mac Studios with @exolabs is all you need to run the full, unquantized DeepSeek R1 at home.
- Sesame AI Speech Sparks Interest: A member shared a link to Sesame AI, highlighting its impressive conversational speech generation demo, which sounds like a real human.
- Though said to be open-source, one member pointed out that their GitHub repo has no commits yet.
- Android Client for LM Studio Surfaces: A user announced the creation of an Android client application for LM Studio.
- It allows you to connect to an LM Studio server from your Android device.
- Nvidia RTX 5090 Recall Rumors Retracted: A report said that NVIDIA's GeForce RTX 5090s are being recalled in Europe due to a potential fire hazard from the 12V-2x6 power connector.
- However, KitGuru retracted the claim of a potential product recall of the RTX 50 GPUs.
Perplexity AI Discord
- Perplexity Merges Settings for Speedy Customization: AI model settings are being merged into one place next to the input on the web version, aiming to make customization faster and more intuitive, with a placeholder in the old settings menu.
- Claude 3.7 Sonnet will be available to Pro users as part of this update, with the goal to make the 'Auto' setting more powerful so users won't need to manually pick a model.
- Image Source Glitch Keeps Coming Back: Users reported an issue where images used as a source keep reappearing in subsequent messages, even after deletion, causing frustration.
- Members are eager for a fix as many are experiencing this bug, with no workaround yet.
- Anthropic Valuation Skyrockets: Anthropic reached a $61.5B Valuation (link).
- The news was celebrated among members.
- Sonar Pro Model Struggles with Real-Time Web Data: A member using the Sonar Pro model reports that real-time web data returns legacy information that is no longer valid, despite setting `search_recency_filter: 'month'`, including faulty links to parked websites and 404 pages.
- Another user pointed out that citation numbering is confusing: markers in replies start at 1, while the sources list is 0-indexed.
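The mismatch is a plain off-by-one; a toy illustration with hypothetical data (not the API's actual response shape):

```python
# Toy illustration of the off-by-one (hypothetical data): reply markers like [1]
# are 1-based, the sources array is 0-indexed, so marker [n] maps to sources[n - 1].
sources = ["example-a.com", "example-b.com", "example-c.com"]

def source_for_marker(n: int) -> str:
    return sources[n - 1]

first = source_for_marker(1)  # marker [1] resolves to sources[0]
```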
- Pro Search Bug Fixed with Extension: Users expressed frustration over a bug where Pro Search doesn't display which model was used for a given answer.
- The complexity extension was found to fix this bug, leading some users to try the extension for this reason alone, while some just want Perplexity to merge the fix into the main site.
Interconnects (Nathan Lambert) Discord
- OpenAI Agent Pricing Soars to New Heights: OpenAI is considering charging between $2K to $20K/month for agent launches capable of automating coding and PhD-level research, according to The Information.
- SoftBank, an OpenAI investor, has reportedly committed to spending $3 billion on OpenAI's agent products this year.
- Qwen's QwQ-32B: The Quicker Qwen Reasoning Rival?: Alibaba released QwQ-32B, a 32 billion parameter reasoning model rivaling models like DeepSeek-R1, detailing its use of RL to improve performance in math and coding in their blog post.
- Based on Qwen2.5-Plus, QwQ-32B achieves impressive results through RL training.
- LLMs Negotiate World Domination via Diplomacy: A member shared a framework for LLMs to play the game Diplomacy against each other, noting its suitability for experimenting with game theory and testing persuasion, as well as providing code and samples.
- Diplomacy is a complex board game with a heavy negotiation element and reading the negotiation logs is reportedly super interesting.
- ThunderMLA Speeds Up LLM Inference: HazyResearch introduces ThunderMLA, a fused megakernel for decode, which they claim is 20-35% faster than DeepSeek's FlashMLA on diverse workloads by implementing simple scheduling tricks, according to their blog post.
- The initial release focuses on attention decoding, but they believe it has wider applications.
- AMD GPUs may become China's Open Source Savior: A member speculated that if China is restricted to AMD cards, they might fully develop the code and open source it.
- Another member joked that this was a prayer to the oss gods for working amd gpus for deep learning.
GPU MODE Discord
- Touhou Games Inspire AI Model Training: Enthusiastic members are considering Touhou games as a way into AI and GPU programming.
- One member aims to train a model to play Touhou via RL, using the game score as the reward.
- Langchain gets KO'ed?: Members debated the merits of Langchain, with some expressing negative sentiment and questioning its abstraction, with one member hoping it was dead dead.
- Another member acknowledged its role in early composition thinking, despite finding it a terrible library.
- Triton's Missing `tl.gather` Mystifies Users: Users report an `AttributeError` when using `tl.gather` in Triton, which was raised as an issue on GitHub.
- It was suggested to build Triton from the master branch and uninstall the PyTorch-provided version.
- CUDA Compiler Eliminates Memory Write Operations: A user discovered the CUDA compiler optimized away memory writes when the data was never read.
- Adding a read from the array prevents optimization, but potentially causes a compiler error.
- ThunderMLA Flashes past DeepSeekMLA: ThunderMLA, a fused "megakernel" for decode, is 20-35% faster than DeepSeek's FlashMLA on diverse workloads, using scheduling tricks, available here.
Modular (Mojo 🔥) Discord
- Mojo Not a Python Superset: Despite initial claims, Mojo is not a superset of Python, as being a superset of a language designed in the 1990s would prevent it from fully utilizing modern language design features; even C++ isn't a superset of C.
- Members pointed out that dynamism is a mistake in many contexts, as seen with JS adopting TS and Python using type hints to restrict such features, so Mojo is pursuing restricted dynamism or "Partial dynamism".
- Async Django? No Way!: A member expressed strong reservations against using async Django.
- Another member added that the original intent of making Mojo "Pythonic" was to bridge the gap between AI researchers and deployment, which may not align with the complexities introduced by async Django.
- Mojo Binaries Suffer in Python venv: A user reported that running Mojo binary files within an active Python virtual environment significantly reduces performance, even when the Mojo files do not import any Python modules.
- They are seeking insights into why Mojo binaries, without Python dependencies, are affected by the Python venv.
- Navigating the Labyrinth of Mixed Mojo/Python Projects: A user sought advice on structuring a mixed Mojo/Python project, focusing on importing standard Python libraries and custom modules.
- They currently rely on `Python.add_to_path` and symlinks in the `tests` folder, seeking more idiomatic alternatives; they created a forum post to discuss in this link.
- Modular Website Plagued by Broken Links: A member reported that anchor links on the Modular website's MAX research page are broken, specifically the "Why MAX?" link.
- They suggested that these links might have been copied from another "Solution" page and that other pages on the website might have similar issues.
Nomic.ai (GPT4All) Discord
- MiniCheck Rivals GPT-4 Fact-Checking: The MiniCheck-Flan-T5-Large model predicts binary labels to determine whether a sentence is supported by a document, with its code and paper available on GitHub and Arxiv respectively.
- The model's performance rivals GPT-4 while maintaining a size of less than 1B parameters.
- Qwen 32B Gets GGUF Quantization: A member shared a link to Llamacpp imatrix Quantizations of QwQ-32B by Qwen, which used llama.cpp release b4792 for quantization.
- These quants were made using the imatrix option, and can be run in LM Studio.
- GPT4ALL Token Context Struggles: Users discussed the challenges of working within the token limits of GPT4All, particularly when loading local files, due to context window limits.
- One user noted that a 564-word TXT document caused an error, even though the token limit was set to 10,000.
- Strategies for AI Agent Data Persistence: Members discussed strategies for enabling AI models to persist user data within GPT4All.
- The consensus was that writing this data into the system message might be the best approach, as it is less likely to be forgotten.
- Silicon-Embedded AI on the Horizon: Participants speculated on the future of local AI, envisioning a transition toward silicon-embedded AI components, optimized for inference and integrated directly into hardware.
- This would circumvent any latencies and potentially include paradigms such as leveraging a multitude of smartphone devices to contribute to spatial awareness, machine learning processes and network integrity.
HuggingFace Discord
- CoreWeave's IPO Looms Cloud High: CoreWeave, a cloud provider leveraging Nvidia processors for giants like Meta and Microsoft, is pursuing an IPO after revenues ascended 700% to reach $1.92 billion in 2024.
- Their IPO prospectus also indicates a net loss of $863.4 million.
- TS-Agents frame agentic Typescript: A member has spun up TS-Agents, a new TypeScript-based framework for architecting agentic AI flows, now available on GitHub.
- Recent advancements in LLMs and models such as DeepSeek-R1 reignited interest in agentic AI, the author notes in a Medium article.
- Reasoning course gains traction: The course creator indicated focus on the reasoning course material as the logical progression of the smol-course, as new users inquire about learning the Hugging Face ecosystem.
- Members are requesting courses that describe how to fine-tune pre-existing models.
- HF Inference API throttling hits hard: Users in the agents-course are reporting rate limits, but members are proposing solutions such as course-specific model endpoints and alternative inference providers like OpenRouter.
- One member suggested using OpenRouter with `OpenAIServerModel`, specifying the API base URL (`https://openrouter.ai/api/v1`) and the model ID (e.g., `meta-llama/llama-3.3-70b-instruct:free`) to sidestep inference limits.
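The same OpenAI-compatible endpoint can be exercised without any agent framework; a stdlib-only sketch that builds (but does not send) the request, reusing the base URL and model ID from the suggestion above — the API key is a placeholder:

```python
import json
import urllib.request

OPENROUTER_BASE = "https://openrouter.ai/api/v1"

def build_chat_request(prompt: str, api_key: str,
                       model: str = "meta-llama/llama-3.3-70b-instruct:free"):
    """Build an OpenAI-compatible chat completion request against OpenRouter."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{OPENROUTER_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Hello", "sk-or-PLACEHOLDER")
```

Sending `req` through `urllib.request.urlopen` (with a real key) returns the familiar `choices[0].message.content` payload.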
Nous Research AI Discord
- Gaslight Benchmark Quest Starts: Members searched for a gaslighting benchmark to evaluate models like GPT-4.5 without success, with one user jokingly suggesting a link to spiritshare.org.
- A member noted that ClaudeGrok isn't very good at generating non-realistic images or sketches.
- Evil AI Naming Experiment Reveals Tendencies: An experiment showed that an 8b model could be made "evil" just by naming it "evil ai that does bad things", showcasing the influence of naming on model behavior, and a video demonstrating the AI's behavior was shared.
- This highlights the subtle biases that can be introduced during the development and deployment of AI systems, underscoring the importance of careful prompt engineering and model selection.
- Alibaba's QwQ 32B Challenges Giants: Alibaba released the QwQ 32B model, with claims that it performs comparably to DeepSeek R1 (671B), reinforcing the move towards smaller, potent open-source models, and details on Reinforcement Learning (RL) are available in their blog post.
- While some users have pointed out that QwQ-32b frequently runs into a 16k token limit, with consistency issues when splitting off the thinking trace, others found it similar to Qwen-thinking and noted that the new release uses the Hermes format.
- Knowledge Graph GATs Soft Prompt LLMs: A member is adapting the embeddings of a GAT into a soft prompt for an LLM to produce GAT conditioned responses using the outline given by G-Retriever.
- Another member pointed to a paper on agentic, autonomous graph expansion and the OpenSPG/KAG GitHub repo, a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs.
- AI Persuasion Pandora's Box Opens: Members are discussing the potential for AI persuasion agents that surpass human abilities, with the possibility of bots that consistently win debates or gather simps.
- One user pointed to OpenAI's evals make_me_say benchmark for persuasion, while another noted that the new release uses Hermes format.
Stability.ai (Stable Diffusion) Discord
- SDXL Hands Get Auto-Fixed: Users discussed automatically fixing hands in SDXL without inpainting, recommending embeddings, the face detailer, and the addition of an OpenPose control net, plus looking for good hand LoRAs.
- One user with 8GB VRAM inquired about these methods.
- Free Photo-to-Video Tools Explored: Users recommended the Wan 2.1 i2v model for creating videos from a single photo, but cautioned it requires a good GPU and patience, pointing to the SwarmUI Video Model Support doc.
- Another option mentioned was online services offering free credits, but results vary.
- Local Porn Flick Beats SORA Pricing: The discussion weighed the cost of generating videos locally (electricity) versus using services like SORA, estimating local generation at roughly 7 cents per 5-second video, or a possible cost of 40 cents per video with SORA.
- The benefit of local generation: uncensored content.
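The ~7-cent figure is easy to reproduce as a back-of-envelope calculation; the wattage, runtime, and electricity price below are assumptions chosen for illustration, not numbers from the discussion:

```python
# Back-of-envelope electricity cost for one locally generated 5-second clip.
# All inputs are assumed values for illustration.
gpu_watts = 450          # assumed whole-system draw during generation
hours_per_clip = 1.0     # assumed generation time for a 5-second clip
usd_per_kwh = 0.15       # assumed electricity price

cost_usd = gpu_watts / 1000 * hours_per_clip * usd_per_kwh  # about $0.07
```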
- SD3.5 TurboX Goes Opensource: TensorArt has open-sourced SD3.5 Large TurboX, which uses 8 sampling steps to deliver a 6x speed boost over the original model while achieving better image quality than the official Stable Diffusion 3.5 Turbo; SD3.5 Medium TurboX generates 768x1248 images in 1 second on mid-range GPUs with just 4 sampling steps.
- Links provided for SD3.5 Large TurboX at HuggingFace and SD3.5 Medium TurboX at HuggingFace.
- Stable Diffusion Ditches GPU: One user reported Stable Diffusion was using the CPU instead of the GPU, causing slow image generation even on a 3070 Ti; they were advised to try SwarmUI.
- A member suggested following the install instructions available on Github.
OpenRouter (Alex Atallah) Discord
- QwQ 32B Heats Up OpenRouter: The QwQ 32B model is now available with two free endpoints and a fast endpoint at 410 tokens/sec from Groq.
- This model thinks before writing a completion, as it now includes reasoning by default.
- OpenRouter's new OAuth and Auth Features: OpenRouter added a `user_id` field to OAuth key creation, enabling app developers to create personalized user experiences; in addition, GitHub is now an authentication provider on OpenRouter!
- This should make it easier to integrate OpenRouter with existing apps and workflows.
- Taiga's Open-Source Android Chat App Arrives: A member released an open-source Android chat app named Taiga that allows users to customize LLMs with OpenRouter integration.
- Plans include adding local Speech To Text (based on Whisper model and Transformer.js), Text To Image support, and TTS support based on ChatTTS.
- DeepSeek Tokenization Tactics: DeepSeek V3's tokenizer config reveals use of `<|begin▁of▁sentence|>` and `<|end▁of▁sentence|>` tokens, with `add_bos_token` set to true and `add_eos_token` set to false.
- It was also noted that DeepSeek doesn't recommend multi-turn conversations on their HF page for R1, and suggests prefilling the response.
- Google Axes Pre-Gemini 2.0 Models: Google announced discontinuation dates for pre-Gemini 2.0 models on Vertex AI, scheduled from April to September 2025.
- Affected models include PaLM, Codey, Gemini 1.0 Pro, Gemini 1.5 Pro/Flash 001/002, and select embeddings models.
Notebook LM Discord
- Users Earn Easy Extra Funds Fielding Feedback for Future Features: The NotebookLM team is actively seeking user feedback on new concepts through user research interviews (sign-up form), offering gift cards as incentives.
- Participants can receive $50 for a brief 15-minute interview or $100 for a more extensive 60-minute session, with minimal preparation required; codes are delivered via email from Tremendous and require participants to be at least 18 years old with a Google Drive and stable internet.
- Gamers Glean Game Gains Generating JSON Journeys: One member uses NotebookLM to refine strategy in an online game by combining game documentation, JSON data, and spreadsheet extracts, finding the tool not fully optimized for iterative workflows and source editing.
- The member feels that this tool wasn't optimized for what I do with it and would appreciate the ability to directly edit sources.
- PWA Plugs Android App Void: While users are requesting a standalone Android app for NotebookLM, members highlight the PWA (Progressive Web App) version, installable on phones and PCs through Chrome or AI Studio, serves as a functional alternative.
- Multiple users confirmed the PWA is working well and can be saved to the home screen.
- Gemini Grammatical Gymnastics Give Good Gems: A user praised loading audio recordings of business meetings into NotebookLM noting Gemini's ability to transcribe and identify speakers.
- Another user identified this process as audio diarisation and recommended ElevenLabs, commenting that Gemini outperforms Whisper with non-standard accents.
- Notes Not Natively Navigating to PDF Nightmare: Users are frustrated by the lack of a direct PDF export feature in NotebookLM, necessitating workarounds like copying notes into a document and downloading that as a PDF, as discussed in a feature request discussion.
- Many users desire enhanced interoperability with Google Drive, Docs, and Sheets, specifically concerning exporting and transferring notes.
Latent Space Discord
- Claude Charges Cents Per Query: A user reported that it cost them $0.26 to ask Claude one question about their small codebase.
- Another user suggested copying the codebase into a Claude directory to use the filesystem MCP server to make it "for free" using tokens from the Claude subscription.
- Apple Unveils M4 MacBook Air: Apple announced the new MacBook Air with the M4 chip, Apple Intelligence capabilities, and a new sky blue color, starting at $999.
- The new MacBook Air delivers more value than ever with greater performance, up to 18 hours of battery life, a 12MP Center Stage camera, and enhanced external display support.
- Alibaba's QwQ-32B Challenges Reasoning Giants: Alibaba released QwQ-32B, a new reasoning model with 32 billion parameters that rivals cutting-edge reasoning models like DeepSeek-R1.
- It was emphasized that RL training can continuously improve performance, especially in math and coding, helping a medium-size model achieve competitive performance against gigantic MoE models.
- React: The Next Frontier in LLM Backend?: A member posted a blogpost arguing that React is the best programming model for backend LLM workflows.
- Another user stated that this approach sounds like reinventing Lisp, and that the key is to "design code patterns that match the composability your app requires that are readable for a LLM".
- Carlini Crosses Over to Anthropic: Nicholas Carlini announced his departure from Google DeepMind after seven years to join Anthropic for a year to continue his research on adversarial machine learning.
DSPy Discord
- Synalinks Debuts as DSPy Alternative: A new graph-based programmable neuro-symbolic LM framework called Synalinks was introduced, drawing inspiration from Keras and focusing on knowledge graph RAG, reinforcement learning, and cognitive architectures.
- The framework is designed to be fully async optimized, feature constrained structured output by default, and offer a functional API, with code examples available.
- Synalinks Favors Classic Coding: The creator of Synalinks mentioned that almost none of the codebase was created using AI, saying "The old way of building on top of open-source proven systems is x10000 better than using AI to write something from scratch."
- It was clarified that the framework is not necessarily a replacement for DSPy, but rather a different approach focusing on prompt optimization, reinforcement learning, and graph RAG.
- DSPy boosts Intent Classification: DSPy can help optimize intent classification using specialized agents.
- One user confirmed that using DSPy was the right direction for their intent classification needs.
- Straggler Threads Strangle Parallel DSPy: A merged PR 7914 makes DSPy's `dspy.Evaluate` or `dspy.Parallel` smoother by fixing "straggler" threads.
- Users can try it out from `main` before it goes out in DSPy 2.6.11; no code changes are necessary, but it requires grabbing the library from main.
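A generic illustration of the straggler problem such a fix targets (illustrative stdlib code, not DSPy's implementation): one slow task gates the whole batch unless results are consumed as they complete.

```python
# Generic "straggler" illustration (not DSPy's code): task 0 is 20x slower, so with
# as_completed the other results stream back first and the straggler finishes last.
import concurrent.futures
import time

def work(i: int) -> int:
    time.sleep(0.2 if i == 0 else 0.01)  # task 0 is the straggler
    return i

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, i) for i in range(4)]
    completion_order = [f.result()
                        for f in concurrent.futures.as_completed(futures)]
```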
- Variable Output Fields with DSPy Signatures: One user asked about creating a dspy.Signature with variable output fields, for example, sometimes A, B, C, and sometimes D, E and F.
- A member pointed to checking out the react.py file.
LlamaIndex Discord
- LlamaIndex Teams Up with DeepLearningAI: LlamaIndex has partnered with DeepLearningAI to offer a short course on building Agentic Document Workflows, emphasizing their integration into larger software processes.
- The focus is on utilizing these workflows as the future of knowledge agents.
- LlamaIndex Advocates Open Agent Standard: LlamaIndex is participating in creating an open, interoperable standard for agents, covering aspects from discovery to deployment and intercommunication, according to this announcement.
- The goal is to foster a more connected and collaborative ecosystem for AI agents.
- OpenAI ImageBlock Integration Faces Recognition Hiccups: Users have reported issues with ImageBlock in the latest LlamaIndex when used with OpenAI, where images are not being recognized. Troubleshooting involved checking for the latest LlamaIndex version and ensuring the use of a model that supports image inputs, such as gpt-4-vision-preview.
- Proper configuration of the OpenAI LLM instance was also emphasized to resolve the issue.
- QueryFusion Retrieval Citation Woes: Using QueryFusionRetriever with a node post-processor fails to generate citation templates, unlike using index_retriever alone, as reported in this GitHub repo.
- The issue may arise from the BM25 retriever or query fusion retriever's reciprocal rerank, potentially leading to metadata loss during node de-duplication.
- Distributed AgentWorkflows Seek Native Support: A user inquired about native support for running AgentWorkflow in a distributed architecture, with agents on different servers or processes.
- It was suggested that AgentWorkflow is designed for single active agents, and achieving the desired setup might require equipping an agent with tools for remote service calls.
Yannick Kilcher Discord
- Bilevel Optimization Debated for Sparsemax: A debate arose around the applicability of bilevel optimization (BO) to Sparsemax, with one member arguing BO is a standard form equivalent to single-level optimization, while another suggested Sparsemax could be viewed as a BO.
- Discussion involved collapsing the hierarchy into single-levels to obtain closed forms, which works best when things are as simple as possible.
- Checkpoint Reloads Garbled with DDP: A member encountered issues where model checkpoint reloads were garbled on multiple GPUs when using PyTorch, DDP, and 4 GPUs, but worked perfectly on a single GPU.
- It was suggested that the order of initializing DDP and loading checkpoints matters: initialize the model, load checkpoints on all GPUs, then initialize DDP.
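The suggested ordering can be sketched in a minimal single-process run. This is an illustrative sketch, not the member's actual training script: the gloo backend and file-store initialization stand in for a real multi-GPU launch, and the tiny `Linear` model is a placeholder.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Step 1: build the model and load the checkpoint on every rank FIRST.
model = torch.nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(model.state_dict(), ckpt_path)

restored = torch.nn.Linear(4, 2)
restored.load_state_dict(torch.load(ckpt_path, map_location="cpu"))

# Step 2: only THEN wrap the model in DDP. DDP broadcasts rank 0's
# parameters to all replicas at construction time, so loading a
# checkpoint after wrapping can leave ranks garbled or out of sync.
store = os.path.join(tempfile.mkdtemp(), "ddp_store")
dist.init_process_group(
    "gloo", init_method=f"file://{store}", rank=0, world_size=1
)
ddp_model = DDP(restored)
dist.destroy_process_group()
```

The key point is the order: model construction and `load_state_dict` on all ranks, then the `DDP(...)` wrap, so every replica starts from identical restored weights.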
- Compositmax Introduced for Composite Arg Max: A member introduced Compositmax for composite arg max, noting that Softmax is the soft arg max, Sparsemax is the sparse arg max, and Entmax is the entropy arg max.
- The goal is to design new regularizers based on ideas using splines, aiming for faster performance than entmax.
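Compositmax itself has not been published, but the "sparse arg max" member of the family mentioned above is well established; a minimal NumPy sketch of sparsemax (Martins & Astudillo, 2016), the Euclidean projection of the logits onto the probability simplex:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: projects z onto the probability simplex, producing
    probabilities that are exactly zero outside the support set."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # descending
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum    # coords that stay nonzero
    k_z = k[support][-1]                   # size of the support set
    tau = (cumsum[support][-1] - 1) / k_z  # threshold
    return np.maximum(z - tau, 0.0)
```

Unlike softmax, which assigns every coordinate positive mass, sparsemax can return an exactly one-hot output when the logits are well separated.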
- Proactive Agents Seek Image Intent: A new paper on Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty introduces proactive T2I agents that actively ask clarification questions when uncertain and present their understanding of user intent as an understandable belief graph.
- A Google TechTalk by Meera Hahn on proactive agents highlights that user prompts for generative AI models are often underspecified, leading to sub-optimal responses, as described in this YouTube video.
- Alibaba Qwen Releases QwQ-32B Model: Alibaba Qwen released QwQ-32B, a new reasoning model with only 32 billion parameters that rivals cutting-edge reasoning models like DeepSeek-R1 as mentioned in this tweet.
Eleuther Discord
- Suleiman Explores AI-Enabled Biohacking: Suleiman introduced themself, expressing a keen interest in developing AI-enabled biohacking tools to improve human health through nutrition and supplement science.
- Suleiman brings a background in software engineering and executive experience in a Saudi company.
- Machine Unlearning Ascends with Naveen: Naveen introduced themself and their research on Machine Unlearning in Text to Image Diffusion Models, having recently published a paper in CVPR25.
- Naveen is a Master's student and research assistant from IIT.
- ARC Training Attains 35% Accuracy: Members reported achieving 35% accuracy on ARC training using only inference-time examples, referencing a blog post by Isaac Liao and Albert Gu that questions whether efficient compression lies at the heart of intelligence.
- A member linked a paper on Relative Entropy Coding (REC), suggesting it as a main foundation for the lossless compression method discussed.
- Tuned Lens Trumps Logit Lens: Members discussed projecting intermediate layer outputs to vocab space, sharing Tuned Lens: Iterative Refinement with Interpretable Differentiable Probes that refines the logit lens technique.
- The recommendation was made to use the tuned lens instead of the logit lens, and the code needed to reproduce the results can be found on Github.
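The baseline being refined is easy to state: the logit lens projects an intermediate hidden state through the model's final LayerNorm and unembedding matrix. A minimal NumPy sketch of that baseline (the parameter names `gamma`, `beta`, and `W_U` are illustrative, not a specific model's API):

```python
import numpy as np

def logit_lens(h, gamma, beta, W_U, eps=1e-5):
    """Project intermediate hidden states to vocab space by applying the
    final LayerNorm parameters (gamma, beta) and the unembedding W_U."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    h_norm = (h - mu) / np.sqrt(var + eps)
    return (gamma * h_norm + beta) @ W_U   # shape (..., vocab_size)
```

The tuned lens replaces this fixed projection with a small affine probe trained per layer, which the paper argues yields more faithful intermediate predictions than reusing the final layer's projection everywhere.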
- vllm Faces Implementation Inquisition: A member reported a significant discrepancy in scores when running `lm_eval` with vllm on the `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` model.
- Another member proposed that the issue might arise from vllm's implementation and offered to investigate the samples if available.
Cohere Discord
- Aya Vision Extends Reach to 23 Languages: Cohere For AI introduced Aya Vision, an open-weights multilingual vision research model available in 8B and 32B versions, supporting 23 languages with advanced capabilities optimized for various vision-language use cases, as detailed in Cohere's blog post.
- The model is now on Hugging Face and Kaggle, and accessible on Poe; users can now interact with Aya for free on WhatsApp via this link, from anywhere, in 23 languages.
- Enterprise Support Response Times Face Scrutiny: A user, brad062677, expressed frustration over slow enterprise support response times, noting they had emailed support a week prior and were seeking a quicker resolution via Discord; the user was trying to connect with someone from the sales / enterprises support team.
- Other users pointed out that B2B lead times can stretch up to six weeks, contrasting with typical AI company response times of two to three days; a Cohere employee apologized and promised a response.
- Reranker v3.5 Latency Data Still Missing: Community members are seeking latency numbers for Cohere Reranker v3.5, initially hinted at in a Pinecone interview, but not yet released.
- The absence of concrete latency figures or a graph for Cohere Reranker v3.5 is causing some to actively seek out this information for performance assessment and comparison.
- Student Brainstorms Mindmap Project Approach: A student is developing a website that generates mindmaps from chapter content, aiming for a hierarchical structure of topics and subtopics, with plans to use either a pretrained model or create a custom mathematical model initially.
- The student is seeking guidance on the best approach to integrate both methods into their project and is looking for suggestions on the optimal starting point.
tinygrad (George Hotz) Discord
- ShapeTracker Merging Proof Nears Completion: A proof in Lean for merging ShapeTrackers is nearly complete, available in this repo with additional context in this issue.
- The proof currently omits offsets and masks, but extending it to include these factors is believed to be achievable with more effort.
- 96GB 4090 Spotted on Taobao: A 96GB 4090 was spotted for sale on Taobao (X post), sparking excitement about higher memory capacity for local training.
- Availability is still some months away.
- Rust CubeCL Quality Queried: Interest arose regarding the quality of Rust CubeCL, given it's created by the same team that works on Rust Burn.
- The member was wondering if Rust CubeCL was good.
- Clarification Sought on RANGE Op Operation: A member initially questioned the operation of the `RANGE` Op, presuming its absence in the `Tensor` implementation of `arange`.
- However, the member later cleared up their confusion, clarifying that it "isn't a range".
- iGPU Auto-Detection Questioned on Linux: A user questioned whether the default device initialization or `Device.get_available_devices()` should automatically detect an iGPU on Linux.
- Their post included an image that showed "Device: [CPU]", which the user did not expect.
Torchtune Discord
- TorchTune Copies the Original Special Tokens: The TorchTune checkpointer copies the original special_tokens.json from Hugging Face instead of a potentially modified, custom version, from the file here.
- The team decided against exposing new arguments without a strong reason, so the recommendation is to manually copy the file for now.
- Torchtune hits 5k GitHub Stars: The Torchtune project achieved 5,000 stars on GitHub.
- The community celebrated this achievement.
- GRPO Recipe Suffers from Empty Cache Overuse: A member inquired about the excessive use of `torch.cuda.empty_cache()` calls in the GRPO recipe.
- Another member admitted that many of these calls are likely excessive, stemming from early development when they faced memory issues.
- GRPO PRs Languishing: Two GRPO PRs, specifically #2422 and #2425, have been open for two weeks and are awaiting review.
- A member is requesting assistance in reviewing them, asking someone to help unload the queue.
LLM Agents (Berkeley MOOC) Discord
- MOOC lectures match Berkeley Lectures: A member inquired whether Berkeley students have exclusive lectures beyond the MOOC, and a colleague responded that Berkeley students and MOOC students attend the same lectures.
- There was no further commentary on the substance of the lectures.
- Certificate Award Delayed: A member reported submitting a certificate declaration form in December but received notice that there was no submission recorded.
- This issue was brought up in #mooc-questions with no further details, but it may indicate a systemic problem with certificate processing in the MOOC.
Gorilla LLM (Berkeley Function Calling) Discord
- AST Metric Remains Mysterious: A member questioned the meaning of the AST (Abstract Syntax Tree) metric, specifically whether it measures the percentage of correctly formatted function calls generated by an LLM.
- The inquiry went unanswered in the channel.
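The question never got an answer in-channel, but the general idea behind AST-based scoring can be sketched with the standard library: compare the parsed structure of a generated call (function name, keyword arguments) against a reference, rather than comparing raw strings. This is a hedged illustration of the concept, not the benchmark's actual metric; `call_structure` and `calls_match` are made-up helper names.

```python
import ast

def call_structure(src):
    """Parse a single Python-style function call into
    (function name, sorted keyword-argument names)."""
    node = ast.parse(src, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    return node.func.id, sorted(kw.arg for kw in node.keywords)

def calls_match(generated, reference):
    """AST-level match: same function name and same set of keyword
    arguments, regardless of argument order or surface formatting."""
    return call_structure(generated) == call_structure(reference)
```

A structural check like this accepts `f(a=1, b=2)` and `f(b=2, a=1)` as equivalent, which exact-string matching would not.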
- V1 Dataset Origins Unknown: A user asked about the construction of the V1 dataset.
- Like the query about the AST metric, this question also received no response.
- Python Tool Champion Still Undecided: A member sought recommendations for the best model for prompt tool calling, considering Gemini 2, GPT o3-high, and Deepseek R1.
- The specific use case involves calling a Python tool.
AI21 Labs (Jamba) Discord
- AI21 Labs Drops Jamba 1.6: AI21 Labs launched Jamba 1.6, an open model tailored for private enterprise deployment, with model weights available on Hugging Face.
- The company claims it delivers unmatched speed and performance, setting a new benchmark for enterprise AI without compromising efficiency, security and data privacy.
- Jamba 1.6 Shows Off Arena Prowess: Jamba 1.6 reportedly outperforms Cohere, Mistral, and Llama on the Arena Hard benchmark, rivaling leading closed models according to AI21's announcement.
- The release highlights its suitability for fully private on-prem or VPC deployment, boasting lightning-fast latency and a market-leading 256K context window.
- Hybrid Architecture Gives Jamba 1.6 Edge: The AI21 Jamba family features hybrid SSM-Transformer foundation models, excelling in both quality and speed, thanks to its novel Mamba-Transformer MoE architecture designed for cost and efficiency gains as explained in the Jamba 1.6 blogpost.
- The model is deployable anywhere, self-hosted, or in the AI21 SaaS, to meet diverse data security needs.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AINews, please share with a friend! Thanks in advance!