[AINews] not much happened to end the year
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet new year's eve is all we need.
AI News for 12/30/2024-12/31/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (215 channels, and 1948 messages) for you. Estimated reading time saved (at 200wpm): 238 minutes. You can now tag @smol_ai for AINews discussions!
In case you are lacking in "Year In Review" type content, you might enjoy the Latent.Space 2024 Year in Review and 2025 AI Engineer Reading List.
AINews ad slots are open for 2025! Email swyx@smol.ai cc will@diamondquarters.com to get your stuff in front of 30k AI Engineers daily.
The Table of Contents and Channel Summaries have been moved to the web version of this email.
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
AI Models and Research
- Reinforcement Fine-Tuning (RFT): @corbtt introduced Reinforcement Fine-Tuning (RFT) as a data-efficient method to enhance reasoning in LLMs. RFT allows models to learn from minimal training data by utilizing strategies like First-Correct Solutions (FCS) and Greedily Diverse Solutions (GDS), improving both outcome and process efficiency.
- DeepSeek-V3 and Open-Source LLMs: @tom_doerr presented DeepSeek-V3, a 671B parameter MoE language model trained on 14.8 trillion tokens with FP8 mixed precision training. Additionally, @cognitivecompai emphasized the significance of open-source LLMs like DeepSeek, highlighting their potential to scale inference and enhance accessibility.
AI Predictions and Trends
- AI in 2025: @alexalbert__ and @TheTuringPost shared comprehensive predictions for AI in 2025, covering areas such as benchmark scores, model advancements, industry dynamics, and the rise of agents. These predictions include the proliferation of smaller models, increased multimodality, and ongoing challenges in open-source AI.
- Impact of AI on Software Development Jobs: @svpino forecasted that AI will significantly raise the bar for software developers, necessitating higher intelligence and specialization to remain competitive. This trend is expected to decrease the number of developers over time as AI handles more low-skilled tasks, pushing professionals to upskill and adapt continuously.
AI Tools and Development
- CodeLLM Enhancements: @bindureddy announced updates to CodeLLM, including the ability to edit code in-place, streaming responses, and an unlimited quota on all SOTA models like CodeLLM, o1, and Sonnet 3.5. These enhancements aim to make the coding assistant more efficient and user-friendly.
- Natural Language Reinforcement Learning (NLRL): @TheTuringPost detailed the benefits of NLRL, such as better interpretability, richer textual feedback, and the enhancement of LLM’s planning and critique abilities. NLRL leverages natural language to make decisions and provide explanations, improving the stability and effectiveness of AI systems.
AI Industry and Employment
- AI Hiring Opportunities: @corbtt is expanding their team, seeking strong engineers in ML and systems. With a 40% month-over-month growth and a technical team of only 5, the company offers a chance to learn from a rapidly growing AI startup while making a significant impact in the industry. Interested candidates are encouraged to DM with impressive projects.
- AI Tools Launches and Integrations: @tom_doerr and others introduced various AI-powered tools like Rivet for real-time applications, Buzee for full-text search, and Konfig for generating SDKs and API documentation. These tools leverage technologies such as Rust, V8 isolates, and PostgreSQL to enhance developer workflows and application functionalities.
AI Policy, Ethics, and Society
- Regulatory Challenges and Partnerships: @DeepLearningAI discussed how tech giants are forming creative partnerships with AI startups as an alternative to acquisitions in response to increased regulatory scrutiny. This strategy aims to navigate regulatory challenges while continuing to innovate within the AI industry.
- AI Act and Competitive Concerns: @BrivaelLp advocated for the removal of the AI Act, arguing that regulatory constraints are hindering competitiveness in the AI sector. This stance reflects ongoing debates about the balance between regulation and innovation in the development of advanced AI technologies.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. DeepSeek V3: Hardware Requirements and Performance
- DeepSeek V3 running on llama.cpp wishes you a Happy New Year! (Score: 175, Comments: 51): The post highlights DeepSeek V3 running on llama.cpp, likely showcasing its performance capabilities, but lacks specific details or context about the implementation or results.
- Performance Metrics and Hardware Details: DeepSeek V3 achieves around 7-9 tokens per second (t/s) on an Epyc 9374F setup with 12x32GB RAM totaling 384GB. The model is quantized to Q4_K_M and occupies 377GB of disk space, with performance metrics varying based on memory location and prompt specifics.
- Implementation and Development: The model is not fully operational yet, as the developer is still working on implementing a new pre-tokenizer regex in llama.cpp. The regex is detailed as `[!\"#$%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_\{|}~][A-Za-z]+|[^\r\n\p{L}\p{P}\p{S}]?[\p{L}\p{M}]+| ?[\p{P}\p{S}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+`.
- Community Engagement and Future Prospects: Users express enthusiasm for the project's progress and potential, with some predicting more economical models by 2025. Discussions also highlight the challenges and benefits of using regex in model development, with some users appreciating the capability of language models to generate regex patterns.
- Why there is not already like plenty 3rd party providers for DeepSeek V3? (Score: 63, Comments: 59): DeepSeek V3's state-of-the-art model is available for download and commercial use, yet there is a lack of third-party providers offering services with it. The author expresses willingness to pay a premium for prompt deletion of prompts by a trusted company and questions why other countries aren't utilizing unsanctioned access to top AI chips.
- DeepSeek V3's Size and Hosting Challenges: DeepSeek V3 is a massive model with over 600 billion parameters, making it challenging and costly for third-party providers to host. Many providers, like Together, have tried hosting it but faced issues like low throughput and profitability due to the model's size and the promotional pricing offered by DeepSeek themselves.
- Market Timing and Infrastructure Readiness: Discussions highlight that the holiday season may be affecting the availability of hosting services, with expectations that more providers will emerge as the new year progresses. The infrastructure for hosting large models like DeepSeek V3 is currently not optimized, impacting the speed and cost-effectiveness of hosting.
- Data Privacy Concerns and Pricing: There is a notable concern about data privacy, with some users willing to pay a premium to prevent their data from being used by DeepSeek for training. Additionally, DeepSeek's official API is praised for its price and speed, but the current promotional pricing makes it difficult for third-party providers to compete without incurring losses.
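The reported 7-9 t/s can be sanity-checked with back-of-envelope arithmetic: CPU decoding of a MoE model is roughly memory-bandwidth bound, since each token must read the active expert weights once. The figures below (active parameter count, effective quantization density, peak bandwidth) are approximations supplied for illustration, not numbers from the thread:

```python
# Back-of-envelope decode speed for DeepSeek V3 on a CPU server.
# Token generation is roughly memory-bandwidth bound: every token reads the
# active expert weights once. All numbers below are assumptions, not measurements.
active_params = 37e9            # DeepSeek V3 activates ~37B of its 671B params per token
bytes_per_weight = 4.5 / 8      # Q4_K_M averages roughly 4.5 bits per weight
mem_bandwidth_gbs = 460         # ~12-channel DDR5-4800 theoretical peak on an Epyc 9374F

gb_read_per_token = active_params * bytes_per_weight / 1e9
ceiling_tps = mem_bandwidth_gbs / gb_read_per_token
print(f"~{gb_read_per_token:.1f} GB read per token, ceiling ~{ceiling_tps:.0f} t/s")
```

Under these assumptions the theoretical ceiling lands around 20+ t/s, so the observed 7-9 t/s corresponds to roughly a third of peak, which is plausible once NUMA effects, KV-cache reads, and imperfect prefetching are accounted for.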
Theme 2. Alibaba's LLM Price Cuts: A Disruptive Move
- Alibaba slashes prices on large language models by up to 85% as China AI rivalry heats up (Score: 250, Comments: 95): Alibaba has significantly reduced prices on its large language models (LLMs) by up to 85%, reflecting intensifying competition in the Chinese AI market. This move is part of a broader trend of cost reductions among tech companies in response to growing rivalry in AI development.
- China's Green Energy and AI Advancements: Commenters highlighted China's leadership in green energy, noting it produces over 30% of the world's green energy and is on track to meet climate commitments six years early. China's focus on AI and electric vehicles (EVs) is supported by significant government subsidies and industrial synergies, making them competitive on price and innovation.
- Comparative Emissions and Industrial Capacity: Discussions emphasized the lower CO2 emissions per capita in China compared to the US, despite China's large industrial output. The US remains a major fossil fuel producer, whereas China is expanding its green energy capacity, including massive solar installations.
- AI and Technological Developments: China's advancements in AI, such as the development of Qwen and other LLMs, were noted, with some commenters interested in accessing these technologies in the West. The competitive landscape is driving down costs, with Qwen-VL-Plus priced at 0.0015 yuan per thousand tokens.
- Interesting DeepSeek behavior (Score: 118, Comments: 86): The post titled "Interesting DeepSeek behavior" lacks a body, providing no specific details or context about the behavior in question.
- Discussions highlight censorship in AI models, with comparisons between Chinese and US companies. Commenters note that censorship is standard practice, with DeepSeek facing stricter regulations due to its location in China, while US models like ChatGPT also follow local laws and guidelines.
- Model behavior and censorship implementation are debated, with some users suggesting that models have auxiliary censorship mechanisms rather than altering base training data. This is observed in models like Gemini, which refuse to engage in certain topics, indicating usage of a guard model to manage sensitive content.
- The conversation touches on the economic and technical feasibility of filtering training data to avoid unwanted content. One user argues that excluding specific content from training sets could be more effective, while another points out that doing so at scale is computationally expensive, and models benefit from exposure to both positive and negative samples for improved alignment and steerability.
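The guard-model approach described above, filtering at inference time rather than scrubbing training data, can be sketched in a few lines. This is a toy illustration: the keyword check stands in for a real trained safety classifier, and every name in it is hypothetical rather than any vendor's actual API:

```python
# Minimal sketch of the "guard model" pattern: a separate screening step vets
# the request before the base model answers. A keyword check stands in for a
# real safety classifier here; all names are illustrative, not a vendor API.
BLOCKLIST = {"forbidden-topic"}  # a real guard would be a trained model, not a list

def is_sensitive(prompt: str) -> bool:
    return any(term in prompt.lower() for term in BLOCKLIST)

def answer(prompt: str) -> str:
    if is_sensitive(prompt):
        return "I can't help with that topic."   # refusal injected by the guard layer
    return f"[base model reply to: {prompt}]"    # base model left untouched

print(answer("tell me about forbidden-topic"))
print(answer("what's the weather like?"))
```

The design point the thread raises is visible in the sketch: the base model's weights are never touched, so the same model can behave very differently depending on which guard sits in front of it.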
Theme 3. Qwen: The Preferred LLM for Varied Applications
- What's your primary local LLM at the end of 2024? (Score: 285, Comments: 185): Qwen2.5 32B is highlighted as the author's primary local LLM due to its optimal performance on 24GB GPUs, even three months post-release. The author seeks community input on their favorite local LLM choices by the end of the year.
- Qwen Models: Many users favor Qwen2.5 models for various tasks, with notable mentions of Qwen2.5-32B for general use and Qwen2.5-Coder 32B for coding. Some users also prefer the larger Qwen2.5-72B for programming, though it's noted to be slower on some hardware configurations.
- Alternatives and Comparisons: Models like Mistral Large 2411 and Gemma 2 series are frequently used for general purposes and creative tasks, with some users comparing Mistral Large favorably against newer models. Llama series, particularly Llama 3.1 and Llama 3.3, are also popular for their versatility in creative writing and general tasks.
- Technical Preferences: Users discuss the trade-offs between model size, quantization levels (e.g., Q4, Q5, Q6), and performance, with some opting for smaller models like Gemma-2-9b for budget-friendly performance. There is also interest in specific use cases like coding, with models like Deepseek v3 noted for their superior performance in answering specific coding questions.
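A quick way to reason about the size/quality trade-off in that discussion is to estimate the memory footprint at each quantization level. A back-of-envelope sketch, where the effective bits-per-weight figures and the 10% overhead factor are approximations, not exact GGUF numbers:

```python
def quantized_size_gb(params_billion, bits_per_weight, overhead=1.1):
    """Rough in-memory size of a quantized model's weights.

    bits_per_weight is the *effective* rate of the scheme
    (roughly 4.5 for Q4_K_M, 5.5 for Q5_K_M, 6.5 for Q6_K);
    overhead is a fudge factor for scales/metadata (assumption).
    """
    return params_billion * bits_per_weight / 8 * overhead

for name, bits in [("Q4_K_M", 4.5), ("Q5_K_M", 5.5), ("Q6_K", 6.5)]:
    print(f"32B model at {name}: ~{quantized_size_gb(32, bits):.0f} GB of weights")
```

At Q4_K_M a 32B model lands near 20 GB of weights, which is why it just fits a 24GB GPU with a modest KV cache, while Q6_K pushes past the card entirely.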
Theme 4. DeepSeek in 2024: Influence and Market Penetration
- What would you like to see in Unsloth for 2025? (Score: 55, Comments: 108): Unsloth developers express gratitude for community support and seek user input on future features for 2025. They invite suggestions for ambitious or minor updates, such as Diffusion/Whisper support, Unsloth RAG, or Apple compatibility, and ask for feedback on current functionality, missing features, usability, and documentation needs.
- Users express a strong desire for UI improvements to simplify model fine-tuning and management, with suggestions for a Gradio-based UI to enhance usability for beginners and streamline dataset handling. Apple/Mac support is also a popular request, allowing local training on MacBook Pros.
- Technical requests include full-finetuning support for models under 10B, distributed training across multiple GPUs, and AWQ conversion and finetuning capabilities. Users find the current process time-consuming, with one user mentioning an 8-hour conversion time for Llama 3.3 70B models.
- There is a focus on creating cost-effective datasets and training parameters for smarter reasoning models, particularly for those with limited GPU resources. The community appreciates the existing support for AMD and Intel GPUs and anticipates the upcoming multi-GPU support, expected to be open-sourced early next year.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT
Theme 1. Deepseek Versus OpenAI o1: Disputed Claims and Community Reactions
- Deepseek claims they beat OpenAI's o1 model on multiple reasoning benchmarks (Score: 109, Comments: 89): Deepseek, a Chinese AI startup, claims their latest R1 model outperformed OpenAI's o1 on multiple reasoning benchmarks, as reported on Hacker News. The post raises skepticism about whether this is a genuine achievement or a publicity stunt. Further details can be found in the article linked here.
- Skepticism and Criticism: There is significant skepticism about Deepseek's R1 model, with many commenters doubting its superiority over OpenAI's o1. Users like flysnowbigbig and FranklinLundy criticize the model's performance and credibility, suggesting it might be an attention grab or a copy of Western models without real innovation.
- Open Source vs. Proprietary Models: Some commenters, such as SonOfThomasWayne and informationWarfare42, argue the benefits of open-source AI models like Deepseek, emphasizing that open weights can democratize AI development, unlike closed models like those from OpenAI.
- Geopolitical Concerns: The discussion includes concerns about China's AI development strategy, with HateMakinSNs and iperson4213 expressing worries about China's potential dominance in AI through copying and cost-cutting, which could have global implications, including control over essential technologies and resources.
Theme 2. RAG for Email Knowledge Retention: Privacy Concerns & Implementations
- RAG a 40GB Outlook inbox - Long term Staff member leaving, keeping knowledge (theory) (Score: 113, Comments: 79): The post discusses the concept of using Retrieval-Augmented Generation (RAG) to preserve corporate knowledge from a 40GB Outlook inbox belonging to a long-term employee. The author envisions creating a database from the inbox using a local LLM and an open web UI, which could then be given to Hugging Face to manage queries and suggest responses based on historical communication data.
- Privacy and Legal Concerns: Several commenters, including GamleRosander and -Akos-, highlight potential privacy issues and legal constraints, particularly under GDPR regulations in the EU, which could prohibit indexing personal email data without consent from all involved parties. -Akos- also points out the risks of data exposure to external parties like Hugging Face.
- Technical Implementations and Alternatives: edemmeister describes a successful implementation of a RAG app using an embeddings model and LLM hosted on-premises, which handles various data sources and automates help desk responses. SpecialistCobbler206 suggests creating condensed versions of emails to maintain privacy while still building a useful knowledge graph.
- Data Accuracy and Relevance: Fast-Satisfaction482 raises concerns about the evolving nature of information, where past correct answers may become incorrect over time, suggesting that a temporal graph RAG could be more effective than a static database.
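The core retrieval step being debated above can be sketched in a few lines. This is a minimal illustration: a toy bag-of-words scorer stands in for a real embeddings model (such as the on-prem setup described), and the sample emails are invented:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embeddings model. Purely illustrative.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, emails, k=2):
    # Rank stored emails by similarity to the query and return the top k;
    # in a real RAG app these snippets would be fed to an LLM as context.
    q = embed(query)
    return sorted(emails, key=lambda e: cosine(q, embed(e)), reverse=True)[:k]

emails = [
    "The vendor invoice process requires approval from finance first.",
    "Reminder: the office party is on Friday.",
    "To reset the VPN, contact IT and quote your employee ID.",
]
print(retrieve("how do I reset the VPN", emails, k=1)[0])
```

The privacy concerns in the thread apply at exactly two points in this flow: where the raw emails are embedded, and where retrieved snippets are handed to the LLM — keeping both steps on-premises is what the commenters recommend.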
AI Discord Recap
A summary of Summaries of Summaries by o1-mini-2024-09-12
Theme 1. AI Model Performance Battles Intensify
- DeepSeek vs. Claude: Who Tops the Code Game?: DeepSeek Coder V2 Lite consistently outperforms older models like Sonnet, securing a 960.01 score on WebDev Arena’s leaderboard, while Claude 3.5 Sonnet leads with 1218.58 points, sparking fierce competition among Gemini, GPT-4o, and Qwen models.
- Steiner Reasoning Model Shines in LM Studio: Users discovered the Steiner reasoning model, which outperforms larger counterparts like Llama 3.3 Q4 70B in specific reasoning tasks, highlighting its advanced logic capabilities within LM Studio.
- ModernBERT Embeddings Enhance LocalDocs Performance: The introduction of modernbert-embed-base offers improved tokenizer and faster inference for LocalDocs, providing a robust backend for text analysis and retrieval tasks.
Theme 2. AI Tools and Platform Enhancements
- Codeium’s Windsurf Struggles with Credits and Wait Times: Users face issues with User Prompt credits not being received after purchase and lengthy Windsurf response times exceeding 20 minutes, leading to increased support ticket filings for resolution.
- LM Studio’s Steiner Model Surpasses Expectations: The Steiner reasoning model integrated into LM Studio showcases superior performance in reasoning tasks, outperforming larger models and attracting attention for its efficiency and advanced logic.
- OpenAI’s API Discussions and Prompt Engineering Frustrations: Community members debate the effectiveness of direct prompts, express dissatisfaction with limited markdown support, and explore tools like LexiDeck for multi-agent interactions, aiming to streamline feature research and improve prompt engineering practices.
Theme 3. Data Privacy and AI Ethics Concerns
- Codeium Users Debate Data Privacy and AI Ethics: Members express skepticism about using proprietary AI tools on sensitive code, weighing the benefits of advanced AI suggestions against potential data snooping, with a preference for open-source solutions to ensure data safety.
- Nous Research AI Highlights Therapy Tech Tangles: Discussions focus on AI in therapy, emphasizing the risks of data breaches and the challenges of maintaining patient confidentiality, especially after the 2022 NHS IT firm hack.
- Stability.ai Discord Urges Scam Prevention Measures: Members call for enhanced security measures like phone verification and captchas to combat recurring scam attempts, stressing the importance of safeguarding the community from identity theft and data harvesting.
Theme 4. Hardware and GPU Optimization Strategies
- Groq’s LPU Inference Engine Sets New AI Speed Records: The Groq LPU Inference Engine achieves 241 tokens per second, challenging traditional GPUs and raising discussions about system RAM versus specialized hardware like Cerebras WSE-3.
- Raspberry Pi 5 Tests Highlight GPU Limitations: Trials with llama.cpp on Raspberry Pi 5 reveal challenges in compiling for the Vulkan backend on VideoCore VII, with the Bielik-1.5B model achieving around 7 tok/sec, emphasizing the need for higher power accelerators for broader LLM workloads.
- CUDA Overlaps and Triton Performance Tweaks: Community members delve into optimizing CUDA data transfers to reduce GPU run times from 15 seconds to near 1 second, while also addressing Triton's underperformance issues by disabling the `TRITON_INTERPRET` environment variable.
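The Triton fix above amounts to environment hygiene: interpreter mode (`TRITON_INTERPRET=1`) runs kernels in pure Python for debugging, so leaving it set silently cripples benchmarks. A minimal check before timing anything:

```python
import os

# Triton's interpreter mode (TRITON_INTERPRET=1) executes kernels in pure
# Python for debugging; leaving it enabled tanks performance. Make sure it
# is unset in the environment before running benchmarks.
os.environ.pop("TRITON_INTERPRET", None)
print("TRITON_INTERPRET set:", "TRITON_INTERPRET" in os.environ)
```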
Theme 5. Technical Issues and Community Support Challenges
- Subscription Woes in Codeium’s Windsurf Editor: Users report unexpected downgrades from Pro Ultimate to free plans and delays in receiving purchased flex credits, prompting urgent support ticket submissions and community frustration over reliability.
- Aider’s Command Execution and Token Limit Confusion: Members face challenges with Aider’s command execution requiring manual approvals despite settings, and encounter persistent token limit errors, leading to requests for clearer guidance and prompt management strategies.
- OpenRouter’s Model Integration Hurdles: Users struggle to add custom models to OpenRouter, suspecting restrictions to well-funded providers, while others explore personal hosting as a workaround, highlighting the need for better support and documentation for smaller developers.
PART 1: High level Discord summaries
Codeium (Windsurf) Discord
- Credit Chaos & Sub Woes: In Codeium (Windsurf) discussions, users struggled with User Prompt credits, with one saying “I paid $10 for flex credits four days ago, but never got them.”
- Others reported sudden downgrades from Pro Ultimate to free, prompting advice to file support tickets for quick resolutions.
- Windsurf Wait Times Wear Thin: Some found Windsurf sluggish, facing waits of over 20 minutes between prompts, even on paid plans.
- Folks demanded faster responses and smarter guardrails, hoping to reduce misfires and keep coding stress-free.
- WSL Worries & Linux Love: Developers complained about Windows Subsystem for Linux (WSL) reliability, citing code execution snags and annoying setup steps.
- Many championed a direct Linux installation to get around these pitfalls, preferring fewer hiccups with debugging.
- Web-Crawling Wish & Repo Workarounds: Users clamored for Windsurf to support web crawling and direct repository ingestion, hoping for a swift rollout.
- Until then, a member suggested Gitingest to convert Git repos into text for improved LLM integration.
- Data Privacy & Ethics Debate: Participants questioned the safety of using proprietary AI tools on sensitive code, voicing reluctance to trust closed systems.
- They weighed the benefits of advanced AI suggestions against potential snooping, with some preferring open-source for peace of mind.
Nous Research AI Discord
- Therapy Tech Tangle & Privacy Perils: Team members dissected AI in therapy usage, highlighting a 2022 breach mentioned in Watchdog set to fine NHS IT firm after medical records hack that revealed vulnerabilities in patient confidentiality.
- They concluded that anonymized data can still expose identities if unique patterns are processed by sophisticated models, fueling deeper concerns over healthcare data handling.
- Claude's Code Craze & Complexity Conundrums: Enthusiasts shared attempts to generate concise code with Claude 3.5 Sonnet and Haiku, revealing varied token savings and modest success with more involved tasks.
- They debated whether compact outputs hamper long-term readability, citing persistent tension between code brevity and maintainability.
- Hermes 3 Quirk & Amnesia Emulation: One user pursued replicating Amnesia with Hermes 3 (non 405b version) to use deliberate forgetting, believing removing the prompt might simulate the effect.
- Others joked that a “blank slate” approach is the simplest path, though they acknowledged that deeper code tweaks might be required to ensure consistent memory resets.
- Backprop-Free Breakthroughs & MCU Magic: Participants cited two papers, Gradients without Backpropagation and Poor Man's Training on MCUs: A Memory-Efficient Quantized Back-Propagation-Free Approach, which explore non-backprop methods and cutting-edge optimization.
- These references sparked conversation about resource-light training on microcontrollers, illustrating the feasibility of advanced AI without standard gradient methods.
OpenAI Discord
- Gemini Gains and Discord Drains: Users criticized OpenAI for minimal Discord engagement, while Gemini 2 Flash showcased real-time search and spurred competition talk.
- One participant noted spending $130 monthly on AI APIs, signaling a push for more efficient usage and cost control.
- Moderation Maneuvers and GPT-4o Quirks: Community members encountered content moderation snags, especially with sensitive topics related to minors, prompting some to disable filters entirely.
- Others raised concerns about GPT-4o character consistency and the absence of image generation features, sparking disappointment.
- Scripts Soar & Coders Grow: A user improved a cinematic script with community help, crediting a Discord exchange for smoother movement and structure.
- New coders boosted their skills through group debugging, praising feedback for increasing their confidence.
- Prompts, Markdown, and LexiDeck: Contributors championed concise prompts to guide ChatGPT while bemoaning limited markdown support in Discord for sharing examples.
- A tool called LexiDeck emerged as a multi-agent framework for ChatGPT, though it currently lacks canvas functionality.
LM Studio Discord
- LM Studio’s Missing Canvas: One user asked about generating images in LM Studio, but it's currently not supported.
- Another user reported a macOS permission prompt while updating to v0.3.5 (build 2), which was attributed to the 'Squirrel' updater.
- Steiner Surprises Bigger Models: A user discovered the Steiner reasoning model on Hugging Face, claiming it surpasses larger models in reasoning tasks within LM Studio.
- They noted it outperforms Llama 3.3 Q4 70B in select scenarios, drawing attention for advanced logic use cases.
- Coral Conundrum: Llama 3.2 on 16W: Members discussed Llama 3.2 1b with its <2GB model size potentially running on Coral.ai TPUs limited to 16 watts.
- They concluded that TPUs may struggle with broader LLM workloads, prompting consideration of accelerators with higher power capacity.
- Groq Gains Speed at 241 TPS: The Groq LPU Inference Engine drew praise for pushing 241 tokens per second, piquing interest in its performance and pricing.
- A benchmark report revealed impressive throughput, raising questions about system RAM vs. hardware like Cerebras WSE-3.
- MacBook Pro: RAM vs. CPU Only: Some argued that moving from a 16GB to a 32GB MacBook Pro offers minimal gains in LLM speed, especially for writing tasks.
- Others recommended up to 128GB if budgets allow, though many agreed that CPU-only setups can still lag behind specialized hardware.
aider (Paul Gauthier) Discord
- DeepSeek Dominates & Model Limitations: Community members praised DeepSeek for outperforming older models like Sonnet, citing speed improvements and resolution of competitor issues.
- The model also ranked 960.01 on WebDev Arena’s leaderboard, fueling excitement over future enhancements.
- o1 API Access Confusions: Participants discussed inconsistent availability of o1 and o1-preview across organizations, prompting questions about the current access criteria.
- They requested official clarification, underscoring a growing interest in using o1 for advanced tasks.
- Aider Workflow & Command Execution Quirks: Some users reported challenges with Aider's command execution, noting that direct shell commands still needed manual approval, even with `AIDER_YES_ALWAYS` set.
- Confusion arose around token limit errors, leading to suggestions to consult `/tokens` for deeper insights on context usage.
- Model Switching & File-Based Prompts: Engineers explored methods to easily swap between deepseek for editing and o1 for heavier tasks, considering scripts or smart commands.
- Others inquired about saving prompts in dedicated files for quick reuse, seeing potential synergy with solutions like clinerules.
- WebDev Arena Sparks Fierce AI Competition: The newly launched WebDev Arena dares participants to craft top-tier websites, with Claude 3.5 Sonnet earning a leading score of 1218.58.
- High-scoring contenders like Gemini-2.0-Flash-Thinking-1219 and GPT-4o-2024-11-20 underscore the rivalry, while the live leaderboard encourages sustained community engagement.
Unsloth AI (Daniel Han) Discord
- Unsloth's Unified Hymba Hustle: Engineers shared strategies for combining two LLMs in the Unsloth pipeline and discussed the Hymba-1.5B-Instruct model for advanced tasks despite support hiccups.
- Some highlighted fine-tuning best practices, while others noted potential compatibility issues for efficient Unsloth usage.
- Fine-tuning LLaMA 3 Without the Fluff: A user shared a tutorial on optimizing LLaMA 3 within Ollama, guiding folks to build a local personal assistant.
- The community applauded Unsloth's creators for this well-structured tutorial, praising the improved design and references.
- TTT Trips Up ARC Tasks: Discussions around Test Time Training (TTT) revealed significant gains on the ARC dataset, showing a 6x accuracy jump in some cases.
- A paper was cited, prompting questions on code availability and enabling deeper scrutiny of TTT methods.
- Feedback Frenzy & Friendly Discord: Members praised the Discord framework, voicing gratitude for the server's positive atmosphere and cohesive approach.
- They also requested fresh features for Unsloth in 2025, underscoring collaboration and open input from everyone.
Stackblitz (Bolt.new) Discord
- Token Tussle Over Bolt Costs: A user reported they used 30 million tokens in two days while employing ChatGPT and the Bolt prompt enhancer, flagging serious cost implications.
- They cautioned the community to manage monthly credits more deliberately, avoiding unnecessary expenditures for minor code tweaks.
- Reload Ruckus within Projects: Multiple contributors debated whether reloading a Bolt project should rely on a browser refresh or a specialized button, with some leaning on AI-based page-specific fixes.
- They highlighted that code-lifting solutions like Claude streamlined iterative deployment by focusing on narrow segments of code.
- Bolt Pro Subscription Confusion: Members confirmed Bolt Pro offers tokens on a monthly basis, clarifying uncertainties about daily versus monthly caps.
- They also discussed the platform’s usage limits, lamenting the lack of official Bolt support and depending heavily on community insights.
- Facebook API Frustrations: Enthusiasts attempted to weave the Facebook Marketing API into Bolt, accruing hefty token charges with limited success.
- One user managed to sync some data but struggled with advanced permission requests, lacking direct assistance from Bolt’s side.
- Table Data & AI Tool Trials: Members looked at .csv formats for smooth data imports in Bolt prompts, aiming to streamline table handling.
- They also recounted hit-or-miss outcomes using AI tools for coding, noting that more intricate builds required significant manual intervention.
Cursor IDE Discord
- DeepSeek v3 Gains Ground: Community members tested DeepSeek v3 within Cursor, praising its speed with large databases and complex queries.
- They compared it to other models, highlighting surprising availability while some sought clarity on licensing specifics.
- Hosting Hustle: Quick Picks: Enthusiasts debated Hetzner and Digital Ocean for affordability and straightforward setup.
- Others praised Vercel plus AWS synergy, citing Docker skills as an advantage for robust deployment.
- Next.js Chatbot Craze: Community members shared references to building chatbots with Next.js and shadcn, pointing to vercel/ai-chatbot for a hackable approach.
- They recommended adding an API key and following setup instructions, also referencing modals-next-test for TypeScript-based modals.
- GitHub Models Fuel AI Engineering: A new update from GitHub introduced advanced AI tooling under GitHub Models, spotlighted in this official blog post.
- Users expressed excitement over potential benefits for AI developers and the shift toward freely available models via GitHub’s marketplace.
OpenRouter (Alex Atallah) Discord
- OpenRouter’s On-Ramp for New Models: A user asked about adding their model to OpenRouter, suspecting it might only work with heavily financed providers, and others encouraged starting a personal hosting approach.
- Contributors pointed out that Not Diamond is another multi-model router, suggesting that small-time developers can still test the waters.
- DeepSeek v3 Delivers Gains: Many praised DeepSeek v3 for consistent credit usage and steadiness, particularly when compared to pricier alternatives like Claude.
- Some insisted it remains effective for narrower tasks, noting cost-to-performance tradeoffs.
- Gemini 2.0 Hits a Snag with NSFW: Users reported that Gemini 2.0 Flash struggles with NSFW image captioning, calling it unusable on OpenRouter.
- They also cited performance troubles and tight context limits that hamper advanced image analysis.
- Sonnet vs. DeepSeek: A Competitive Chorus: Participants compared Sonnet and DeepSeek, with Sonnet favored for instruction-following and complex queries.
- Critics argued that DeepSeek falls short on advanced programming tasks, even though it’s cheaper.
- Self-Moderated Models Spark Debate: One participant asked how self-moderation works, prompting clarifications on how refusal messages trigger when terms of service are broken.
- Some referenced Building effective agents to illustrate compliance strategies, highlighting the role of provider-specific guidelines.
Interconnects (Nathan Lambert) Discord
- Model Self-Evaluation: Magic or Myth?: Members questioned why o1/o3-like models appear effective at self-evaluation, discussing that they might not truly recognize their own limits and suspect sampling methods drive these claims.
- Others noted reinforcement learning’s path-dependent nature, suggesting self-correction is not a core factor in outcome quality.
- Nvidia’s $700M Run:ai Deal: Nvidia acquired Run:ai for about $700 million, boosting GPU scheduling in AI workloads.
- They plan to open source Run:ai’s tools, sparking questions about how this move will reshape enterprise GPU orchestration.
- Gary Marcus Stirs the Pot: Critics called out Gary Marcus for rarely adjusting his stances, while still acknowledging some of his concepts.
- He and others debated the real progress on GPT-4 and hallucinations, reflecting skepticism toward near-term large-scale improvements.
- Insights from 2024 Interconnects Year in Review: Nathan Lambert recapped two years of AI developments, highlighting RLHF and open-source, plus anticipation for OpenAI’s o1 model.
- He also commented that Meta may not gain a clear advantage from AI alone, cautioning that ever-expanding model sizes might outstrip present hardware.
- Short Tweets & The Snail’s Comeback: A discussion on social media revealed that quick, offhand posts such as 'we are so back' often draw unexpected engagement.
- Lambert joked that these throwaway lines can spark overblown reactions, culminating in the playful 'Return of the Snail' meme.
Notebook LM Discord
- Google's Guarded Gemini Gains Grit: A user observed that Google AI is stricter on sensitive topics compared to open-source LLMs, citing Gemini as a prime example, with references to Google's Vertex AI docs.
- Others noted that such caution could hamper advanced uses in healthcare or legal domains, describing it as both a safety measure and a nuisance.
- Podcast's Perpetual Loading Predicament: A user found the Notebook LM podcast generator stuck at 'Generating conversation,' raising concerns about possible performance bottlenecks, with references to NotebookLM docs.
- Participants recommended verifying data inputs like bus routes, but no official patch or workaround was confirmed.
- NotebookLM's Next-Level Plus Perks: NotebookLM Plus expands resource usage and integrates with Gemini Business, referencing this upgrade guide and letting users embed PDFs and YouTube links.
- However, users reported no bulk YouTube video upload yet, leaving them to insert each link separately.
- Voice Variation Vexations: Members critiqued the voice model for inconsistent multilingual performance, referencing Cloud TTS docs for potential solutions.
- They hope for a 2025 improvement that addresses tonal stability and cross-language transitions.
- UI Upgrades Under Urgency: Some found the new NotebookLM interface too cramped, describing it as claustrophobic and wanting more screen space.
- Community feedback calls for advanced layout options, though no official design roadmap was cited.
GPU MODE Discord
- CUDA Overlaps & HPC Gains: Members explored overlapping data transfers with computation in CUDA and fluid simulation tweaks, referencing fluidsCUDA.
- They aim to reduce the 15-second GPU run to something closer to the 1-second OpenMP speed by optimizing memory usage.
- Genesis Simulator's Myth Busted: A new blog revealed Genesis is up to 10x slower than older GPU sims, shattering the earlier 430,000x faster claims.
- This explanation aligns with Stone Tao's tweet noting that previous metrics were mostly static environment tests.
- Triton Perf Twists & Kernel Tips: Triton underperformed compared to Torch for vector-add until users found TRITON_INTERPRET=1 was causing a slowdown.
- They also debated integer arithmetic limits and whether manual kernel tuning can surpass Triton's auto-scheduling logic.
- Raspberry Pi 5 LLM Tests & Speed Limits: Trials with llama.cpp on the Raspberry Pi 5, featuring VideoCore VII, hit compilation roadblocks for the Vulkan backend.
- Meanwhile, the Bielik-1.5B model hovered around 7 tok/sec, and OpenBLAS slowed input decoding instead of improving output speed.
- New GPU Openings & HPC Hustle: A Cracked Research Engineer position appeared for those interested in advanced GPU projects.
- Members also hunted for SF-based CUDA engineer roles, remote LLM infra gigs, and Triton kernel dev options.
Perplexity AI Discord
- Pro Reasoning & Deepseek Surprises: Members noted that Perplexity's Pro Reasoning Mode automatically kicks in for tough queries, bolstering the AI's internal analysis, while Deepseek functions under different regulations.
- Participants wondered how Chinese rules might grant Deepseek more flexibility, prompting questions about how laws impact output.
- OpenAI Contemplates PBC Path: Contributors discussed OpenAI moving toward a Public Benefit Corporation model, aiming to balance profit with social goals.
- They viewed the shift as a direct response to debates on accountability, referencing arguments for broader responsibility in commercial AI.
- Sonar Models & Perplexity API in the Spotlight: Members clarified that Sonar models excel at real-time web answers complete with citations, suggesting caution against distributing them elsewhere.
- Others explored how the Perplexity AI API might integrate into future apps, highlighting potential for enhanced AI-driven projects.
- Discord Bots Enter Premium Territory: A user hoped to create a Discord bot using premium perks from Perplexity AI, eyeing advanced functionalities for chat experiences.
- They planned to bundle those benefits into more dynamic interactions, expecting direct synergy with the API.
- Random Videos & Optimization Buzz: Attendees evaluated YouTube's Random Video Button to see if it improves viewer engagement.
- They also pointed to content optimization tips, placing emphasis on strong keywords and audience insights.
Modular (Mojo 🔥) Discord
- Pointer Pivot: Switching to OwnedPointer: Mojo devs replaced BoxPointer[Self] with OwnedPointer[Self], catching some off-guard because the older name disappeared in nightly builds. They emphasized safer pointer usage to match Mojo's stricter invariants around references and ownership.
- Feedback showed that some participants initially struggled to locate the new pointer type, leading them to request clearer references in the documentation. The rename was hailed as an improvement to Mojo's pointer story, though advanced pointer patterns still felt tricky.
- Self-Referential Saga: ArcPointer Steps Up: Mojo enthusiasts tested ArcPointer for shared references in chain-like data structures, discovering that optional references often demanded structural overhauls. They debated whether to rely on ArcPointer or reorganize the code to avoid self-referential pitfalls.
- Some users noted that UnsafePointer could introduce risks if used incorrectly. Others advised adopting alternative designs for more predictable ownership patterns and clearer lifecycle rules.
- Breaking Changes Boogie: Mojo's 6-Month Rewrite Cycle: Mojo maintainers confirmed that compatibility will shift about every six months until version 1.0, prompting worries about repeated rewrites. Users expressed concerns over code stability, with some considering Rust as a backup plan.
- Some participants embraced the changes, arguing it fosters rapid iteration and refinement before Mojo stabilizes. Others suggested waiting for a nearer v1.0 milestone to avoid too many transition headaches.
- Turning 'max' Up a Notch: API Modernization: Participants observed that Mojo's 'max' function relies on older semantics and lacks robust safe references. They recommended a thorough API review to adopt refined pointer usage and advanced type features.
- Footguns in the current setup could be fixed by better leveraging value semantics and move semantics. Calls for a more streamlined approach highlight Mojo's ambition to tighten up its core libraries.
Stability.ai (Stable Diffusion) Discord
- Discord Dilemma: Scam Scrutiny!: Members called out recurring scam attempts in Discord, urging phone verification or captchas to deter malicious actors, referencing how attackers keep reappearing. They noted that phone verification, while not perfect, would increase the cost of each scam attempt.
- Some described it as bot whack-a-mole, arguing that concerns about trust and safety overshadow the real hazards like identity theft and data harvesting. The group recommended urgent methods to keep the space safe from infiltration.
- SD3 Safety Showdown: A few participants debated the trust and safety aspects of SD3, with some wanting these measures extended to the community's chat environment. They argued that safety rhetoric often diverts attention from pressing infiltration attempts.
- One user argued these strategies divert focus from the actual scamming problem, revealing a mismatch between product marketing gestures and real security. Another user contended the discussion is overshadowed by persistent infiltration that burdens the community.
- Faceswap Fiasco in Stability.ai: A user asked about faceswap functions in the Stability.ai API, seeking details missing in official docs. They learned that while image manipulations exist, robust temporal consistency for faceswap is lacking.
- Respondents highlighted the library’s limitations, indicating it is not yet a one-stop solution for advanced facial reconstruction. They suggested evaluating third-party tools with more reliable facial alignment.
- LoRA vs. Checkpoint Conundrum: LoRA updates focus on localized parameters, whereas fully fine-tuned checkpoints typically involve bigger changes at the expense of disk usage. Members concluded both approaches can yield similar gains, but LoRA is often more resource-friendly.
- Some argued that fully updating checkpoints is best for major transformations, but others found LoRA ideal for moderate improvements. This balance between size and capability made LoRA appealing to those with limited GPU overhead.
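The size tradeoff above can be made concrete with a quick parameter count: full fine-tuning of a d x k weight matrix updates every entry, while a rank-r LoRA adapter only learns two small factor matrices. The layer dimensions and rank below are illustrative assumptions, not figures from the discussion.

```python
# Rough parameter-count comparison: full fine-tuning vs. a LoRA update.
# Dimensions and rank are hypothetical, chosen for illustration only.

def full_finetune_params(d: int, k: int) -> int:
    """Fully fine-tuning a d x k weight matrix touches every entry."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """LoRA learns two low-rank factors A (d x r) and B (r x k)."""
    return r * (d + k)

d, k, r = 4096, 4096, 8             # one attention projection, rank-8 adapter
full = full_finetune_params(d, k)   # 16,777,216 trainable values
lora = lora_params(d, k, r)         # 65,536 trainable values
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

At these assumed sizes the adapter is 256x smaller than the full update, which is why LoRA checkpoints are so much cheaper to store and swap.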
- Newcomers Tackle Model Tinkering!: New users introduced themselves, seeking tips on prompt design and model building. A few felt lost about checkpoint creation and yearned for advice from those with advanced experience.
- Veterans welcomed them, suggesting LoRA or partial fine-tuning as efficient ways to refine models without massive overhead. They also shared tried-and-true tricks for iterative improvement.
Eleuther Discord
- Tanh-Powered RMSNorm Sparks Chatter: A new Lipschitz-1 RMSNorm variant using tanh to maintain input 2-norm drew attention for its potential in GANs and residual models.
- Skeptics worried it might hamper normal models but agreed that a strict Lipschitz bound is vital for stable residual flows.
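One plausible reading of the variant discussed above is to rescale the input so its 2-norm becomes tanh of the original norm; since tanh is 1-Lipschitz and never exceeds its argument, the map cannot expand distances, which is the property residual flows need. This is a guess at the construction, not the exact formula from the discussion.

```python
import math

def tanh_rmsnorm(x, eps=1e-6):
    """Rescale x so its 2-norm becomes tanh(||x||_2).

    Sketch of one possible Lipschitz-1 normalization: both the radial
    derivative tanh'(r) and the tangential scale tanh(r)/r stay <= 1,
    so the map never expands the norm. Hypothetical, not the paper's
    exact variant.
    """
    norm = math.sqrt(sum(v * v for v in x)) + eps
    scale = math.tanh(norm) / norm
    return [v * scale for v in x]

y = tanh_rmsnorm([3.0, 4.0])                 # input norm is 5.0
out_norm = math.sqrt(sum(v * v for v in y))  # ~tanh(5), just under 1.0
print(out_norm)
```

The bounded output norm is what skeptics flag for ordinary models: it caps activation magnitude, which helps stability in GANs and residual stacks but may constrain expressivity elsewhere.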
- Pile Dataset’s 260B Token Revelation: A discussion pinpointed this paper confirming about 260B GPT-2 tokens in ~825.18 GiB of the Pile dataset, upsampled at times to ~400B tokens.
- Participants dissected the gap between actual and estimated token counts to fine-tune training setups.
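The reported figures can be sanity-checked with back-of-the-envelope arithmetic: ~825.18 GiB of raw text against ~260B GPT-2 tokens implies roughly 3.4 bytes per token, a plausible ratio for BPE on English-heavy text.

```python
# Back-of-the-envelope check of the Pile figures cited above.
GIB = 2 ** 30
pile_bytes = 825.18 * GIB
tokens = 260e9

bytes_per_token = pile_bytes / tokens      # ≈ 3.41 bytes per GPT-2 token
print(f"{bytes_per_token:.2f} bytes/token")

upsampled = 400e9                          # the ~400B upsampled count
print(f"upsampling factor ≈ {upsampled / tokens:.2f}x")
```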
- Neural SDFs & NeRFs Get Lipschitz Buzz: Members highlighted how a Lipschitz bound can speed up network tracing in neural SDFs and NeRFs.
- They linked these gains back to the RMSNorm approach and saw promising performance improvements.
LlamaIndex Discord
- RAG Gains Steam With LlamaParse Auto-Mode: Hanane Dupouy showcased how an Optimized RAG pipeline uses LlamaParse auto-mode to balance cost and performance for financial reports.
- Members highlighted cost efficiency and real-time toggling as key benefits, fueling discussion about improved data handling.
- Anomaly Detection in a Milvus + FAISS Mashup: A user shared a hybrid approach for anomaly detection, combining Milvus and FAISS to handle embeddings and clustering.
- Others suggested using the direct Milvus client to sidestep memory constraints, noting that some vector stores skip storing embeddings.
- Chatbot Concurrency Conundrum: Challenges arose with multiprocess-based delays for a long-running background task, leading to a debate on managing concurrency in chatbots.
- Community members recommended asyncio.create_task for asynchronous operations, citing leaner flow control and quicker responses.
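The recommended pattern can be sketched as follows: instead of spawning a process for the slow job, schedule it as a task on the event loop so the chat handler keeps responding while the work runs. The function names are hypothetical, invented for illustration.

```python
import asyncio

async def long_running_index_refresh() -> str:
    """Stand-in for a slow background job (hypothetical name)."""
    await asyncio.sleep(0.1)   # pretend this takes a while
    return "index refreshed"

async def handle_chat_message(text: str) -> str:
    # Fire the slow job without blocking the reply path.
    task = asyncio.create_task(long_running_index_refresh())
    reply = f"echo: {text}"    # the immediate user-facing response
    result = await task        # join the background task later
    return f"{reply} ({result})"

print(asyncio.run(handle_chat_message("hello")))
```

Compared with a multiprocess approach, this keeps everything in one event loop: no serialization overhead, and the task can be awaited, cancelled, or fanned out with `asyncio.gather` as the bot grows.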
- Finetuning Llama? Some Curiosity, No Concrete Steps: Hints about Finetuning a Llama Model surfaced, but specifics remained limited to brief mention.
- Dev enthusiasm spiked around possible expansions, though no further instructions or code were provided.
Latent Space Discord
- ModernBERT Finetunes Flood the Scene: A new ModernBERT embedding model called modernbert-embed-base landed with an improved tokenizer and faster inference, as described in Zach Nussbaum's post. It was trained on public Nomic Embed datasets, offering an alternative approach to embedding generation.
- Some members admired the visual representations shared on Twitter, citing ModernBERT as a solid step in refined large-scale embeddings (LSE).
- Arc AGI Chart Reaffirms AI Momentum: A progress plot shared by Douwe Kiela confirmed that AI development shows no sign of slowing, referencing the original Dynabench paper. This chart highlighted continuous leaps in model performance across multiple benchmarks.
- Members pointed out that the chart serves as a reminder of the speed at which breakthroughs keep materializing, urging everyone to keep track of AGI trends.
- OpenAI’s For-Profit Pivot Sparks Debate: Jan Leike questioned OpenAI’s pivot to a for-profit entity, suggesting it undercuts its nonprofit vision. Critics lamented that the original mission to benefit humanity is now overshadowed by corporate goals.
- Some participants argued this move was inevitable, while others hoped the nonprofit side will still champion ethical AI ideals.
- Hugging Face’s Agentic Systems Hit the Stage: Aymeric announced a new agentic systems library dubbed smolagents, touted as the ‘simplest library’ for building powerful agents. It focuses on minimal code overhead and natural code-writing capabilities, distinguishing itself from conventional toolkits.
- The community welcomed this approach, seeing potential for straightforward agent assembly and rapid prototyping in modern AI workflows.
- ts_zip Offers Experimental LLM Compression: A new LLM-driven compression tool called ts_zip emerged with bold claims of higher compression for text files, as seen on the project page. It relies on GPU acceleration and can be noticeably slower than standard compressors.
- Enthusiasts were eager to test its early-stage benefits, while acknowledging its experimental status and potential pitfalls.
Cohere Discord
- Tokenization Treads Familiar Territory for HMM: A member confirmed tokenization remains unchanged for Hidden Markov Models (HMM), referencing consistency with earlier frameworks in 2022.
- They noted stable performance under these methods, with no modifications needed for HMM scripts, suggesting well-established best practices stay effective.
- New Year Cheers with Minimal Tech Surprises: Multiple members exchanged New Year greetings, signaling a short break from in-depth topics.
- They paused advanced discussion in favor of celebrating, leaving no further progress updates or new releases mentioned.
tinygrad (George Hotz) Discord
- Reversible Riddles in Tinygrad: One user asked if an intermediate assembly step or a direct uop-to-binary path is necessary for machine code in a system of reversible transformations, questioning how it aligns with final rewritten states.
- They also probed whether each transformation translates into a uop sequence or an eventual one-to-one mapping, creating intrigue around how tinygrad might approach full reversibility.
- pcode Gains Ground in Tinygrad: Community members praised the sleigh documentation, highlighting shared ideas between pcode translation and the uop method in tinygrad.
- They noted that pcode definitions handle dtype and meta data in a style akin to assembly, prompting speculation on how to fold these concepts into tinygrad.
- Newcomer Guides and Internals Intro: A user sought beginner-friendly tasks beyond 'good first issue,' prompting references to tinygrad-notes for step-by-step help on tinygrad fundamentals.
- Contributors also shared a new introduction to tinygrad's internals, calling for further learning material and community contributions.
Axolotl AI Discord
- GH200 Access Sparks Debug Drive: A member asked for GH200 access to run a Python reproducer and verify the D2H memory transfer configuration.
- They want to ensure the issue is not caused by local setup quirks and to confirm consistent behavior across systems.
- D2H Memory Transfer Stirs Concern: The chat pointed to a potential D2H memory transfer glitch that may arise from specific configurations.
- They emphasized cross-checking setups to rule out unintended device or driver mismatches as sources of the problem.
Nomic.ai (GPT4All) Discord
- DeepSeek Steady, GigaChat Untapped: One member reported that DeepSeek Coder V2 Lite performed reliably, showing consistent outcomes for code tasks. They did not attempt GigaChat, leaving the model’s capabilities unexplored.
- No benchmark data was presented, but there is curiosity about GigaChat's functionality in future trials.
- Modernbert Mention & Localdocs Embeddings: A participant saw Modernbert on Hugging Face, raising questions about enhancing the embedding backend for localdocs. They suggested these updates could boost text analysis or retrieval tasks.
- This reflects the community’s focus on evolving embedding approaches, anticipating a smooth integration with Modernbert.
LLM Agents (Berkeley MOOC) Discord
- No major updates #1: No advanced technical or product developments emerged from the content provided.
- The single mention about a MOOC sign-up date lacked new models, datasets, or key breakthroughs for an AI engineering audience.
- No major updates #2: No additional discussions or relevant references about new benchmarks or tooling were shared.
- Community queries about course logistics do not meet the threshold for in-depth coverage or analysis.
The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The OpenInterpreter Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!