[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
a quiet long weekend is all we need.
AI News for 1/16/2025-1/17/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels, and 2327 messages) for you. Estimated reading time saved (at 200wpm): 298 minutes. You can now tag @smol_ai for AINews discussions!
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
AI Model Releases and Evaluations
- DeepSeek-V3 Advancement: @DeepLearningAI announced that DeepSeek-V3, featuring a mixture-of-experts architecture with 671 billion parameters, surpasses Llama 3.1 405B and GPT-4o on key benchmarks, especially in coding and math tasks.
- GPT-5 Release Announcement: @Yuchenj_UW shared that OpenAI will release GPT-5 on April 27, 2023, generating significant anticipation within the community.
- MiniMax-01 Coder Availability: @_akhaliq introduced MiniMax-01 Coder mode in ai-gradio, highlighting its application in building a working chess game within a single shot.
Research Papers and Technical Insights
- Scaling Visual Tokenizers: @iScienceLuvr presented findings from Meta's new paper on scaling visual tokenizers, emphasizing that small encoders are optimal and that increasing bottleneck size can enhance reconstruction quality but degrade generation performance.
- Inference-Time Scaling for Diffusion Models: @sainingxie discussed Google DeepMind's latest work on inference-time scaling, which improves diffusion model sample quality by enhancing search algorithms and verifiers.
- RA-DIT Method for RAG Setup: @TheTuringPost detailed the Retrieval-Augmented Dual Instruction Tuning (RA-DIT) method, which fine-tunes both LLMs and retrievers to enhance response quality in RAG setups.
AI Policy, Regulation, and Security
- U.S. AI Export Restrictions: @DeepLearningAI outlined the U.S. proposed export restrictions on advanced AI technology, establishing a three-tier system for access to AI chips and models, with Tier 3 countries like China and Russia being excluded entirely.
- AI Chatbot Vulnerabilities: @rez0__ revealed a CSRF and prompt injection vulnerability in AI chatbots, highlighting the security risks associated with front-end integrations.
- AGI and Superintelligence Concerns: @danintheory emphasized that superintelligence has not yet been achieved, while @teortaxesTex expressed concerns over R1 being recognized as a weapons-grade model, raising regulatory and national security issues.
Tools, Frameworks, and Development
- AI-Gradio Enhancements: @_akhaliq introduced updates to ai-gradio, including NVIDIA NIM compatibility and the cosmos-nemotron-34b model, facilitating easy deployment of AI applications.
- LangChain Integrations: @LangChainAI showcased how to build AI agents with persistent memory using LangChain, PostgreSQL, and Claude-3-haiku LLM, supporting both Python and Node.js implementations.
- Triton Warp Specialization: @andrew_n_carr explained Triton's Warp specialization, which automatically schedules warp groups to run concurrently, optimizing GPU resource usage for tasks like matrix multiplication.
AI in Industry & Use Cases
- Personalized Medicine with Llama Models: @AIatMeta highlighted OpenBioLLM-8B and OpenBioLLM-70B, Llama models fine-tuned by Saama, aimed at accelerating clinical trials and personalized medicine.
- AI Hedge Fund Development: @virattt described their AI hedge fund, which trades multiple stocks using a system that includes valuation, technical, sentiment, and fundamentals analysts, alongside risk agents and portfolio managers.
- AI in Cognitive Behavioral Therapy: @omarsar0 shared insights on AutoCBT, a multi-agent framework for Cognitive Behavioral Therapy, enhancing dialogue quality through dynamic routing and memory mechanisms.
Memes/Humor
- Vague AI Hype Critique: @polynoamial expressed frustration with vague AI hype, urging for more specific and transparent discussions within the community.
- AI Agents Not Ready for Prime Time: @HamelHusain humorously acknowledged that Devin (the AI SWE) is "not quite ready for prime time yet," while promoting Aider as a free alternative.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. ElevenLabs' TTS: Factors Behind Outstanding Quality
- What is ElevenLabs doing? How is it so good? (Score: 320, Comments: 130): ElevenLabs' text-to-speech (TTS) technology is notably superior compared to local models, raising questions about whether it uses a full Transformer model or a Diffuser. The post speculates on whether the company models human anatomy to enhance model accuracy.
- The consensus among commenters is that high-quality data is crucial for achieving superior text-to-speech (TTS) performance, with ElevenLabs leveraging actual audiobook data to outperform competitors. Kokoro TTS is mentioned as an open-source alternative but is noted to fall short in emotional expression compared to ElevenLabs.
- Several comments highlight that ElevenLabs' success is attributed to using a relatively small compute setup (32x3090 GPUs) and focusing on high-quality datasets rather than synthetic data. Some speculate that ElevenLabs could be built on Tortoise with proprietary optimizations, emphasizing the importance of finetuning with quality voice samples.
- Discussions also touch on the challenges of acquiring high-quality, licensed audiobook datasets due to cost and legal issues, with suggestions that Mozilla could play a role in commissioning professional voice actors for training datasets. The public domain resource LibriVox is noted as a potential source for such data.
Theme 2. OpenWebUI's Canvas: Enhanced Multi-Language Support
- OpenWebUI Canvas Implementation -- Coming Soon! (Better Artifacts) (Score: 176, Comments: 34): OpenWebUI is enhancing its Canvas feature by expanding language support beyond HTML, CSS, JavaScript, and SVG to include C#, Python, Java, PHP, Ruby, Bash, Shell, AppleScript, SQL, JSON, XML, YAML, and Markdown. Additionally, a new feature will allow users to switch between Design view and Code view for web design, with a pull request expected in the coming weeks.
- Users suggest expanding OpenWebUI with an addon/extension model to allow more customization, similar to browsers. There's interest in supporting additional technologies like Latex, dot, gnuplot, R, VHDL, and Powershell in future versions.
- Several users express enthusiasm for integrating diagramming libraries such as mermaid.js and chart.js, with mermaid already being supported. The impact of mermaid on diagramming has been noted as transformative by some users.
- There's a desire for comparing OpenWebUI to tools like GitHub Copilot Edit, and inquiries about how its editing feature works, particularly regarding large file handling. Some users are interested in building on top of OpenWebUI for more complex operations, like OS integration and CoT solutions.
Theme 3. DeepSeek V3 vs Claude 3.5 Sonnet: Analyzing the Practical Edge
- Is DeepSeek V3 overhyped? (Score: 116, Comments: 93): The author compares DeepSeek V3 to Claude 3.5 Sonnet, noting that while benchmarks match, DeepSeek V3 lacks the impressive feel and nuanced outputs of Sonnet. They describe DeepSeek V3 as a scaled-up base model with minimal human reinforcement learning, contrasting it with models from OpenAI and the Llama family.
- Cost and Performance: DeepSeek V3 is praised for offering approximately 75% of Sonnet's performance at a fraction of the cost, with users noting significant cost savings during usage. Recoil42 highlights that DeepSeek is cost-efficient enough to be used unmetered for most tasks, making it a preferred choice for routine coding and simple tasks, while Sonnet is reserved for more complex problems.
- Model Comparison and Use Cases: DeepSeek V3 is noted for its affordability and versatility, particularly in coding tasks like Java and C, where it excels over Sonnet in some areas. However, Sonnet is considered superior for UI generation and post-training on specific stacks like React and Python, with Charuru emphasizing Sonnet's unique prompt engineering that enhances its human-like interactions.
- Open Source and Accessibility: DeepSeek V3 is celebrated for being open-source and accessible, allowing users to leverage its capabilities without restrictions or moral lectures, unlike some other models. Odd-Environment-7193 appreciates its comprehensive responses and adaptability, making it a valuable tool for full-stack engineers and those seeking a modern, flexible AI model.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT
Theme 1. OpenAI's Task Management Imperfections: User Frustrations Unveiled
- Please, I beg you. Make it stop… (Score: 353, Comments: 74): The post author expresses frustration with AI task automation, specifically with setting reminders for Arsenal football matches and daily world news summaries. Despite attempts to cancel the tasks via ChatGPT, the reminders persist, resulting in excessive notifications and emails.
- AI Misalignment is highlighted as a real-world issue, with users expressing frustration over persistent notifications despite cancellation attempts. Levoniust comments on this as a notable example of AI misalignment.
- Task Automation Challenges are shared, with Ziscz mentioning difficulty in stopping automations, despite being able to turn off notifications in settings.
- Humorous Anecdotes and comments about Arsenal highlight the post's relatability, with several users sharing personal stories or jokes about football matches and notifications.
AI Discord Recap
A summary of Summaries of Summaries by o1-preview-2024-09-12
Theme 1. Major Funding Rounds and Company Milestones
- Cursor IDE Raises $105M to Revolutionize Coding: Cursor IDE announced securing $105 million from Thrive Capital, Andreessen Horowitz, and Benchmark, fueling optimism for future updates. The community anticipates significant enhancements in code generation features, faster fixes, and expanded model support due to this influx of funding.
- Anysphere Secures $105M to Automate Code: Anysphere locked in $105 million in Series B funding to advance AI-powered coding tools for developers. Aiming to serve millions of programmers, this investment reflects strong confidence in AI-driven developer tools and promises exciting developments in coding automation.
- Aider Celebrates 25k GitHub Stars: The Aider AI coding assistant surpassed 25,000 stars on GitHub, marking a significant milestone. Community members praised its success as a standout tool in collaborative coding, recognizing its impact on developer productivity.
Theme 2. Advances in AI Model Development and Performance
- NanoGPT Speedrun Trains Models in Under 3 Minutes: A new NanoGPT speedrun achieved training completion in under 3 minutes on an 8xH100 cluster, costing about $0.40 per attempt. This showcases drastic improvements in training efficiency with modded-nanogpt code, highlighting progress in AI model optimization.
- Google Unveils TITANS for Enhanced Memory: Google Research introduced TITANS, a model architecture using dynamic sub-models to approximate memory-like functionality. While it improves long-sequence handling in transformers, continuous learning remains a work in progress, fueling discussions on future advancements.
- MiniMax-01 Unifies Attention Mechanisms: The MiniMax-01 paper presents a model that unifies MHA and GQA to handle longer contexts efficiently. Community members praised the approachable math and open code release, noting its potential impact on processing extended sequences in AI models.
Theme 3. AI Tools and Integrations Enhancing Developer Workflows
- TraycerAI Automates Codebase Tasks in Cursor AI: The TraycerAI extension impressed users by tracking entire codebases within Cursor AI, automating tasks and generating implementation plans. Developers appreciated the enhanced workflow and efficiency, highlighting the tool's capability to streamline complex coding projects.
- Windsurf Wave 2 Surfs in with Web Search and Memories: Codeium released Windsurf Wave 2, introducing web search capabilities and autogenerated memories to Cascade. This update allows users to incorporate live web context into conversations and maintain continuity across sessions, significantly improving the user experience.
- MCP Marketplace Simplifies Servlet Installation: Sage launched an MCP Marketplace that enables one-click installation of MCP servlets on iPad, iPhone, and Mac. Community members praised this frictionless deployment approach, noting it as a hopeful leap forward in cross-platform accessibility and developer convenience.
Theme 4. Challenges and Issues in AI Model Usage and Implementation
- Bolt and Cursor IDE Users Report Frustrations: Users expressed significant frustration with Bolt, noting issues like erroneous code deletions and inflated token usage, leading to a need for better prompt practices. Similarly, Cursor IDE users faced long wait times with Claude integration, undermining real-time usability and prompting some to consider alternative solutions.
- Perplexity Pro Model Settings Cause Confusion: Perplexity Pro users encountered problems where certain models were not recognized, even after troubleshooting. The community shared concerns over decreased response quality and inconsistencies in model performance, seeking improvements for a more reliable experience.
- OpenRouter Activity Page Sparks Confusion: Users raised concerns about the activity page in OpenRouter, reporting that usage graphs appeared identical across different keys. They suspected a bug and emphasized the need for better usage metrics per key, fueling discussions about potential misrepresentations of data.
Theme 5. Community Initiatives and Events in AI
- Women in AI Rally for RAG Hackathon: Organizers invited women technologists to the Women in AI RAG Hackathon in Palo Alto, focusing on Retrieval-Augmented Generation with the open-source vector database Zilliz. The event aims to foster networking and mentorship among women in AI, highlighting collaborative growth in the field.
- Agent Recipes Offers Code Templates for AI Agents: A new site, Agent Recipes, provides code templates for agent workflows that developers can easily integrate into their AI applications. Early users praised the convenience and speed of implementing agent-based solutions using the provided snippets.
- New Book on Foundations of Large Language Models Released: A comprehensive book covering the fundamentals of large language models was shared, focusing on pre-training, generative architectures, prompting approaches, and alignment methods. Targeted at students and practitioners, it offers a thorough grounding in modern language model development.
PART 1: High level Discord summaries
Stackblitz (Bolt.new) Discord
- Lucide Librarian Saves Bolt: StackBlitz resolved the Lucide icon not found error, letting Bolt’s agent tap entire icon libraries, as documented in a StackBlitz tweet.
- They introduced deterministic icons that cut down guesswork, and the community praised the live fix requiring no extra tokens or debugging.
- Prompting Powers: React & NPM: Members discovered that instructing AI to add NPM packages in React code improved functionality by preventing partial code edits or ‘subtractions.’
- They also recommended clarifying when the AI should expand existing sections, preserving focus instead of rewriting elements.
- TraycerAI & File Docs Synergy: Community feedback praised the TraycerAI extension that tracks entire codebases in Cursor AI, automating tasks and generating implementation plans.
- Some also proposed an instructions folder with thorough file structure docs for a PDF annotation web app, but they occasionally caught the AI producing imaginary details.
- Bolt’s Bugs & Git Gains: Frustrations ran high as users reported erroneous code deletions in Bolt, inflated token usage, and a need for better prompt practices.
- A planned Git integration will let folks clone repositories directly into Bolt, potentially reducing these issues and streamlining project management.
- Supa-Snags & Domain Dreams: Connectors to Supabase caused invalid UUID errors, prompting suggestions for logging inputs to pinpoint the mismatch.
- A user concurrently worked on a domain crawler to identify expiring domains, envisioning potential profits for those interested in snagging valuable URLs.
Eleuther Discord
- RWKV Races Through Testing: Amid checks on BlinkDL's RWKV Gradio, the RWKV 0.4B model showed strong results but struggled with the box puzzle perplexities.
- Community chatter suggested more training tweaks, like CoT methods, might address these tricky tasks and push RWKV's performance further.
- NanoGPT Speedrun Spurs Cheaper Training: A new NanoGPT speedrun set a record of finishing in under 3 minutes on an 8xH100 cluster, costing roughly $0.40 per attempt.
- The tweet by leloykun impressed onlookers with further code refinements in modded-nanogpt that drastically shrank compute time.
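As a sanity check on the quoted figure, the arithmetic works out if H100 time rents for roughly $1 per GPU-hour; that rate is an assumption for illustration, not a number from the thread:

```python
# Back-of-the-envelope cost for a sub-3-minute NanoGPT speedrun on 8xH100.
# The $1.00/GPU-hour rental rate is an assumption, not a figure from the thread.
gpus = 8
gpu_hourly_rate = 1.00            # USD per GPU-hour (assumed)
run_minutes = 3                   # quoted upper bound for the run

cost = gpus * gpu_hourly_rate * (run_minutes / 60)
print(f"Estimated cost per attempt: ${cost:.2f}")  # -> $0.40
```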
- QRWKV Project Aims for Linear Prefill: The QRWKV effort converts transformer models for more efficient prefix handling, highlighted in the Q-RWKV-6 32B Instruct Preview.
- Enthusiasts mentioned upcoming QRWKV7 approaches, hoping to see consistent gains across multiple benchmarks.
- Gradient Gusto with Compression: Engineers discussed Deep Gradient Compression techniques to trim bandwidth usage in distributed SGD, referencing this paper.
- Enthusiasts see potential for larger-scale training as these compression ideas get integrated, though adoption in mainstream setups remains limited.
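For readers unfamiliar with the technique, the core of Deep Gradient Compression is top-k gradient sparsification with local accumulation of the untransmitted residual; the sketch below is a toy single-tensor illustration under that assumption and omits the paper's momentum correction and clipping:

```python
import torch

def sparsify_gradient(grad: torch.Tensor, residual: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest `ratio` fraction of entries; accumulate the rest locally.

    Toy sketch of top-k sparsification; the full method adds momentum correction
    and gradient clipping before communication.
    """
    acc = grad + residual                    # fold in previously unsent gradient mass
    k = max(1, int(acc.numel() * ratio))
    threshold = acc.abs().flatten().topk(k).values.min()
    mask = acc.abs() >= threshold
    sparse_grad = acc * mask                 # what would be all-reduced across workers
    new_residual = acc * (~mask)             # kept locally for the next step
    return sparse_grad, new_residual

grad = torch.randn(1_000)
residual = torch.zeros_like(grad)
sparse_grad, residual = sparsify_gradient(grad, residual)
print(sparse_grad.count_nonzero().item(), "of", grad.numel(), "entries would be sent")
```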
- Context Warmup Sparks Growth: A flexible sliding window approach extends context lengths up to ~1856 tokens, letting trainers ramp capacity without losing data order.
- Proponents say this approach reduces training headaches and ensures better text continuity, fueling more robust model outputs.
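A minimal sketch of what such a warmup might look like, assuming a simple linear ramp from a short window up to the ~1856-token ceiling; the step count and starting length are illustrative, not values from the discussion:

```python
def context_length_at(step: int, warmup_steps: int = 10_000,
                      start_len: int = 256, max_len: int = 1856) -> int:
    """Linearly ramp the training context window from start_len to max_len.

    Illustrative schedule only; the approach discussed may ramp differently.
    """
    if step >= warmup_steps:
        return max_len
    return int(start_len + (max_len - start_len) * step / warmup_steps)

# Batches would be sliced (in original document order) to the current window.
for step in (0, 2_500, 5_000, 10_000):
    print(step, context_length_at(step))
```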
Cursor IDE Discord
- Cursor Gains $105M Lift: Cursor announced raising $105 million from Thrive, Andreessen Horowitz, and Benchmark, highlighting their growing presence in developer tooling. This tweet confirmed the funding, generating big optimism for future updates.
- The community sees this backing as a shot in the arm for code generation features, with early hints of expanded model support. They anticipate faster fixes and more robust features in upcoming releases.
- Claude Slows the Code Flow: Developers hit wait times of up to 10 minutes with Cursor IDE's Claude integration, undermining real-time usability. Some considered using local solutions or alternative integrations to avoid lags.
- Discussions centered on how to reduce overhead and whether Anthropic's status might be a factor. Others debated if offsetting the overhead with local caching could help the workflow.
- O1 Model Shines in Complex Tasks: The O1 model boosted coding workflows and streamlined advanced problem-solving, prompting interest in personal API key usage. Various testers reported fewer misinterpretations when tackling larger codebases.
- Community members questioned the cost structure for those who prefer direct O1 access via Cursor. They advocated for transparent integration pathways and pointed to possible synergy with agent-based tasks.
- UI Hiccups Spark Workarounds: Overlapping code suggestions and paste issues hampered usability for some users, with Ctrl+Shift+V as a partial fix. They complained about the inconvenience of toggling between chat and composer modes.
- Several suggested adding an alert system when generating completions to reduce confusion. Others recommended a dedicated panel for code suggestions to prevent text-blocking overlays.
- Agent vs Normal Mode Enhances Terminal Access: A forum post highlighted differences in modes, with agent mode enabling terminal commands. Some questioned potential security implications but praised the expanded control.
- Feedback indicated the feature sets a foundation for more dynamic coding sessions. Despite some reservations, users welcomed the increased flexibility and pointed to agent-based flows for advanced automation.
Unsloth AI (Daniel Han) Discord
- Qwen 2.5 Quick-Think Tactic: The new Qwen 2.5 model uses a two-stage process—first thinking, then generating—to refine context before producing answers.
- It sometimes produces unintended or excessively long outputs, prompting calls for further tuning to rein in runaway responses.
- Llama-3.2 Steps Up: Codelion’s Llama-3.2 packs 3.21B parameters, finetuned with Unsloth for faster training speed and decent performance gains.
- It has gained 139 downloads in a month, yet some users expect to scale up to bigger models (e.g., 70B) for more nuanced results.
- LoRA Speed Race Sparks Chat: Users compared LoRA adapters trained with Unsloth and Hugging Face, highlighting 2x faster training on Unsloth but similar inference speeds.
- They shared experiences of piping in fewer dependency conflicts and shorter training cycles, fueling curiosity about performance optimization.
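For context, a typical Unsloth LoRA setup looks roughly like the sketch below; the model name, rank, and target modules are illustrative placeholders rather than the settings used in the discussion:

```python
from unsloth import FastLanguageModel

# Load a 4-bit base model (name and sequence length are placeholders).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank/alpha/targets below are common defaults, not the
# exact values compared against the Hugging Face run.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# The returned model can then be handed to a standard TRL SFTTrainer for training.
```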
- Prompt Trackers in Action: The community requested packages or tools to track and compare prompts across multiple open-source LLMs, reinforcing the push for consistent testing.
- They hope for simplified frameworks that help maintain alignment in model outputs while measuring performance across different tasks.
- KD Full Fine-Tuning Meets LORA: A brief exchange touched on whether knowledge distillation (KD) can incorporate selective weights similarly to LORA approaches.
- Members weighed potential overlaps in method design, sparking interest in new tricks for model performance improvements.
MCP (Glama) Discord
- Sage's Shiny MCP Marketplace: Sage recently won the MCP Run hackathon, showcasing a new MCP Marketplace that allows for one-click installation of MCP servlets on iPad, iPhone, and Mac.
- They pitched it as a frictionless approach to deployment, prompting members to call it a hopeful leap forward in cross-platform accessibility.
- MCP-Bridge Baffles Beginners: A user tried pairing MCP-Bridge with AnythingLLM but got stuck, requesting examples and best practices from MCP-Bridge docs.
- Others suggested joining the MCP-Bridge Discord for deeper support, sharing that it extends standard OpenAI endpoints to orchestrate multiple servlets.
- Integration & Testing MCP SDK Gains Steam: Members sought unit tests for the official Python SDK against an actual MCP server, referencing subprocess testing approaches.
- They debated the reliability of integration tests with external dependencies but agreed robust coverage ensures fewer regressions in MCP workflows.
- User Simulation Tricks Amuse Devs: One member revealed a cunning approach to mocking Discord interaction, highlighting a specialized system prompt that imitates user messages nearly flawlessly.
- After they explained the ironically contrived nature of these simulation attempts, they concluded 'my point proven' about scripted user input.
- frgmt0's Alpha Code Launch: The developer revealed a new GitHub project in alpha stage, inviting feedback from peers on architecture and performance.
- They welcomed bug reports and suggestions to shape the codebase, seeking a collaborative process for eventual production readiness.
Interconnects (Nathan Lambert) Discord
- SWE-bench & WeirdML Wow Factor: SWE-bench Multimodal code burst onto the scene, focusing on JavaScript glitches like map rendering and button text, as seen in this update.
- Meanwhile, WeirdML unveiled a fresh benchmark of offbeat tasks in PyTorch, prompting discussions on the growing flexibility of large language models.
- OpenAI’s Cryptic Teasers Criticized: Community members bemoaned OpenAI’s vague announcements, urging more transparency on timelines and capabilities.
- They stressed that direct and concrete updates are crucial for trust in AI progress.
- Deepseek R1 Rumors & Rivalry: Speculation swirls around Deepseek R1 potentially matching o1-Medium for code reasoning, creating buzz over a new competitor.
- Observers anticipate a leaderboard shake-up if the rumored release meets these performance claims.
- NeurIPS PC Drama & Transparency Tussle: Critics labeled the NeurIPS committee a 'clown show' for prioritizing hype over rigorous vetting, per Andreas Kirsch's critique.
- Protesters argued that poor communication and weak oversight undermine research standards, mirroring broader outcries about secrecy in AI.
- Devin AI Bags $21M for Autonomous Coding: Devin secured a $21 million Series A in March 2024, backed by Founders Fund and other key investors, claiming it can handle coding tasks with minimal human input.
- Early demos reported by Answer.AI show Devin completing PyTorch issues at a 13.86% success rate, sparking chatter on future 'AI freelancer' possibilities.
Codeium (Windsurf) Discord
- Windsurf Wave 2 Gains Momentum: The official launch of Windsurf Wave 2 introduced major upgrades like performance boosts and Dev Container fixes, as noted in the Codeium blog.
- Everything from system reliability to user workflows saw refinements, with live updates posted on the Codeium status page.
- Cascade Surfs the Web & Generates Memories: With the new release, Cascade can now search the web automatically or via URL input, supported by autogenerated memories that maintain continuous context.
- Users praised the streamlined approach for referencing links in real time, calling it a strong quality-of-life boost.
- Students Face Discount & Refund Tangles: Some .edu holders were unexpectedly charged the $10 rate instead of $6.90, while a frustrated user demanded a $297 refund with minimal resolution.
- Codeium acknowledged the discount confusion and promised expansions beyond the US, but older .edu domains still triggered issues.
- Tool Integration Ideas Make Waves: Community members suggested hooking up external crawlers like crawl AI and user-provided APIs to broaden Windsurf capabilities.
- They also floated adding these commands into system prompts, hoping for more flexible usage scenarios.
- Bugs, Logins, and IDE Feedback: Reports highlighted autocomplete failures, infinite loops, and login snags on Linux, with recommendations to submit logs for quick fixes.
- Others pointed to references like the Open VSX Registry and raised calls for official support tickets.
OpenRouter (Alex Atallah) Discord
- Activity Page Chaos: Bug or Feature?: Users raised confusion about the activity page in OpenRouter, complaining that the usage graph appears identical across different keys, prompting concerns about a bug.
- They insisted on better usage metrics per key, fueling speculation that the design might be misrepresenting data.
- Gemini 2.0 Flash Disrupts Endpoint: The Gemini 2.0 flash model introduced a new endpoint, causing request errors in OpenRouter integrations.
- Members verified that website documentation needed an update to align with these changes, which briefly broke existing setups.
- Hong Kong Requests Hit a Block: Multiple users reported OpenRouter requests failing in Hong Kong while working when routed through Singapore, implying a new relay requirement.
- They recalled that OpenAI and Anthropic historically limit certain regions, which might explain the intermittent blockade.
- DeepSeek V3 Sparks Mixed Opinions: Community chatter focused on DeepSeek V3 from the DeepSeek team, highlighting uncertain performance across varied tasks and usage.
- Some recommended tinkering with configuration for improved output, sparking a debate on consistent reliability across complex scenarios.
- BYOK Setup Needs Clearer Signals: Users praised the Bring Your Own Key feature but requested explicit confirmations when keys are integrated into OpenRouter.
- They also suggested adding extra metadata in requests to confirm if the correct key is active, potentially reducing guesswork for advanced use cases.
aider (Paul Gauthier) Discord
- DeepSeek 3 tangles with context & quantization: One user faced repeated errors using the DeepSeek3 model with 16k context from OpenRouter, and ignoring that provider was suggested as a fix.
- Others debated performance differences between Q4 or Q5 quantization, expressing skepticism about overly reducing precision for DeepSeek3.
- Aider celebrates 25k GitHub stars: The Aider community applauded surpassing 25k stars on GitHub, signaling a major milestone for the AI coding assistant.
- Members praised its success and recognized its position as a standout tool in collaborative coding.
- CodeGate secures local dev secrets: Developers showcased CodeGate for protecting private data in AI-assisted code, pointing to CodeGate's repo and YouTube demos (https://www.youtube.com/watch?v=lH0o7korRPg).
- They emphasized CodeGate’s encryption layer to thwart accidental leaks, boosting trust for AI-driven coding.
- Agentic tools power code exploration: Participants examined Aide.dev, Cursor, and custom CLI solutions for exploring codebases, referencing Cursor's forum thread.
- They combined refined RAG tactics with strategies for context-heavy tasks, highlighting local prompt management to improve results.
- Helicone monitors LLM usage & costs: The Helicone repository presented an open-source LLM observability suite offering cost analysis, security layers, and rate limiting via Docker or cloud.
- Some noted synergy with Activepieces for robust multi-LLM usage metrics, showcasing varied integration approaches.
Nous Research AI Discord
- Nous Nabs $400M Windfall: Members confirmed Nous Research secured a whopping $400 million in funding, fueling debate on its potential growth and how it might challenge other AI labs.
- Some mentioned hosting their models on OpenRouter, while others noted widespread interest in premium GPU services.
- OpenAI's Peculiar Pay Path: Talks focused on profit participation units (PPUs) at OpenAI, referencing complex equity schemes that differ from standard stock options, outlined in this overview.
- Several members cited the subsequent tender offers allowing employees to cash out, spotlighting how these share structures might shape real-world payouts.
- GPT-2 RAG Bot Breaks Down: One user complained about GPT-2 failing to handle PDF-based retrieval, often returning bland or repetitive responses.
- Contributors recommended switching to newer compact models like smollm and Qwen, remarking that structured output remains tricky when dealing with large source documents.
- Titans and the Memory Makeover: Developers praised Titans: Learning to Memorize at Test Time for its approach to referencing historical context without sacrificing parallel training speed.
- The PyTorch version by lucidrains garnered attention for its potential to reduce memory overhead in transformer models.
- Introductory LLM Book Gains Steam: A new text on large language models, found here, covers four main pillars—pre-training, generative architectures, prompting approaches, and alignment methods.
- The book targets both students and practitioners who want a thorough grounding in the fundamentals of modern language model development.
Notebook LM Discord Discord
- Virtual Travel Agent Bot Takes Off: One user successfully hosted a workshop on a virtual travel agent for Zambian trips, pointing to this official NotebookLM outline.
- Attendees noted that the bot effectively recommended lodging and tours, though some believed NotebookLM could use enhancements for faster results.
- AI Studio Edges Out NotebookLM: A participant argued that AI Studio is more dependable than NotebookLM, praising its greater accuracy for varied tasks.
- They expressed skepticism about NotebookLM’s ability to form in-depth connections, advocating for AI Studio in complex scenarios.
Perplexity AI Discord
- Sonar Surfaces in Labs: Engineers spotted Sonar and Sonar-Pro models in labs, fueling speculation about upcoming changes to the Perplexity API. The official model cards outline potential enhancements in text generation and custom stop parameters.
- Users questioned whether these developments hint at more model variations on the horizon, referencing CrewAI reports about persistent custom stop errors across multiple model trials.
- OpenAI's Economic Blueprint: A shared link revealed OpenAI's economic blueprint, describing new strategies for sustainable revenue and industry positioning. Observers highlighted cost management approaches that could prompt broad updates across the landscape.
- Members expressed interest in this roadmap’s ripple effects, with some calling it a bold step toward less reliance on established platforms.
- Starship 7's Surprising Slip: Several users discussed Starship 7 losing flight stability, citing early analyses found here. Investigators are exploring possible structural or propulsion glitches as the main culprits.
- Community members considered atmospheric factors and launch timing, illustrating how variable flight conditions can affect large-scale aerospace projects.
- China's Orbiting Solar Ambitions: A posted video showcased China's plan for building a giant orbiting solar array, available in this YouTube overview. Observers anticipate fresh energy trials that might broaden global power capabilities.
- Enthusiasts contrasted this approach with standard satellite-based grids, suggesting that national-level projects could advance space-based energy solutions more quickly.
- Apple's First USA-Made iPhone Chips: Apple confirmed intentions to produce iPhone chips in the US for the first time, signaling a shift in domestic manufacturing efforts. Observers noted that this move can reshape supply chains and prompt cost reevaluations.
- Community members viewed it as a strategic pivot for Apple, influenced by global manufacturing trends and the company's long-term hardware plans.
Stability.ai (Stable Diffusion) Discord
- Lynch’s Lodge Lights Laughs: Members joked about David Lynch appearing in the Lodge with dark humor, referencing the unpredictable moral dimension in his art.
- The quirky remarks showed the community’s comedic side, with one comment calling it a 'blend of fear and fascination' inspired by Lynch’s style.
- Stable Diffusion Gains Business Traction: Multiple discussions tackled commercial usage scenarios for Stable Diffusion, emphasizing print-on-demand images that require upscaling.
- Participants debated licensing nuances but affirmed that user outputs are typically allowed unless restricted by the model itself.
- ControlNet Confusion Baffles Creators: Users struggled integrating ControlNet with reference images, discovering that a prompt is still essential for image-to-image tasks.
- Suggestions included adopting lineart or alternative approaches, stressing the various ways to extract data for more consistent outputs.
- LoRA Lessons from Personal Photo Training: A user faced issues training a LoRA model with their child’s photos, questioning how best to crop images and handle resolution limits.
- Members recommended careful dataset preparation and possible architecture adjustments for improved training results.
- Switching WebUIs Sparks Cartoonish Chaos: One user moved from SD Forge to Automatic1111 and dealt with comical outputs traced to a Hugging Face model mismatch.
- They mentioned this GitHub repo for managing prompts in styles.csv, underscoring how consistent settings can prevent unexpected results.
Nomic.ai (GPT4All) Discord
- Nomic Goes Open with Apache 2.0: Nomic Embed Vision is now under an Apache 2.0 License, reportedly outclassing OpenAI CLIP and text-embedding-3-small in multiple benchmarks.
- They also released open weights and code, enabling flexible image, text, and multimodal integrations for developers.
- Models Race on Limited VRAM: Members compared LocalLlama and DavidAU's models for better performance on 8GB setups, exploring quantization tricks.
- They noted varied results across rigs, ranging from smoother throughput to random slowdowns, sparking interest in further speedups.
- Custom URL Schemes Tame Workflow: A user tested linking to Emacs with a custom hyperscope:// protocol for direct file access, discussing embedding .md or .html files.
- Others joined in, highlighting that automatic program launches streamline specialized knowledge retrieval and reduce overhead.
- Template Woes in Qwen2.5-1.5B Land: Parsing errors plagued certain Qwen2.5-1.5B prompts while using ChatML style templates, forcing tweaks to LocalDocs instructions.
- One user’s frustration grew when shifting to older GPUs like Quadro NVS300, as minimal VRAM proved too restrictive for advanced models.
GPU MODE Discord
- LeetGPU Delivers a Free CUDA Playground: The brand-new LeetGPU offers a free, sign-up-free environment for CUDA experimentation on the web, recommended alongside CUDA by Example for a quick start.
- Community members indicated that while the book is older, it thoroughly covers GPU fundamentals, supplemented by references in the official docs.
- Triton Tactics with Warp Specialization: Developers boosted stage1_v2 performance by adjusting buffer sizes, achieving faster DRAM access and showcasing the Automatic Warp Specialization Optimization.
- They discussed barriers for data-flow-based kernel fusion and celebrated warp specialization merging into the main Triton repository.
- Torch Twists Double Backward: One user faced a memory corruption bug with libkineto in the Torch profiler, while another explored a custom autograd.Function for addbmm and Softplus activation with double backward.
- They noted torch.compile() currently lacks double backward support, leading to ideas on managing intermediate tensors and reducing redundant backward passes.
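As background on the double-backward point, a custom autograd.Function supports second-order gradients as long as its backward is written with differentiable torch ops; a minimal Softplus sketch (not the member's actual code) is shown below:

```python
import torch

class MySoftplus(torch.autograd.Function):
    """Softplus with a backward built from differentiable ops, so autograd can
    differentiate through it again (double backward). Illustrative sketch only."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.nn.functional.softplus(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d/dx softplus(x) = sigmoid(x); using torch ops keeps this pass differentiable.
        return grad_output * torch.sigmoid(x)

x = torch.randn(4, requires_grad=True)
y = MySoftplus.apply(x).sum()
(g,) = torch.autograd.grad(y, x, create_graph=True)  # first backward, graph retained
(gg,) = torch.autograd.grad(g.sum(), x)              # second (double) backward
print(gg)  # equals sigmoid(x) * (1 - sigmoid(x))
```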
- Arm64 Runners & Copilot's Error Explanation: The team unveiled Linux arm64 hosted runners for free in public repositories, as announced in the GitHub changelog.
- They also introduced Copilot's 'Explain Error' feature to offer instant insights into Actions job failures, streamlining real-time debugging.
- Thunderkittens Targets Ampere GPUs: Members emphasized tensor cores in development, suggesting Ampere-based cards like A100, H100, or 4090 for maximum effectiveness.
- They mentioned LeetGPU for those without dedicated hardware and referenced an Apple-based port for M chip compatibility.
tinygrad (George Hotz) Discord
- Feisty Flash Attention Fiasco: Efforts to embed Flash Attention in Tinygrad took eight hours, ultimately hitting GPU OOM and memory issues despite attempts to map nested loops into tensor dimensions. A small victory surfaced when one partial step of stable diffusion ran on 25GB of GPU RAM, offering a hint of hope.
- Participants noted frustration with explicit loops required for Flash Attention, questioning whether Tinygrad can adapt effectively without rethinking its memory controls.
- Operator (Un)Fusion Freedoms: A GitHub tutorial on operator (un)fusion shared insights on combining ops in Tinygrad to reduce overhead. This resource spotlights dimension handling intricacies, outlining ways to streamline scheduling.
- Members discussed the trade-offs of single-kernel approaches in balancing performance with memory constraints, maintaining that proper chunking avoids runtime slowdowns.
- Jittery JIT Adjustments: Contributors explored handling variable batch sizes while preserving JIT throughput, advising .realize() calls for controlling computational graphs. Some considered padding techniques to keep inputs consistent.
- They debated splitting JIT mechanisms for training vs testing, highlighting that toggling optimizations could risk performance inconsistencies.
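A minimal illustration of the padding idea under discussion, keeping a TinyJit-compiled step at a fixed batch shape with explicit .realize() calls; the shapes and the toy computation are illustrative:

```python
from tinygrad import Tensor, TinyJit

FIXED_BATCH = 32

@TinyJit
def step(x: Tensor) -> Tensor:
    # The JIT captures this graph for one fixed input shape.
    return (x * 2 + 1).realize()

def run(batch):
    x = Tensor(batch)
    n = x.shape[0]
    if n < FIXED_BATCH:
        # Pad the batch dimension so the JIT always sees FIXED_BATCH rows.
        x = x.pad(((0, FIXED_BATCH - n), (0, 0))).realize()
    return step(x)[:n]          # drop the padded rows afterwards

print(run([[1.0, 2.0], [3.0, 4.0]]).numpy())
```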
- FP8 Forays in Tinygrad: Support for FP8 arose from calls to add a feature flag, ensuring minimal impact on existing tests. Developers planned to isolate fragile code paths and incrementally integrate this new precision option.
- They aimed to preserve backward compatibility while dipping into advanced numeric experimentation, emphasizing a careful line-by-line approach to avoid breakages.
- Windows Woes, Then Wins: Community members questioned Windows support after references suggested dropping it, yet developers indicated it mostly works except for mmap constants. They shared that certain fixes enable tests to run, revealing it's not fully abandoned.
- Enthusiasts embraced these insights to keep Windows viability afloat, mindful that platform-specific quirks still demand targeted patches.
Yannick Kilcher Discord
- FORTRAN Reignited, CUDA Critiqued, Triton Emerges: In a surprising turn, FORTRAN spurred chatter about maintaining older languages in fresh HPC contexts.
- Members voiced frustration with CUDA's complexity and praised Triton for its Python base, even though some noted 'ChatGPT isn't as good at it.'
- Complex Loss Functions & V JEPA Tensions: Participants explored complex loss functions for advanced AI metrics, sharing intrigue about the most demanding designs encountered.
- They also revisited the V JEPA paper, debating how its attentive layer and softmax might affect embeddings in downstream tasks.
- MiniMax-01 Paper & 3090 Training Triumphs: Attendees dissected the MiniMax-01 paper, which unifies MHA and GQA to handle longer contexts.
- One user trained a 100M-parameter flow matching model on a 3090 TI, praising the approachable math and simplified code release.
- Active Inference & Non-Verbal Cues Up Front: A YouTube video featuring Karl Friston stirred discussion on active inference, covering free energy and time aspects.
- Members highlighted how non-verbal communication might account for up to 60% of total interactions, underscoring facial expressions and gestures.
- Memory Mods & CaPa's 4K Mesh Method: Enthusiasts debated 3090 memory mods, wondering about GPU upgrade prospects.
- They also spotlighted the CaPa approach for rapid 4K mesh outputs, prompting comparisons to Trellis.
OpenAI Discord
- TITANS Tussle with Two-Model Memory: Google Research introduced a new model called 'TITANS' that uses two smaller dynamic sub-models to approximate memory-like functionality, potentially enhancing longer sequence handling.
- Members pointed out it still lacks continuous learning, signaling that it’s not yet a complete solution for adaptive recall.
- RunwayML’s 'Underwear Drawer' Dilemma: A quirky reference to an underwear drawer triggered RunwayML content moderation, raising questions about oversensitive filters.
- Others noted the ironic specifics of these rules, as seemingly benign phrases can send tools into unexpected alert mode.
- Master AI Agent Targets LLM Logs: A user proposed building a master AI agent to examine large conversation archives from multiple LLMs and yield targeted sub-agents.
- They asked for shared experiences, citing challenges in consolidating huge data streams from different language models.
- Mind Journal Mishap & Date Defects: Rechecking the DALL·E box in the GPT Editor remedied the Mind Journal issues, which had caused confusion about normal functionality.
- Users also reported INVALID DATE placeholders in version history, complicating reliable change tracking.
- Prompt Engineering Plans and Jailbreak Jitters: A member aimed to write a prompt engineering book in 30 days, referencing official OpenAI documentation for structured learning.
- Meanwhile, the community cautioned against explicit jailbreak talk, emphasizing strict moderation standards and the risks of border-pushing topics.
LM Studio Discord
- Molmo Vision Model Fumbles with trust_remote_code: Encountering errors with the Molmo vision model forced users to enable trust_remote_code=True, but LM Studio doesn't allow that approach.
- A member confirmed that MLX models needing this setting won't function on LM Studio, leaving a gap in vision support.
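For reference, trust_remote_code is the standard Hugging Face transformers flag that lets a checkpoint ship its own modeling code; outside LM Studio, loading Molmo looks roughly like this (checkpoint name taken from the public Hub listing, other details illustrative):

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# trust_remote_code=True permits the repository's custom modeling code to run,
# which Molmo checkpoints require.
model_id = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```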
- Llama 3.2 Vision Out of Bounds: Users faced unknown architecture errors running Llama 3.2 vision, confirming it only works on Mac MLX builds.
- Incompatibility for Windows/Linux LM Studio fueled confusion as the model remains locked to Mac usage.
- Mac Chokes on Phi-4’s Slow Token Rates: Members with 16GB Mac RAM saw as low as 0.05 tokens/sec generating text with Phi-4 in LM Studio.
- They noticed a sluggish start but observed improved speeds after a few tokens, suggesting resource constraints hamper initial performance.
- MiniMax-01 Underwhelms: Comparisons with WizardLM-2 revealed unimpressive results from MiniMax-01, especially in formatting and Chinese output tasks.
- A user considered it a mediocre choice, citing minimal improvements over established competitor models.
- Vision Models Stuck on First Image: One user noticed that new images in vision models still reference the first image unless the chat is reset.
- They recommended clearing or reloading the session, remarking it's a recurring glitch across multiple vision-based models.
Cohere Discord
- Engaging Intros & Student AI Projects: One user urged more robust introductions for newcomers, encouraging them to share more than a simple greeting to foster lively exchanges. Another user discussed a final-year project in Generative AI, citing the potential for deeper community involvement and brainstorming.
- They suggested that sharing goals or issues early on can spark technical collaboration, with the community ready to offer focused insights and constructive feedback.
- Reranking Chat History & Relevancy Gains: A member asked about structuring conversation logs in the rerank prompt in correct chronological order while providing enough context. Another emphasized that more details improve semantic alignment, especially when indexing results precisely for better retrieval.
- They also discussed capturing older messages to strengthen references, describing “the more data the model sees, the sharper its recommendations” as a guiding principle for reranker usage.
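A minimal sketch of the pattern being described, feeding chronologically ordered chat turns to Cohere's rerank endpoint; the model name, documents, and query are illustrative:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Chat history formatted oldest-to-newest so the reranker sees full context.
history = [
    "user: I need a laptop for video editing",
    "assistant: Do you have a budget in mind?",
    "user: Around $1500, and battery life matters",
]

results = co.rerank(
    model="rerank-english-v3.0",       # illustrative model choice
    query="Which earlier messages matter for recommending a laptop?",
    documents=history,
    top_n=2,
)
for hit in results.results:
    print(hit.index, round(hit.relevance_score, 3))
```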
- Command R Model Costs & 8-2024 Confusion: Members questioned whether 8-2024 versions of command-r share the same pricing as previous editions, noting uncertainty about any cost changes. Others observed that the default command-r still points to an older timestamp, leaving room for speculation about version naming and potential new features.
- Users mentioned a few oddities with the 8-2024 deployment and advised close monitoring of performance as real-world feedback could reveal unexpected quirks.
- Cohere's Free Deep Learning Pathways: Cohere spotlighted LLM University and Cookbooks, which provide hands-on 'Hello World!' tutorials and $75 in credits for the first three months. These resources let newcomers quickly experiment with Language AI for various tasks.
- They also highlighted AWS Cloud integration that enables a managed environment, removing hefty infrastructure needs while supporting advanced deployments.
Modular (Mojo 🔥) Discord
- Modular’s Magical Migration: All public GitHub repos migrated from ModularML to Modular, with auto redirects in place, enabling easy navigation.
- Members also proposed adding Mojo and MAX projects to awesome-for-beginners, broadening exposure for novices.
- Mojo’s Parallel Quandary: A user flagged an issue using parallelize in Mojo with Python code, which fails if num_work_items and num_workers both exceed 1, while purely Mojo code works fine.
- They noted that it occurs specifically within the start function of a structure connecting to the Foo class, suggesting further debugging might be needed.
- Variant as a Sum Type Supremacy: Engineers considered Variant in Mojo as a stand-in for sum type support, but remain cautious due to continuing language changes.
- They also discussed possible library rework, recommending an incremental approach until the standard library stabilizes.
- MAX & .NET: A Composable Contemplation: Members speculated that MAX’s final form could mirror .NET as a suite of composable components, possibly using Mojo or C# as the core language.
- Their conversation underlined the importance of composability, referencing synergy between frameworks for cross-platform expansions.
- JSON & Quantum Thanks: One user praised yyjson for efficient handling of large JSON data, highlighting immutable and mutable structures in yyjson docs.
- They also thanked the community for pointing them to quantum.country, calling it a fantastic training ground for quantum concepts.
Latent Space Discord
- SWEBench Surges with O1 Agent: One team's CTO announced that their o1-based AI programming agent scored 64.6% on SWEBench, marking a performance milestone, as shown in this tweet. They are preparing a formal submission for verification, highlighting key insights gained in o1-driven development.
- This is said to be the first fully o1-driven agent known, sparking plans for new benchmarking attempts. Some community members anticipate extended testing scenarios to validate these impressive scores.
- Anysphere Lands $105M to Automate Code: Anysphere locked in $105 million in Series B funding to advance AI-powered coding, detailed in Cursor’s blog. Their supporters include Thrive Capital and Andreessen Horowitz, focusing on an editor that serves millions of programmers.
- Excitement arose over potential upgrades to coding automation and deeper R&D breakthroughs. Some attendees mentioned parallels to separate law-oriented AI funding, but official data remains limited.
- Agent Recipes Rolls Out: A site dubbed Agent Recipes emerged with code templates for agent workflows, outlined in this tweet. It promises easy integration into AI applications through copy-and-paste examples.
- Early users praised the speed at which they could spin up agent-based solutions using the provided snippets. The community sees it as a convenient route to incorporate agent behavior.
- Biden Issues Cybersecurity Order: President Joe Biden enacted a major cybersecurity executive order, described in this Wired article, aimed at boosting AI security and identity measures. The plan addresses foreign cyber threats and sets guidelines for U.S. agencies.
- Some engineers expect these rules to reshape government procurement decisions for AI vendors. Others foresee challenges syncing these mandates with large-scale workflows.
- Concerns About OpenAI’s webRTC API: Developers voiced frustrations implementing OpenAI's webRTC realtime API, given few examples beyond internal demos. Many requested open-source references or a knowledge base for real-time streaming setups.
- They noted complexities balancing data throughput and overhead. The discussion ended with a push to gather community-driven solutions and docs.
LlamaIndex Discord
- Women in AI Call for RAG: Organizers invited women technologists to the Women in AI RAG Hackathon in Palo Alto, featuring Retrieval-Augmented Generation with the open-source vector database Zilliz.
- Attendees will network with fellow professionals and mentors in an all-day event that spotlights robust RAG approaches.
- GraphRAG Shares the Spotlight: A recent webinar highlighted how Memgraph and LlamaIndex join forces to create graph-based agentic applications, focusing on GraphRAG for better context retrieval Watch here.
- Presenters stressed agentic strategies and tips to improve RAG pipelines, expanding how developers incorporate contextual data More here.
- CAG Concept Spurs Innovation: Members discussed Cached Augmented Generation (CAG) with Gemini and LlamaIndex, revealing that it usually demands direct model access (for example, via PyTorch).
- They shared a CAG implementation demonstrating a powerful caching technique for faster generation.
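The mechanism behind CAG is precomputing the model's KV cache for a fixed context and reusing it for every query; a minimal Hugging Face transformers sketch is below (model name illustrative, and the linked implementation may differ in detail):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"   # illustrative small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1) Encode the long, fixed reference context once and keep its KV cache.
context_ids = tok("Reference documents: ...", return_tensors="pt").input_ids
with torch.no_grad():
    cache = model(context_ids, use_cache=True).past_key_values

# 2) Answer new questions against the cached keys/values instead of re-encoding
#    (or retrieving) the documents each time.
question_ids = tok(" Question: What do the documents say?", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(question_ids, past_key_values=cache, use_cache=True).logits
print(tok.decode(logits[:, -1].argmax(-1)))
```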
- Azure Integration Sparks Confusion: A user struggled with Azure AI routing calls to OpenAI, pointing to an incomplete configuration of the service.
- Suggestions included setting up a dedicated embedding model while also calling for better example pages to clarify model selection.
- Metadata and Prompt Tracking Under Scrutiny: Participants clarified that node metadata can be toggled via excluded_llm_metadata_keys and excluded_embed_metadata_keys for chunking and embedding tasks, as sketched below.
- They also sought a package to track and compare prompts across open-source LLMs, though no specific solutions emerged.
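For readers who haven't used these knobs, they live on LlamaIndex documents and nodes and control which metadata keys are visible in the LLM prompt versus the embedding text; a small sketch (field values are made up):

```python
from llama_index.core import Document
from llama_index.core.schema import MetadataMode

doc = Document(
    text="Quarterly revenue grew 12% year over year.",
    metadata={"file_name": "q3_report.pdf", "page": 7, "internal_id": "abc-123"},
)

# Hide noisy keys from the LLM prompt, but keep them for filtering and display.
doc.excluded_llm_metadata_keys = ["internal_id"]
# Keep the embedding focused on content by excluding extra keys from the embed text.
doc.excluded_embed_metadata_keys = ["file_name", "internal_id"]

print(doc.get_content(metadata_mode=MetadataMode.LLM))    # omits internal_id
print(doc.get_content(metadata_mode=MetadataMode.EMBED))  # omits file_name and internal_id
```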
DSPy Discord
- DSPy V3 Slips Q1 Targets: The dev team confirmed that DSPy v3 won't launch in Q1 due to major internal changes, keeping the release date up in the air.
- They cited ongoing discussions on readiness, hinting that smaller updates may arrive before this bigger version.
- Stable Diffusion Gains Momentum with Chain-of-Thought: A new venture aims to refine Stable Diffusion prompts via a 'chain-of-thought' approach, as shown in Thorondor LLC's tweet.
- Community members expressed excitement about leveraging DSPy for iterative prompt building, focusing on step-by-step enhancement of text embeddings.
- ReAct Ruckus Over Addition Tool: A user encountered an error with dspy ReAct where the addition tool wouldn't sum two numbers, citing unknown required arguments.
- They ran Llama under LM Studio and suspected a redefinition conflict, with full error logs requested to pinpoint the cause.
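For reference, a minimal dspy ReAct setup with an addition tool looks roughly like the sketch below in recent DSPy releases; the local endpoint and model name are placeholders for whatever LM Studio serves, and a typed signature plus docstring helps the agent emit valid tool arguments:

```python
import dspy

# Point DSPy at a local OpenAI-compatible endpoint (URL/model are placeholders
# for an LM Studio server).
lm = dspy.LM("openai/llama-3.2-3b-instruct",
             api_base="http://localhost:1234/v1", api_key="lm-studio")
dspy.configure(lm=lm)

def add(a: float, b: float) -> float:
    """Add two numbers and return their sum."""
    return a + b

react = dspy.ReAct("question -> answer", tools=[add])
result = react(question="What is 17.5 + 24.25?")
print(result.answer)
```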
Axolotl AI Discord
- ChatML Sizing Up Llama3: Members debated the advantages of the ChatML versus Llama3 chat templates, hinting at a contest for format supremacy.
- One participant gave a casual response of 'duh', underscoring their confidence in ChatML.
- ShareGPT Dataset Gains a Thumbs-Up: A question arose about possible complications using ShareGPT, but participants confirmed none exist.
- They pointed out a ready-made configuration for key mapping, signaling direct usage without issues.
- Migration from ShareGPT Marches Forward: A conversation highlighted a documented path for migrating away from ShareGPT, ensuring smooth transitions.
- Users mentioned that this reference covers every step, addressing frequent dataset concerns.
- Torchtune Tinkering Grows: A participant noted that Torchtune calls for significant modifications at present.
- This requirement suggests deeper code tweaks for anyone depending on the tool’s functionality.
OpenInterpreter Discord
- Silent Screenshot Shocker: A user shared a screenshot related to OpenInterpreter but provided no context or commentary, leaving others unsure how to respond.
- No one followed up or asked questions, indicating minimal interest or clarity about the screenshot content.
- Missed Chance For Visual Insight: Members did not analyze the shared image, suggesting an untapped conversation regarding potential features or issues with OpenInterpreter.
- The prompt remained unanswered, revealing the group’s desire for more substance or details before contributing further.
AI21 Labs (Jamba) Discord
- Feature FOMO & Curiosity Quests: A user asked about how a feature was found, wondering if they'd known of it previously or if it was newly explored.
- This spurred interest in how engagement patterns can reveal untried functionality and overlooked potential.
- Testing Tangles & Missed Opportunities: Another user highlighted roadblocks in trying seldom-used tools, suggesting that lack of familiarity hinders broader experimentation.
- Participants noted that thorough exploration demands a supportive environment for risk-free trials and open dialogue on potential pitfalls.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email!
If you enjoyed AInews, please share with a friend! Thanks in advance!