[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Secure endpoints are all you need.
AI News for 1/28/2025-1/29/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels, and 4890 messages) for you. Estimated reading time saved (at 200wpm): 549 minutes. You can now tag @smol_ai for AINews discussions!
Rumors of Grok 3 and o3-mini continue to swirl.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Unsloth AI (Daniel Han) Discord
- OpenAI Discord
- LM Studio Discord
- aider (Paul Gauthier) Discord
- Perplexity AI Discord
- Nous Research AI Discord
- Codeium (Windsurf) Discord
- OpenRouter (Alex Atallah) Discord
- Interconnects (Nathan Lambert) Discord
- Cursor IDE Discord
- Yannick Kilcher Discord
- Eleuther Discord
- GPU MODE Discord
- Stability.ai (Stable Diffusion) Discord
- Stackblitz (Bolt.new) Discord
- MCP (Glama) Discord
- Nomic.ai (GPT4All) Discord
- Notebook LM Discord
- Latent Space Discord
- Cohere Discord
- LLM Agents (Berkeley MOOC) Discord
- Modular (Mojo 🔥) Discord
- Torchtune Discord
- Axolotl AI Discord
- LlamaIndex Discord
- MLOps @Chipro Discord
- DSPy Discord
- OpenInterpreter Discord
- tinygrad (George Hotz) Discord
- LAION Discord
- PART 2: Detailed by-Channel summaries and links
- Unsloth AI (Daniel Han) ▷ #general (584 messages🔥🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (24 messages🔥):
- Unsloth AI (Daniel Han) ▷ #help (131 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #research (3 messages):
- OpenAI ▷ #ai-discussions (404 messages🔥🔥🔥):
- OpenAI ▷ #gpt-4-discussions (30 messages🔥):
- OpenAI ▷ #prompt-engineering (1 messages):
- OpenAI ▷ #api-discussions (1 messages):
- LM Studio ▷ #general (247 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (152 messages🔥🔥):
- aider (Paul Gauthier) ▷ #general (329 messages🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (56 messages🔥🔥):
- aider (Paul Gauthier) ▷ #links (1 messages):
- Perplexity AI ▷ #announcements (2 messages):
- Perplexity AI ▷ #general (316 messages🔥🔥):
- Perplexity AI ▷ #sharing (13 messages🔥):
- Perplexity AI ▷ #pplx-api (10 messages🔥):
- Nous Research AI ▷ #general (298 messages🔥🔥):
- Nous Research AI ▷ #ask-about-llms (6 messages):
- Nous Research AI ▷ #research-papers (2 messages):
- Nous Research AI ▷ #interesting-links (1 messages):
- Codeium (Windsurf) ▷ #discussion (87 messages🔥🔥):
- Codeium (Windsurf) ▷ #windsurf (193 messages🔥🔥):
- OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
- OpenRouter (Alex Atallah) ▷ #general (277 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #news (53 messages🔥):
- Interconnects (Nathan Lambert) ▷ #ml-drama (24 messages🔥):
- Interconnects (Nathan Lambert) ▷ #random (41 messages🔥):
- Interconnects (Nathan Lambert) ▷ #memes (10 messages🔥):
- Interconnects (Nathan Lambert) ▷ #reads (35 messages🔥):
- Interconnects (Nathan Lambert) ▷ #posts (42 messages🔥):
- Interconnects (Nathan Lambert) ▷ #policy (24 messages🔥):
- Cursor IDE ▷ #general (219 messages🔥🔥):
- Yannick Kilcher ▷ #general (180 messages🔥🔥):
- Yannick Kilcher ▷ #paper-discussion (15 messages🔥):
- Yannick Kilcher ▷ #agents (3 messages):
- Yannick Kilcher ▷ #ml-news (14 messages🔥):
- Eleuther ▷ #general (60 messages🔥🔥):
- Eleuther ▷ #research (82 messages🔥🔥):
- Eleuther ▷ #interpretability-general (55 messages🔥🔥):
- Eleuther ▷ #gpt-neox-dev (5 messages):
- GPU MODE ▷ #general (12 messages🔥):
- GPU MODE ▷ #cuda (19 messages🔥):
- GPU MODE ▷ #torch (8 messages🔥):
- GPU MODE ▷ #announcements (1 messages):
- GPU MODE ▷ #cool-links (40 messages🔥):
- GPU MODE ▷ #beginner (10 messages🔥):
- GPU MODE ▷ #bitnet (1 messages):
- GPU MODE ▷ #self-promotion (1 messages):
- GPU MODE ▷ #thunderkittens (1 messages):
- GPU MODE ▷ #arc-agi-2 (89 messages🔥🔥):
- Stability.ai (Stable Diffusion) ▷ #general-chat (120 messages🔥🔥):
- Stackblitz (Bolt.new) ▷ #announcements (1 messages):
- Stackblitz (Bolt.new) ▷ #prompting (2 messages):
- Stackblitz (Bolt.new) ▷ #discussions (110 messages🔥🔥):
- MCP (Glama) ▷ #general (74 messages🔥🔥):
- MCP (Glama) ▷ #showcase (20 messages🔥):
- Nomic.ai (GPT4All) ▷ #general (82 messages🔥🔥):
- Notebook LM Discord ▷ #use-cases (11 messages🔥):
- Notebook LM Discord ▷ #general (70 messages🔥🔥):
- Latent Space ▷ #ai-general-chat (60 messages🔥🔥):
- Cohere ▷ #discussions (17 messages🔥):
- Cohere ▷ #api-discussions (4 messages):
- Cohere ▷ #cmd-r-bot (6 messages):
- Cohere ▷ #projects (12 messages🔥):
- LLM Agents (Berkeley MOOC) ▷ #mooc-questions (20 messages🔥):
- LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (5 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 messages):
- Modular (Mojo 🔥) ▷ #general (6 messages):
- Modular (Mojo 🔥) ▷ #announcements (2 messages):
- Modular (Mojo 🔥) ▷ #mojo (10 messages🔥):
- Torchtune ▷ #general (2 messages):
- Torchtune ▷ #dev (13 messages🔥):
- Torchtune ▷ #papers (2 messages):
- Axolotl AI ▷ #general (8 messages🔥):
- LlamaIndex ▷ #blog (2 messages):
- LlamaIndex ▷ #general (5 messages):
- MLOps @Chipro ▷ #events (1 messages):
- MLOps @Chipro ▷ #general-ml (3 messages):
- DSPy ▷ #papers (1 messages):
- DSPy ▷ #general (1 messages):
- OpenInterpreter ▷ #general (2 messages):
- tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):
- LAION ▷ #general (1 messages):
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
DeepSeek Developments and Performance
- DeepSeek-R1 and V3 Advancements: @arankomatsuzaki highlighted that DeepSeek-V3, distilled from DeepSeek-R1, was trained on an instruction-tuning dataset of 1.5M samples. Additionally, @alexandr_wang emphasized that DeepSeek models are setting records for the disclosed amount of post-training data for open-source models, including 600K reasoning samples and 200K non-reasoning SFT samples.
- Performance Benchmarks: @teknium1 noted that DeepSeek-R1 AI + Groq enables coding "at the speed of thought". Furthermore, @osanseviero pointed out that DeepSeek has been consistently shipping models like Coder V2 and Prover over the past year, demonstrating sustained model performance and innovation.
AI Model Training, Costs, and Hardware
- Training Costs and Infrastructure: @teortaxesTex questioned the $5.5M training cost claim for DeepSeek, suggesting that the actual costs involve eliminating token routing inefficiency and keeping communication volume down using pipelined training. Additionally, @arankomatsuzaki provided an estimate that the entirety of V3 pretraining is within the ballpark of $6M.
- Hardware Utilization: @giffmana discussed the competitive advantage of DeepSeek's GPU usage, while @MarkTenenholtz mentioned that an 8xH100 server could handle DeepSeek-R1, indicating the hardware scalability required for such models.
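As a sanity check on the ballpark figure above, here is a quick back-of-envelope calculation. The GPU-hour count and rental rate are assumptions taken from numbers commonly cited alongside the DeepSeek-V3 technical report, not from the tweets quoted here:

```python
# Back-of-envelope check of the ~$6M pretraining estimate (illustrative only).
gpu_hours = 2_788_000       # assumed total H800 GPU-hours (commonly cited V3 figure)
usd_per_gpu_hour = 2.00     # assumed rental rate

total = gpu_hours * usd_per_gpu_hour
print(f"~${total / 1e6:.2f}M")   # ≈ $5.58M, i.e. within the ~$6M ballpark
```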
Open Source AI and Deployment
- Deployment Platforms: @ClementDelangue announced that DeepSeek-R1 is now available on-premise through a collaboration with Dell and Hugging Face, facilitating open-source deployment for enterprise users.
- Community and Contributions: @Yoshua_Bengio acknowledged the collaborative effort in producing the International AI Safety Report, while @Markchen90 engaged in discussions around AI risk assessments and model deployment strategies.
AI Safety, Risks, and Ethics
- Safety Reports and Risk Mitigation: @Yoshua_Bengio detailed the International AI Safety Report, categorizing risks into malicious use, malfunctions, and systemic risks. This includes concerns like AI-driven cyberattacks and environmental impacts.
- Ethical Considerations: @c_valenzuelab praised the Copyright Office’s stance on AI tools assisting human creativity, emphasizing that AI does not diminish copyright protection when used appropriately.
AI Industry Insights and Comparisons
- Market Reactions and Competitiveness: @ylecun criticized the market's unjustified reactions to DeepSeek, arguing that the performance benchmarks demonstrate DeepSeek's competitive edge. Moreover, @giffmana highlighted that DeepSeek’s reasoning capabilities surpass many open-source models, positioning it strongly against OpenAI.
- Investment and Economic Impact: @fchollet discussed the economic incentives driving AI development, while @scaling01 argued that using GPT-4o equates to donating money to OpenAI, reflecting on the cost dynamics within the AI industry.
Memes/Humor
- Light-Hearted Interactions: @ylecun and @gabrielpeyre engaged in humorous exchanges with reactions like "LOL" and 🤣🤣🤣, showcasing the lighter side of technical discussions within the AI community.
- Humorous AI Outputs: @fabianstelzer shared a playful AI-generated script for bouncing yellow balls, blending technical scripting with creative AI humor.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Confusion over DeepSeek R1 Models and Distillations
- PSA: your 7B/14B/32B/70B "R1" is NOT DeepSeek. (Score: 1246, Comments: 357): The post clarifies that the 7B/14B/32B/70B "R1" models are not the actual DeepSeek models but rather finetunes of existing dense models like Qwen 2.5 and Llama 3.3. The true DeepSeek model is the full 671B version, and the author is frustrated with repeated explanations needed due to common misconceptions.
- The naming confusion around DeepSeek models is a major issue, with many users misled by Ollama's naming conventions. The distilled models are often perceived as the full R1 model due to misleading names like "DeepSeek-R1:70b", which do not clearly indicate they are smaller, fine-tuned versions of Qwen 2.5 and Llama 3.3.
- Discussions highlight misinformation prevalent on platforms like YouTube and TikTok, where creators often claim to run DeepSeek locally, leading to widespread misconceptions. Users express frustration over the repeated need to clarify that these are not the full 671B DeepSeek models, which require over 1TB of VRAM and are not feasible for home use.
- The technical distinction between distillation and fine-tuning is emphasized, with several comments explaining that the so-called "distillation" is actually just fine-tuning on R1's responses. The real R1 is a Mixture of Experts (MoE) model, significantly different from the dense models like Qwen 2.5 and Llama 3.3, which are being fine-tuned.
- good shit (Score: 289, Comments: 138): OpenAI accuses China's DeepSeek of using its models to train a competitor, raising concerns about intellectual property theft. White House AI advisor David Sacks highlights these issues, as depicted in a Financial Times article featuring logos of both companies.
- Many commenters criticize OpenAI for accusing DeepSeek of intellectual property theft, highlighting the irony given OpenAI's own use of public data for training. DeepSeek is seen as a "Robinhood" figure by some, and the accusation is perceived as a tactic to stifle competition by weaponizing the "China threat."
- There is skepticism about the enforceability of OpenAI's Terms of Service, with some suggesting that Terms of Service might not hold legal weight in certain jurisdictions, including potentially China. Others argue that DeepSeek paid for the tokens it used, thus not violating any agreements.
- The broader sentiment among commenters is a call for OpenAI to focus on improving their products rather than litigating, with some advocating for a boycott of "ClosedAI" products due to perceived greed and hypocrisy.
- 4D Chess by the DeepSeek CEO (Score: 478, Comments: 91): Liang Wenfeng, CEO of DeepSeek, argues that closed-source approaches, like those of OpenAI, provide only temporary competitive advantages. Instead, he emphasizes the importance of building a strong team and organizational culture to foster innovation as a sustainable competitive moat. Read more here.
- Discussions highlight the technical advantage of DeepSeek using PTX instead of CUDA, which many US engineers are not equipped to handle due to the entrenched use of Python and CUDA over the past decade. This choice gives DeepSeek a significant skill advantage, as PTX is more efficient at training time, and transitioning to it requires a substantial increase in skill level.
- DeepSeek's impact on the AI landscape is compared to the Unix open-source movement in the 90s, suggesting a potential shift in competitive dynamics. OpenAI and other US companies might face challenges in maintaining their competitive edge if they do not adapt to the efficiencies demonstrated by DeepSeek, which could result in a rapid and cheap erosion of their competitive moats.
- DeepSeek is recognized for its innovation in the financial sector, with discussions on its strategic shift to building foundational models rather than just applying ML to finance. This move is seen as a way to gain deeper control and understanding of the technology, highlighting the value of having machine learning expertise within a quant finance firm.
Theme 2. Speculation on US Ban of DeepSeek and Market Impact
- Will Deepseek soon be banned in the US? (Score: 1371, Comments: 863): The post speculates about a potential ban on DeepSeek in the US, as the White House examines its national security implications. The information comes from the InsidersHut account, raising concerns about the future availability of the DeepSeek AI platform in the country.
- Open Source and Accessibility: Many commenters highlight that DeepSeek is open source and its models, including the 670B parameter version, are available for download on platforms like Hugging Face. This makes it difficult to ban effectively since users can run these models locally or on private servers.
- Security and Competition Concerns: Discussions revolve around the perceived irony of banning an open-source AI due to national security threats, while other commenters suggest that the move is more about curbing competition from non-US entities. Some express skepticism over the security risks, questioning the practicality of banning something that can be run offline without sending data back to China.
- Criticism of US Policy: Many comments criticize the US's approach to handling foreign tech competition, likening it to protectionism and drawing parallels to past actions against Chinese companies like TikTok. There is a sentiment that banning DeepSeek contradicts the ideals of a free market and reflects a fear of being outcompeted by innovative foreign technologies.
- So much DeepSeek fear mongering (Score: 539, Comments: 234): The post criticizes the widespread fear-mongering surrounding DeepSeek, questioning the credibility of those speaking against it. It references a LinkedIn post that portrays DeepSeek as a potential cybersecurity threat, urging scrutiny over its strategic implications and transparency, which has garnered significant engagement with 3,058 reactions, 1,148 comments, and 433 reposts.
- The discussion highlights skepticism towards the fear-mongering about DeepSeek, with users comparing it to baseless claims like those made during the COVID vaccine debates. Critics argue that the fear is exaggerated and question the motivations behind such narratives, suggesting it's a tactic to manipulate perceptions or markets.
- Some commenters emphasize transparency and security concerns, noting that unlike proprietary models like OpenAI's, DeepSeek is open source, allowing anyone to inspect its code. Users point out that security risks could be mitigated by running models locally or using services with favorable privacy policies, thus questioning the consistency of the fear narrative.
- The conversation includes a mix of satire and serious critique, with users mocking the idea that DeepSeek poses a significant threat, while others raise legitimate concerns about data privacy and the geopolitical implications of using AI tools developed in different countries. This reflects a broader distrust of both corporate and governmental entities in managing AI technologies.
- Some evidence of DeepSeek being attacked by DDoS has been released! (Score: 322, Comments: 87): DeepSeek experienced a series of DDoS attacks in January, with distinct phases involving HTTP proxy attacks, SSDP and NTP reflection amplification, and application layer attacks. The attacks peaked on January 28 between 03:00-04:00 Beijing time, with evidence suggesting they targeted overseas service providers, particularly from U.S. IPs, many of which were VPN exits. DeepSeek quickly responded by switching their IP at 00:58 on January 28 to mitigate the attacks, aligning with their security announcements.
- Several commenters suggest that the DDoS attacks on DeepSeek may not have been attacks at all, but rather a result of overwhelming user interest and inadequate server infrastructure. AnhedoniaJack and PhoenixModBot emphasize that sudden spikes in legitimate traffic can mimic DDoS patterns, especially if infrastructure isn't prepared for high loads.
- Johnxreturn and mobiplayer discuss technical defenses against DDoS, mentioning WAF, OWASP vulnerabilities, and CDN gateways, while questioning the effectiveness of these measures against specific attacks like NTP amplification. Mobiplayer criticizes a misunderstanding of how NTP amplification attacks work, highlighting the technical inaccuracies in some explanations.
- Doubts about the evidence and origin of the attacks are prevalent, with users like TsaiAGw and YT_Brian questioning the reliability of the source attributing the attacks to the U.S. Agabeckov and PhoenixModBot call for more detailed technical data to substantiate the claims of a DDoS attack, suggesting that the perceived attacks might have been misinterpreted due to lack of proper analysis.
Theme 3. DeepSeek API Challenges Amidst DDoS Attacks
- Berkley AI research team claims to reproduce DeepSeek core technologies for $30 (Score: 286, Comments: 87): The University of California, Berkeley research team, led by Jiayi Pan, claims to have reproduced DeepSeek R1-Zero's core technologies for just $30, showcasing how advanced AI models can be implemented cost-effectively. The team used a small language model with 3 billion parameters to develop self-verification and search abilities via reinforcement learning, potentially challenging OpenAI's market position.
- OpenAI's Position and Technology: Some believe OpenAI is already aware of the techniques used by DeepSeek, and while the reproduction of these methods is impressive, OpenAI could potentially implement them with greater resources. The discussion highlights that OpenAI's models, like the o3 model, achieve high performance but at significant computational costs, indicating a potential for cost reduction in AI development.
- Reinforcement Learning and Open Source: The resurgence of reinforcement learning (RL) and open knowledge transfer is emphasized as a key benefit, with the availability of TinyZero's repo on GitHub being particularly noted. This approach allows for self-improvement and distillation of models, which can be applied to larger models like LLaMa 3.1 405B, potentially enhancing their capabilities and supporting the viability of open-source AI projects.
- Market Implications and Open Source Viability: The success of distillation approaches, as demonstrated by DeepSeek, presents a challenge to proprietary models by companies like OpenAI and Anthropic. The ability to create capable, customized models through open-source methods suggests a shift towards more viable open-source projects, impacting the competitive landscape and potentially necessitating changes in proprietary infrastructure strategies.
- DeepSeek API: Every Request Is A Timeout :( (Score: 246, Comments: 83): The post humorously criticizes the DeepSeek API for frequent timeouts, symbolized by a gravestone image marking its short-lived functionality in January 2025. The sarcastic tone highlights user frustrations with the API's unreliability.
- Users express concerns about DeepSeek's long-term sustainability due to its free services, with some experiencing 503 errors when accessing the platform. Openrouter offers alternative, albeit more expensive, API endpoints for the R1 671b model that function effectively.
- Discussion highlights parallels between DeepSeek's issues and past outages with GPT-4, attributing the problems to increased popularity and possible DDoS attacks. Some speculate that the Spring Festival in China might contribute to service disruptions.
- The competition between platforms is noted, with ChatGPT lifting typical limits on their basic pro plan in response to DeepSeek's issues, showcasing the benefits of competitive markets. Users also discuss the availability of open-source options and the ability to run smaller models independently.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT
Theme 1. OpenAI's Allegation: DeepSeek Leveraged Their Model
- OpenAI says it has evidence China’s DeepSeek used its model to train competitor (Score: 589, Comments: 418): OpenAI claims that China's DeepSeek used its model to train a competing AI. Without further context or details provided in the post, the implications or evidence supporting this claim remain unspecified.
- Many commenters highlight the irony in OpenAI's complaint, pointing out that OpenAI itself used data from the internet, including potentially copyrighted material, to train their models. DeepSeek is accused of using OpenAI's models, but this mirrors how OpenAI initially built on existing technologies and datasets.
- DeepSeek reportedly used synthetic data, possibly generated by OpenAI models, sparking discussions on whether outputs from such models belong to the user or the model creator. This raises concerns about OpenAI's terms of service and whether they claim ownership over user-generated outputs, potentially spreading fear, uncertainty, and doubt (FUD).
- Some comments discuss the technical and economic aspects of AI training, such as electricity costs and GPU pricing on platforms like Runpod. The H100 GPU is mentioned with a power consumption of 0.7 kilowatt and a cost of $1.99 per GPU hour, highlighting the significant resources required for AI model training.
- Anduril's founder gives his take on DeepSeek (Score: 306, Comments: 179): Palmer Luckey, founder of Anduril, critiques the media's reaction to DeepSeek's reported $5 million training cost, suggesting it is exaggerated and influenced by a Chinese hedge fund with ulterior motives. He argues that media narratives are biased against American tech companies and highlights misinformation regarding investment in AI startups, as evidenced by his Twitter post with 1K retweets, 3K likes, and 2.5K shares, viewed 1.6 million times on January 28, 2025.
- Discussions highlight skepticism about the $5 million training cost figure for DeepSeek, with comments suggesting that the actual costs are much higher when considering factors like infrastructure and salaries. Some argue that the media and public are misled by oversimplified figures, while others suggest this narrative is used by US companies to justify losing ground to China.
- There is a significant critique of media bias, with some commenters arguing that media narratives unfairly target American tech companies or support political figures like Trump. Others counter that the media is not monolithic and may have varied biases, sometimes even favoring big tech or political figures for ratings.
- The conversation also touches on open-source contributions, with some acknowledging China's role in promoting open-source AI developments. Commenters appreciate the energy savings and performance improvements offered by these contributions, contrasting them with the lack of transparency from companies like OpenAI.
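To put the quoted hardware figures in perspective, here is a rough comparison of electricity cost versus rental price; the electricity rate is an assumption for illustration, not a number from the thread:

```python
# Illustrative comparison using the figures quoted in the thread above.
power_kw = 0.7            # H100 draw cited in the comments
rental_per_hour = 1.99    # Runpod price cited in the comments
usd_per_kwh = 0.15        # assumed electricity price (not from the thread)

electricity_per_hour = power_kw * usd_per_kwh
print(f"electricity ~${electricity_per_hour:.2f}/hr vs rental ${rental_per_hour:.2f}/hr")
# Power is roughly 5% of the hourly rental price, so hardware amortization and
# datacenter overhead, not electricity, dominate the cost of training runs.
```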
Theme 2. Qwen 2.5 Max vs GPT-4o: Price and Performance Clash
- Mr president the second chinese ai has hit the market (Score: 1600, Comments: 99): Alibaba has introduced a new AI platform that reportedly surpasses Deepseek, as announced in a tweet by "The Spectator Index." The tweet has garnered significant attention with 17.8K views as of January 29, 2025.
- The Qwen 2.5 Max model by Alibaba is noted for its high cost, being 3-4x more expensive than GPT-4o, with pricing at $10/M input tokens and $30/M output tokens, compared to Deepseek's significantly lower costs. However, it lacks a "thinking mode" and is not open source, which limits its accessibility and appeal.
- Users have mixed opinions on the performance of Alibaba's AI, with some praising its image and video generation capabilities, providing examples like a pink rubber duck video and a handshake video. Others criticize its reasoning abilities, stating it is not as advanced as Deepseek-v3.
- There is discussion about alternative AI models, with Hugging Face working on an open-source version of Deepseek's R1, called open-r1, aiming to offer more accessible and powerful AI solutions.
- "Sir, China just released another model" (Score: 514, Comments: 45): Qwen 2.5 Max, a new AI model from China, is now available for use via Alibaba Cloud, as noted in a tweet by Junyang Lin. The post humorously highlights the model's release and invites users to explore it through the provided link.
- Trust in Tech: There's skepticism about the trustworthiness of Chinese tech, but some users argue that Chinese tech is as reliable as American tech, questioning the integrity of companies like Google, OpenAI, and Meta.
- Performance Concerns: Users express doubt about new LLMs claiming to be on par with larger models, questioning their real-world task performance. A user shared a link to test Qwen 2.5 directly, noting its utility in tweaking Python code but emphasizing the need for fact-checking in complex scenarios.
- Service Availability: There was a reported DDOS attack on the service, affecting its availability, although it's unclear if the issue persisted beyond the initial report.
Theme 3. Gemini 2's Flash Thinking: Evolution in AI Speed
- While we got OpenAI vs Deepseek (Score: 2043, Comments: 80): Gemini 2's flash capabilities are highlighted in a humorous exchange where a virtual assistant responds to a query about the number of seconds in a year with a playful list of monthly dates. This showcases the assistant's ability to engage in light-hearted, conversational interactions while maintaining a modern and visually appealing interface.
- Google Assistant vs Gemini: Discussions clarify that Google Assistant and Gemini are distinct, with Gemini using the assistant for certain tasks. Some users criticize Google Assistant's intelligence, noting its limitations compared to more advanced AI systems like those in Google AI Studio.
- AI Studio vs Gemini App: Users highlight that Google AI Studio offers more powerful AI capabilities than the Gemini app, which is seen as less effective for advanced tasks. The AI Studio is praised for its free access and advanced features, while the Gemini app is considered suitable only for casual use.
- Gemini 2's Unique Features: Gemini 2 is noted for its "flash thinking" capability, which allows it to process large amounts of data, such as videos or books, quickly. However, users point out that these features require specific tools within AI Studio, not available in the main Gemini version.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Exp (gemini-2.0-flash-exp)
Theme 1: DeepSeek R1 Model Mania: Performance, Problems, and Promise
- DeepSeek R1 gets squeezed!: Unsloth AI shrank DeepSeek R1 to 1.58-bit, a svelte 131GB (down from 720GB!) while still clocking 140 tokens/sec; selective layer quantization turns out to be the secret sauce for this compression magic (sketched below), and Magpie-Align's dataset has inspired CoT training experiments. While some members worry that reasoning could degrade without explicit training data, others want to scale up the dataset.
- DeepSeek vs. OpenAI Showdown: It's more than just a battle of models: The community is testing DeepSeek R1 against OpenAI models in coding and creative tasks, with early results showing DeepSeek shines on coherence, while also bumping up against content limitations in touchy areas. Meanwhile, a YouTube video claiming that DeepSeek exposes the tech oligarchy's multi-billion dollar scam is also circulating, raising questions about censorship.
- DeepSeek data leaks raise big red flags: A publicly exposed ClickHouse instance known as "DeepLeak" revealed secrets, chats, and data exfiltration avenues, making people realize API key leaks are a clear and present danger.
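For readers curious what "selective layer quantization" looks like in practice, here is a minimal, purely illustrative sketch (hypothetical parameter names and thresholds, not Unsloth's actual code): precision-sensitive tensors keep more bits while the bulk of the routed-expert weights drop to very low bit-widths:

```python
# Illustrative bit-width plan for a MoE checkpoint; names and cutoffs are
# hypothetical, shown only to convey the idea behind selective quantization.
def choose_bits(param_name: str) -> float:
    if "embed" in param_name or "lm_head" in param_name:
        return 8.0      # embeddings / output head stay near full precision
    if "attn" in param_name or "shared_expert" in param_name:
        return 4.0      # attention and shared experts are quantized moderately
    if "experts" in param_name:
        return 1.58     # routed MoE experts: most of the parameters, fewest bits
    return 4.0

plan = {name: choose_bits(name) for name in [
    "model.embed_tokens.weight",
    "model.layers.10.attn.q_proj.weight",
    "model.layers.10.mlp.experts.37.down_proj.weight",
    "lm_head.weight",
]}
print(plan)
```

Because the routed experts hold the overwhelming majority of a MoE model's parameters, pushing mainly those to ~1.58 bits is what lets the checkpoint shrink so dramatically without flattening the attention layers.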
Theme 2: Model Deployment and Hardware Headaches
- Macs stumble in LM Studio loading: LM Studio users are hitting model loading failures on Mac machines, blaming minimum hardware specs and GPU memory constraints, and urging fixes via frequent beta updates. The community notes that memory constraints can freeze everything, that the gguf docs are essential for fixes, and that there's an ongoing discussion about the trade-offs of Qwen2.5 vs DeepSeek for local use.
- Memory bandwidth is king for local LLMs: Performance now hinges heavily on memory bandwidth, with Macs falling short compared to GPUs like the A4000 or 3060 (a rough rule of thumb is sketched below), as one user joked, “You can't outrun memory bandwidth, even with a Threadripper CPU.”
- DeepSeek has gone to Azure and GitHub: The model is now available on Azure AI Foundry and GitHub, making enterprise AI easier to access.
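The rule of thumb behind the "memory bandwidth is king" point above: for memory-bound, batch-1 decoding, every weight byte must be read once per token, so tokens/sec is bounded by bandwidth divided by model size. The bandwidth figures below are approximate spec-sheet numbers used as assumptions, and the estimate ignores KV-cache traffic and compute:

```python
def rough_decode_tps(bandwidth_gb_s: float, weight_gb: float) -> float:
    # Upper bound on single-stream decode speed for a memory-bound model.
    return bandwidth_gb_s / weight_gb

weight_gb = 8.0  # e.g. an 8B model at ~8-bit quantization
for name, bw in [("Apple M2 Pro (~200 GB/s)", 200.0),
                 ("RTX 3060 (~360 GB/s)", 360.0),
                 ("RTX A4000 (~448 GB/s)", 448.0)]:
    print(f"{name}: ~{rough_decode_tps(bw, weight_gb):.0f} tok/s upper bound")
```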
Theme 3: AI Tools, Frameworks, and Their Quirks
- Cursor struggles to keep it together: Recent Cursor IDE updates are causing chaos, breaking tab completion and misinterpreting markdown with users saying “Cursor no longer displays its markdown output correctly.” Meanwhile, users are bemoaning the Claude 3.5 limit lockdown, as it blocks usage after 50 requests.
- OpenRouter's DeepSeek Integration: While Chutes now offers a free endpoint for DeepSeek R1, users are encountering problems with DeepSeek v3's translation quality and also criticizing OpenRouter's 5% API fees, calling for better error handling.
- Windsurf flails as users want DeepSeek: Windsurf users are complaining about the missing DeepSeek R1 integration, with some even threatening to switch to Cursor for better tool calling. They are also criticizing Sonnet for its coding unreliability, citing a drop in prompt comprehension and demanding faster fixes, while also flagging Cascade issues.
Theme 4: Training Techniques and Emerging Models
- Mixture-of-Experts get a memory boost: The community stressed that memory size is crucial for MoE performance on CPU setups, sharing tips that HPC-like resource management outperforms standard configurations; a new paper also introduced Autonomy-of-Experts (AoE), which lets modules decide for themselves whether to handle an input, potentially boosting efficiency.
- Min-P Sampling Method: The newly introduced min-p sampling method is drawing discussion in the community; it adjusts the sampling threshold based on model confidence, aiming to enhance text quality and diversity (a minimal sketch appears below).
- Sparse autoencoders may be unreliable: A new paper revealed that sparse autoencoders (SAEs) share only 30% of their learned features across seeds, raising questions about feature stability and reliability for interpretability tasks.
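For reference, min-p sampling (mentioned above) is simple to state: keep only tokens whose probability is at least min_p times the most likely token's probability, then renormalize and sample. A minimal NumPy sketch, assuming raw next-token logits as input:

```python
import numpy as np

def min_p_filter(logits: np.ndarray, min_p: float = 0.1) -> np.ndarray:
    """Zero out tokens below min_p * p_max, then renormalize."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    threshold = min_p * probs.max()          # the cutoff scales with model confidence
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=50) * 3.0           # toy next-token logits
probs = min_p_filter(logits, min_p=0.05)
next_token = rng.choice(len(probs), p=probs)
print(next_token, int((probs > 0).sum()), "tokens survived the filter")
```

When the model is confident (one dominant token), the threshold rises and the candidate set shrinks; when the distribution is flat, more tokens survive, which is the confidence-adaptive behavior argued to improve both quality and diversity.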
Theme 5: AI Ethics, Data, and the Future
- Concerns rise about DeepSeek's data practices: Bloomberg and the Financial Times report that DeepSeek allegedly trained on OpenAI data, sparking a debate on data ethics, with some dismissing it as a smear campaign by a nervous competitor.
- GPTs get tricky with zero-width space characters: The community discovered that inserting an invisible zero-width space into a URL (rendered here as httpXs://, with the invisible character where the X appears) bypasses unwanted link formatting in GPTs, while users also reported Custom GPTs often fail to reliably output all links, raising questions about user memory handling.
- The future of AI may depend on Grok3 and O3-mini: Rumors suggest Grok3 and O3-mini will hit in January, inspiring hopes for next-level reasoning, while O3-mini promises to run at 4x the speed of O1-mini.
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- DeepSeek's Dashing Downsize: Unsloth AI integrated DeepSeek R1 1.58-bit with OpenWebUI, shrinking from 720GB to 131GB while sustaining ~140 tokens/sec on 160GB VRAM.
- Community members noted that selective layer quantization was key to this speedup, prompting further fine-tuning talks and referencing Magpie-Align's 250K CoT dataset.
- Crisp CoT Gains: Participants highlighted generating Chain-of-Thought samples with larger models to boost DeepSeek reasoning, referencing Magpie-Align's dataset.
- Some feared that training without explicit reasoning data might reduce logical capacities, leading to calls for synthetic expansions from big-scale models.
- Qwen2.5-VL Visual Venture: Members anticipate Qwen2.5-VL support by week's end, looking to extend OCR functionality for augmented vision-language tasks.
- They noted possible synergy with OpenWebUI for real-time image-based question answering, fueling optimism for next-level OCR fine-tuning.
- Asynchronous Federated Learning Foray: A member showcased an Asynchronous Federated Learning paper, emphasizing minimal coordination for devices training models in parallel.
- They also shared a slideshow, inspiring discussions about scaling local training across multiple systems.
OpenAI Discord
- DeepSeek Dares OpenAI: Community tested DeepSeek R1 side-by-side with OpenAI's models for coding and creative tasks, revealing more coherent outputs under certain conditions but also limitations with sensitive topics, including politics.
- They also shared this video on 'DeepSeek AI Exposes Tech Oligarchy's Multi-Billion Dollar Scam', highlighting broader censorship questions.
- Multiple Models Mean More Insights: Members suggested querying multiple AI systems in parallel to bypass default content filters or shortfalls in a single model, particularly for controversial queries.
- Some dubbed it a form of ensemble AI, though others noted there's no official framework yet for seamlessly merging these outputs.
- GPT Link Woes & Memory Misfires: Participants uncovered a trick involving an invisible zero-width space (rendered here as httpXs://, with the invisible character where the X appears) to sidestep unwanted link formatting, citing a StackOverflow post; a short snippet at the end of this section shows the trick.
- They also reported Custom GPT failing to output all links reliably and noted contradictions in GPT’s user memory handling, sparking discussions about incomplete references to personal details.
- o3-mini Tackles Owl-Palm Puzzle: A member fixated on whether o3-mini could solve the owl-palm tree riddle, treating it as a serious test of reasoning capabilities.
- They declared “That's the only benchmark I care about!”, emphasizing how singular puzzle performance can steer model comparisons.
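As a concrete illustration of the zero-width-space trick discussed above, inserting U+200B into a URL changes the string (so automatic link formatting no longer matches) while leaving it visually identical. A tiny Python demonstration:

```python
ZWSP = "\u200b"                        # zero-width space
url = "https://example.com"
obfuscated = "http" + ZWSP + url[4:]   # invisible character between "http" and "s"
print(obfuscated)                      # renders just like the original URL
print(obfuscated == url, len(url), len(obfuscated))   # False 19 20
```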
LM Studio Discord
- DeepSeek R1 Dares Qwen2.5 in Price-Performance Faceoff: Community members compared DeepSeek R1 and its distilled variants against Qwen2.5 for coding tasks in LM Studio, balancing budget constraints and overall response quality. They also noted that Qwen2.5 can be accessed via Hugging Face or bartowski builds, emphasizing how price and performance interplay.
- One user suggested that “Qwen2.5 is simpler to deploy but trades some fine-tuning options,” while others praised DeepSeek for maintaining higher accuracy despite a steeper VRAM requirement. They shared gguf README notes as a reference for advanced tuning.
- LM Studio Loading Limbo: Multiple folks encountered model loading failures on Mac machines for LM Studio, citing minimal hardware specs as a key trouble spot. Some recommended toggling advanced settings or adopting the beta version, referencing potential fixes in the gguf documentation.
- One user noted that “GPU memory constraints can freeze everything” unless you adjust concurrency settings. Another user suggested frequent updates in the LM Studio beta channel to fix stability issues.
- RAG Riddles in Document Handling: Users debated the reliability of RAG in LM Studio, stressing that choosing a robust model is vital for demanding, domain-focused tasks. They argued that standard configurations often stumble on specialized questions, hinting at 'GPT-level solutions' or more refined retrieval strategies, though no direct references were provided.
- One user noted “RAG can feel puzzling if the model doesn't have enough context,” while others recommended specialized retrieval solutions for domain-heavy data. A few suggested exploring more advanced chunking or embeddings to reduce error rates.
- Memory Bandwidth Takes Center Stage: Participants noted LLM performance hinges significantly on memory bandwidth, comparing Macs unfavorably to GPUs like A4000 or 3060. They added that pairing Threadripper or EPYC CPUs with multiple GPUs handles models such as DeepSeek R1 Distill-Qwen 70B more efficiently, without any direct link given.
- One user joked “You can't outrun memory bandwidth, even with a Threadripper CPU,” referencing this GPU bandwidth table. Meanwhile, others emphasized the synergy of higher VRAM with deep language models.
- CSV Chaos: LLMs vs Cross-Chain Transactions: A user sought an LLM approach for uniform CSV transaction formatting, spotlighting the complexities of cross-chain data. Responders recommended Python scripting for consistency and scale, suggesting that relying solely on LLMs could be error-prone for larger datasets.
- One community member quipped that “For big CSV merges, code is cheaper than LLM tokens,” underscoring the reliability of scripts in data-centric tasks. Another agreed, mentioning Python as the preferred tool for stable output.
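On the CSV question, the "code is cheaper than LLM tokens" advice usually amounts to a small deterministic normalization script. A minimal sketch with assumed column names (any real exchange or chain export will differ):

```python
import csv

FIELDS = ["timestamp", "chain", "tx_hash", "asset", "amount"]

def normalize_row(row: dict, chain: str) -> dict:
    # Map whatever headers a given export uses onto one uniform schema.
    return {
        "timestamp": row.get("time") or row.get("timestamp", ""),
        "chain": chain,
        "tx_hash": row.get("hash") or row.get("tx_hash", ""),
        "asset": (row.get("asset") or row.get("symbol", "")).upper(),
        "amount": row.get("amount") or row.get("value", ""),
    }

def merge(inputs: dict[str, str], out_path: str = "unified.csv") -> None:
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for chain, path in inputs.items():
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(normalize_row(row, chain))

# merge({"ethereum": "eth_export.csv", "solana": "sol_export.csv"})
```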
aider (Paul Gauthier) Discord
- Qwen 2.5 Max Mix-Up: The community debated Qwen 2.5 Max's open-source nature, concluding it is not fully available for local usage due to hefty GPU demands, citing this tweet.
- Others explored ways to incorporate Qwen 2.5 Max into coding workflows, noting a demo on Hugging Face but lamenting the high memory requirements.
- Model Speed Marathon: Some users reported low throughput from hyperbolic's R1, with response times occasionally exceeding a minute and an output rate of about 12 tokens per second.
- They examined system resource usage and referenced the aider/benchmark README to identify bottlenecks and improve performance metrics.
- Open-R1 Gains GitHub Glare: A project named open-r1 emerged, shared via this GitHub link, suggesting potential open approaches to the R1 model.
- Enthusiasts recommended researching its architecture and possible applications, hinting that it might offer fresh exploration paths for large-model enthusiasts.
Perplexity AI Discord
- Sonar & DeepSeek Earn Applause: The Sonar Reasoning API launched, powering chain-of-thought with real-time citations, and DeepSeek R1 is now integrated into the Perplexity Mac App via a quick command update, hosted in US data centers to safeguard privacy per the official note.
- Community members reported a few formatting rejections from Sonar but praised its real-time search, while some questioned if it uses the R1 (671B) or a distilled model, prompting requests for more transparency.
- DeepSeek's Daily Limit Jumps and Rivalry with O1: Perplexity raised DeepSeek R1 daily query limits to 50 for Pro and 5 for free users, with CEO Aravind Srinivas outlining further expansions as capacity improves.
- A YouTube video suggested DeepSeek R1 might surpass OpenAI's O1, energizing discourse about performance metrics and chain-of-thought impact, reflecting continued discussions on reasoning quality.
- Alibaba Preps a New Model: A user shared a link on Alibaba's possible AI model, hinting at shifts in competition within the tech sector.
- Community members debated its potential to heighten market rivalries and accelerate R&D, highlighting how large-scale models could reshape Alibaba's ecosystem.
- Java 23 to Java 2 Twist: A move from Java 23 SDK to Java 2 triggered debates over public services lagging behind private adoption, referencing real-world adaptation.
- Participants worried about QA bottlenecks in government use and questioned if swifter rollouts might counter institutional inertia.
Nous Research AI Discord
- Memory Matters for MoE: During the Mixture-of-Experts discussion, participants stressed that memory size is crucial for performance on CPU setups, with higher bandwidth boosting token speeds.
- They shared optimization tips and pointed out that HPC-like resource management often outperforms standard configurations when tackling complex loads.
- Funding Flourish at Nous: Community members revealed that Nous Research relies on VC backers, donations, and minimal merch sales to cover computing expenses.
- They humorously noted merchandise income is small, yet still part of a broader multi-source approach to keeping large-scale AI projects afloat.
- DeepSeek R1 Debuts on Azure: The DeepSeek R1 model went live on the Azure AI Foundry and GitHub, giving developers instant accessibility.
- Community members cheered its entrance among over 1,800 AI models, seeing it as a sturdy enterprise solution within Microsoft’s offerings.
- Ollama: CLI vs GUI Showdown: While Ollama was proposed to run local models like Mistral or DeepSeek distills, some disliked its CLI reliance, preferring a more visual approach.
- Others suggested KoboldCPP or LM Studio for those wanting friendlier interfaces or different licensing, weighing usability against feature sets.
- AoE: Experts Pick Their Own Tokens: A new paper introduced Autonomy-of-Experts (AoE), where modules use internal activations to decide if they should handle an input, bypassing the usual router.
- In this setup, only the top-ranked experts continue processing, potentially enhancing efficiency and surpassing conventional MoE token assignment.
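To make the AoE idea above concrete, here is a toy PyTorch sketch (a simplified illustration, not the paper's exact formulation): each expert scores a token by the norm of its own first-projection activation, and only the top-scoring experts process it, with no separate router:

```python
import torch
import torch.nn as nn

class AoELayer(nn.Module):
    """Toy Autonomy-of-Experts-style layer: experts self-select via activation norms."""
    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.up = nn.ModuleList(nn.Linear(d_model, d_hidden) for _ in range(n_experts))
        self.down = nn.ModuleList(nn.Linear(d_hidden, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                                        # x: (tokens, d_model)
        # Every expert computes its own up-projection; in practice this can be a
        # cheap low-rank probe, but the full projection is reused here for clarity.
        hidden = torch.stack([up(x) for up in self.up], dim=1)   # (tokens, experts, d_hidden)
        scores = hidden.norm(dim=-1)                             # activation norm per expert
        chosen = scores.topk(self.top_k, dim=1).indices          # experts that "volunteer"

        out = torch.zeros_like(x)
        for e, down in enumerate(self.down):
            mask = (chosen == e).any(dim=1)                      # tokens this expert keeps
            if mask.any():
                out[mask] += down(torch.relu(hidden[mask, e]))
        return out / self.top_k

layer = AoELayer()
print(layer(torch.randn(5, 64)).shape)   # torch.Size([5, 64])
```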
Codeium (Windsurf) Discord
- DeepSeek Dilemma at Windsurf: Users lament the missing DeepSeek R1 integration in Windsurf, fueling threats to switch to Cursor for better tool-calling features.
- Some observed that DeepSeek struggles with efficient requests, making its synergy with Windsurf difficult.
- Sonnet LLM Slip-ups: Multiple members criticized the Sonnet LLM for inconsistent coding reliability, stating that prompt comprehension has dropped.
- Others demanded faster improvements, noting suboptimal performance that burns credits without boosting productivity.
- Cascade Confusion & Code Declines: Some reported Cascade accidentally wiping context or generating errors when modifying files, forcing manual refactoring.
- A few still see promise in Cascade’s approach, urging caution when editing large codebases to avoid repeated missteps.
- Flex Credits Fog: New sign-ups found Flex credit allocations puzzling, with unclear trial totals and no easy credit refunds for flawed outputs.
- Several pointed to Codeium Status for potential clarifications, while others encouraged direct support outreach.
- Windsurf Performance & Extension Setup: Members noted choppy speed in Windsurf chat and flagged difficulties with the Codeium extension in VSCode not fully parsing selected text.
- They also cited repeated login failures, referencing a ‘Sign in failed’ error tied to a dormant language server and Plans and Pricing Updates that raise cost concerns.
OpenRouter (Alex Atallah) Discord
- Chutes and Ladders for DeepSeek R1: In a recent move, Chutes is offering a free endpoint for DeepSeek R1 via OpenRouter, giving decentralized coverage a boost. This addition provides developers with more ways to sample DeepSeek R1's 671B parameter capacity.
- OpenRouter highlighted that DeepSeek R1 stacks up to OpenAI o1 in performance, with 37B active parameters at inference. One user concluded, “It’s a fine alternative despite the overhead,” emphasizing the model’s open reasoning tokens.
- Perplexity Polishes Sonar: Perplexity upgraded Sonar with speed and cost improvements, as outlined at sonar.perplexity.ai. This refinement aims to optimize large-scale search tasks and keep resource consumption minimal.
- The teased Sonar-Pro promises additional features and is expected to release soon, fueling excitement. Some participants endorsed this route for better synergy with DeepSeek models.
- Sonar-Reasoning Rocks: Sonar-Reasoning, built on DeepSeek's engine, is specialized for advanced search and logic-based tasks, as shown in this announcement. The model is intended to streamline handling complex inquiries.
- OpenRouter provided recommendations for combining web search with Sonar-Reasoning, acknowledging user demand for integrated setups. One user stated, “Having search plus advanced logic is what we needed for big data work.”
- Surge of Feedback on Pricing & Performance: Multiple members raised concerns over DeepSeek v3's translations for languages like Polish, citing incomplete context. They also criticized OpenRouter's 5% API fees, calling them high.
- Some faced empty token outputs and interface glitches, pressing for better error handling. Others emphasized the need for improved retrieval features and adjustable usage limits.
- Clamor for Image Generation: Some members requested direct integration of DALL-E or Stability AI into OpenRouter, hoping to expand the platform’s capabilities. They believe visual generation could attract more participants and broaden use cases.
- Others noted the ties with translation functionality, suggesting potential multi-modal enhancements. Though no confirmations surfaced, the keen interest hinted at bigger possibilities ahead.
Interconnects (Nathan Lambert) Discord
- DeepSeek Data Drama & Database Debacle: Wiz Research found DeepLeak, a publicly exposed ClickHouse instance revealing secret keys, internal chats, and open paths for data exfiltration (see Tweet).
- A separate critical vulnerability report further outlined possible API key leaks, prompting calls for immediate fixes.
- R1 vs R1-Zero Rivalry: Community analysis suggests R1-Zero surpasses R1 in importance, highlighting an in-depth post on both models’ hosting challenges.
- Enthusiasts expressed mild disappointment over R1 being the public-facing flagship, calling it “nerfed for human consumption.”
- Llama 4 Overhaul & Delays: Rumors indicate Llama 4 is being rebuilt from scratch, with this claim insinuating a major pivot in strategy.
- Partners like Together received scant details, implying a shift away from the previously forecasted February launch.
- Grok 3 & O3-mini Release Buzz: Hints suggest Grok 3 and O3-mini might hit in January, though internal chatter points to possible rescheduling for typical Thursday drops.
- A Tibor Blaho update noted a ‘thinking’ model approach, stirring hopes for next-level reasoning features.
- DeepSeek v3 with MoE & MTP: The DeepSeek v3 paper surprised readers by skipping auxiliary losses for Mixture-of-Experts, fueling curiosity about the training setup (see MoE LLMs).
- Folks speculated on Multi-Token Prediction boosting token acceptance rates, yet many inference frameworks lack native support for that method.
Cursor IDE Discord
- DeepSeek Dilemma: Token Terrors: Repeatedly, DeepSeek fails to generate code due to token constraints, leaving users annoyed with incomplete outputs; one user lamented “It keeps yapping then it cannot generate a code due to token limit.”
- Another pointed to a tweet from Ihtesham Haider about 'Qwen' overshadowing DeepSeek, claiming Qwen beats ChatGPT-o1 and Claude Sonnet in multiple tasks.
- Cursor IDE Catastrophe: Post-Update Pandemonium: Multiple users reported new Cursor IDE bugs after the recent update, including broken tab completion, stray imports, and improper markdown outputs, with one user noting “Cursor no longer displays its markdown output correctly.”
- Community members recommended reporting problems on the Cursor Forum or checking the Cursor Status page for any known disruption.
- Claude 3.5 Limit Lockdown: Many griped about the free-tier constraints in Claude 3.5, which blocks usage after 50 slow premium requests and offers no cooldown workaround.
- One user questioned a possible respite, but others confirmed that once the limit is reached, Claude 3.5 denies further requests.
- Crowdsourced Upgrades for Cursor: Calls emerged for more AI models in Cursor, especially in agent mode, to boost developer options and reduce token-related pitfalls.
- A user suggested ideas in a tweet asking what improvements people most want in Cursor.
- Sonnet 3.5 Subscription Snafu: One user reported that Sonnet 3.5 won't function with their Cursor subscription but works with a personal API key.
- The community directed them to the Cursor Forum thread on Sonnet 3.5 issues for bug reporting and potential fixes.
Yannick Kilcher Discord
- Softmax Shake-Up & RL Woes: A new Softmax variation was proposed to counter noisy accuracy and suboptimal learning in certain scenarios, stirring interest among researchers seeking better training gradients.
- Several members emphasized Deep RL concerns, noting that default Softmax can lead to mode collapse and urging more flexible methods.
- DeepSeek Data Drama: DeepSeek trained a 671B-parameter Mixture-of-Experts model using 2,048 Nvidia H800 GPUs and PTX in two months, reporting a 10X efficiency jump over standard practices.
- Meanwhile, Bloomberg and the Financial Times covered accusations that DeepSeek used OpenAI data unfairly, with some calling it a smear job amid Italy's ongoing scrutiny.
- Qwen2 VL & PydanticAI Shout-Out: Qwen2 VL impressed users by generating tokens at high speed with an 8K quant of the 7B model on an M1 chip, inspiring remarks that they "pour out like crazy."
- A PydanticAI code snippet also generated buzz, showing how easily data validation can integrate with a GroqModel-based agent.
- O3-mini’s Big Leap: Debate swirled around the upcoming O3-mini, promising to run at 4x the speed of O1-mini and potentially outperform R1.
- Some cited this tweet as evidence that OpenAI might gain a serious advantage in the US market with such faster models.
- Claude 3.5’s Price Tag: Claude 3.5 reportedly cost tens of millions to train, highlighting the scale of financial investment in next-generation language models.
- Community members viewed this sum as proof that ambitious AI development demands hefty funding and broad computational resources.
Eleuther Discord
- Mordechai’s Momentum: Neuroscience Book & Kickstarter: Mordechai Rorvig showcased his neuroscience book project, focusing on the interplay of large-scale brain functions, emotional AI processing, and potential expansions from a fundraiser on Kickstarter. He requested feedback on the synergy between deep learning architectures and biological cognition, hoping to refine proposed design features for advanced AI systems.
- Discussion touched on how these ideas might inform improved models of emotional intelligence, with several participants applauding the combined lens of neuroscience and modern AI research.
- Min-P Magic: A New Twist on Text Generation: The newly introduced min-p sampling technique adjusts the threshold based on model confidence, aims for enhanced text quality and diversity, and references Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM.... It prompted questions about whether token restriction hampers exploration, especially when compared to top-p approaches.
- Some participants worried about over-constraining model outputs, while others viewed min-p as a valuable method to manage perplexity across different tasks.
- SFT vs. RL: The Great Generalization Debate: Members dissected 'SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training' (link), discussing how SFT’s rapid pattern usage and RL’s wider solution search might be combined for stronger generalization. They noted that SFT can apply training data reliably, while RL seems to foster more open-ended behaviors.
- Some suggested RL enables emergent problem-solving, but others highlighted SFT’s consistency for certain tasks, pointing to a balance of both methods as a next-step strategy.
- Sparse Autoencoders: A Seed-Driven Saga: A new paper titled Sparse Autoencoders Trained on the Same Data Learn Different Features reported that SAEs share only 30% of their learned features across various seeds, raising concerns about feature stability. Authors questioned whether these representations remain reliable for interpretability tasks without additional constraints.
- The group proposed parallel training on multiple seeds to align outputs, while some countered that alternative regularization or architecture choices might offer more consistent outcomes.
- Fastfood in Focus: Speedy Kernel Expansion: Engineers revisited Fastfood from Fastfood: Approximate Kernel Expansions in Loglinear Time, leveraging Hadamard operations for faster kernel expansions and smaller memory footprints. Initial tests showed reduced overhead in large-scale computations and kindled interest among advanced LLM developers.
- A few participants explored integrating Fastfood into extensive networks, hoping to curb storage demands while preserving accuracy, though some cautioned about the need for more real-world tests.
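For those revisiting Fastfood, here is a simplified single-block sketch of the construction (it omits the final per-row rescaling matrix and other details from the paper, so treat it as illustrative rather than faithful): random sign flips, a Hadamard transform, a permutation, Gaussian scaling, and a second Hadamard transform stand in for a dense Gaussian projection, followed by random Fourier features:

```python
import numpy as np
from scipy.linalg import hadamard

def fastfood_features(X: np.ndarray, sigma: float = 1.0, seed: int = 0) -> np.ndarray:
    """Simplified Fastfood random-feature map (input dimension must be a power of two)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d & (d - 1) == 0, "input dimension must be a power of two"

    H = hadamard(d).astype(float)          # dense Walsh-Hadamard matrix, for clarity
    B = rng.choice([-1.0, 1.0], size=d)    # random sign flips
    P = rng.permutation(d)                 # random permutation
    G = rng.standard_normal(d)             # Gaussian scaling

    Z = (X * B) @ H                        # sign flips, then Hadamard transform
    Z = Z[:, P] * G                        # permutation, then Gaussian scaling
    Z = Z @ H                              # second Hadamard transform
    Z /= sigma * np.sqrt(d)
    return np.concatenate([np.cos(Z), np.sin(Z)], axis=1) / np.sqrt(d)

X = np.random.default_rng(1).standard_normal((4, 16))
print(fastfood_features(X).shape)          # (4, 32)
```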
GPU MODE Discord
- GPU Direct Storage Gains & Weight Compression Whispers: In #general, members explored GPU Direct Storage for efficient PCIe peer-to-peer data transfers, reporting partial success compressing weights from 4.7GB to 3.7GB.
- They also considered parallel-friendly compression and memory snapshotting, citing NVIDIA/gdrcopy and gpudirect/libgdsync to reduce overhead and load safetensors directly into VRAM.
- Breezy Blackwell & Cozy CUDA Type Puns: In #cuda, the RTX Blackwell architecture was rumored to boost FP16/32 throughput by 27% compared to the 4090, while 5th gen Tensor Cores show minimal changes for consumer cards as seen on NVIDIA's official page.
- They also emphasized using memcpy() for type punning and strict memory alignment in CUDA to avoid undefined behavior and possibly gain register-level optimizations.
- Lean Llama: Minimal Training Code Emerges: In #cool-links and #self-promotion, members shared a minimal codebase for Llama training at speed_llama3, aiming for efficiency.
- They showcased FP4 approaches for large language models and discussed block-size quantization strategies to refine performance.
- Thunderkitten & The DSM Potential: A dev proposed new hardware feature support for Distributed Shared Memory (DSM) in Thunderkitten, suggesting persistent kernels for better data reuse.
- They also highlighted threadblock-to-SM scheduling for performance gains, leaning on background from a 2.5-year stint at NV.
- Arc-AGI-2: Chess Puzzles & Dynamic Reasoning: Members in #arc-agi-2 discussed dynamic evaluation for reasoning tasks, with simplified chess puzzles (e.g., mate-in-two) in development.
- They also pitched generating 'Wikipedia game' solutions and training explainer models for deeper insight, referencing inference engines like vLLM for streamlined batch processing.
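Since FP4 and block-size quantization came up above, here is a generic absmax block-quantization sketch (using signed integers rather than FP4, purely to show the block-size trade-off): each block of weights shares one scale, so smaller blocks mean more scales stored but lower reconstruction error:

```python
import numpy as np

def block_quantize(w: np.ndarray, bits: int = 4, block: int = 64):
    """Symmetric absmax quantization with one scale per block (block must divide len(w))."""
    qmax = 2 ** (bits - 1) - 1                              # 7 for 4-bit signed
    blocks = w.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                               # avoid division by zero
    q = np.clip(np.round(blocks / scales), -qmax, qmax).astype(np.int8)
    return q, scales

def block_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).reshape(-1)

w = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
for block in (32, 64, 256):
    q, s = block_quantize(w, bits=4, block=block)
    err = np.abs(block_dequantize(q, s) - w).mean()
    print(f"block={block:4d}  mean abs error={err:.4f}")   # error grows with block size
```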
Stability.ai (Stable Diffusion) Discord
- ComfyUI Clash vs Forge: Users argued whether ComfyUI is unnecessarily complicated, referencing Forge’s GitHub repo for a more direct approach.
- Some appreciate ComfyUI’s advanced pipeline features, while others want a minimal interface for quick setup.
- Image Generation Tools and Workflows: Participants discussed workflows for tasks like realistic character generation, highlighting attempts with the autismmix model for fantasy themes.
- They pointed to Kolors Virtual Try-On as an example, noting many want simpler menus for stable results.
- Python Problems for Stable Diffusion: A user hit Python errors while installing Stable Diffusion, prompting debug advice on dependencies.
- They also shared a curious link, which drew attention to potential environment misconfigurations.
Stackblitz (Bolt.new) Discord
- Bolt's Export/Import Makeover: Starting now, Bolt guarantees that all imports and exports are functioning correctly, including previously missing default exports, as noted in this tweet.
- The update particularly ensures 'export default' support, delivering a smoother coding environment and immediate improvements across projects.
- Backend Picks & Firebase Challenges: Developers requested guidance on recommended backend solutions for their projects, hoping for robust setups to fit their needs.
- Another member described a steep Firebase learning curve but noted growing comfort through repeated hands-on exploration.
- Token Tussles & Service Snags in Bolt: Users raised concerns about rapid token consumption during frequent debugging, emphasizing the impact of lengthy prompts and complex projects.
- Some also reported server errors and availability glitches in Bolt, voicing frustration about platform stability.
- GitHub OAuth & Domain Dilemmas: To switch GitHub accounts linked with Stackblitz, users must revoke permissions in GitHub and delete their old Stackblitz account, with no alternative workaround.
- Meanwhile, a question about custom domain usage in Supabase and Netlify revealed root CNAME record conflicts, though Supabase can work without a custom domain despite email clarity benefits.
MCP (Glama) Discord
- Goose Gains Ground: Community members praised the Goose client for its CLI orientation and synergy with MCP servers, covering usage and better integrated flows.
- They also flagged token usage constraints, referencing michaelneale/deepseek-r1-goose for ways to address rate limits.
- Sheets Integration Sizzles: A developer demonstrated an MCP server reading from Google Drive and editing Google Sheets, showcased in mcp-gdrive.
- They noted limited chart formatting but saw potential for broader features with more exploration.
- DeepSeek Distill Goes Big: DeepSeek-R1-Distill-Qwen-32B outdid OpenAI-o1-mini in multiple benchmarks, as reported in DeepSeek model info.
- Members reported smoother results with Kluster.ai for integrating these models into MCP, highlighting alternative approaches.
- mcp-agent Hits #1 on Show HN: The mcp-agent framework snagged #1 on Show HN, spotlighting workflow-friendly patterns for building agents with Model Context Protocol.
- The repository at lastmile-ai/mcp-agent gathered feedback for future improvements.
- lüm AI Supports Mental Health: The lüm companion for mental health, found at lüm - Your AI Companion, introduced a privacy-first practice approach.
- Its developer calls on the community to share ideas for upcoming psychological utilities, aligning with mental health applications.
Nomic.ai (GPT4All) Discord
- Distilled DeepSeek R1 Gains Ground: Community members reported on bartowski's DeepSeek-R1-Distill-Llama-8B-GGUF, highlighting 8b distill models as surprisingly strong compared to heavier 70b quant setups.
- They noted that while R1 distills seem competent, many still want to see bigger model options, referencing a video explaining DeepSeek R1 concepts.
- CUDA and CPU Collaboration Creates Speed: Participants discussed running DeepSeek models on CUDA, often hitting 5t/s on CPU with q8_0 for local tasks.
- They described ongoing improvements for higher throughput, referencing an open PR on GPT4All to bolster local inference.
- LM Studio Doubts and Template Tweaks: Contributors expressed hesitation about LM Studio due to closed-source aspects and uncertain compatibility with DeepSeek.
- They proposed refining template strategies and advanced instructions to sharpen prompt output for R1 distill models.
- Optimism for New R1 Releases: Multiple members look forward to 32b R1 distills, hoping these forthcoming versions address performance gaps under local conditions.
- They cited unsloth's 8B Distill LLaMA repository as an example of consistent improvements and near-future potential.
Notebook LM Discord Discord
- NotebookLM's File-Size Friction: Users worried about loading hefty ecology-based engineering textbooks and multiple documents, citing a needle in the haystack scenario for queries. They referenced NotebookLM Help about maximum file size limits and recommended smaller chunks for clarity.
- Additional concerns arose over storing academic material on NotebookLM alone, prompting suggestions to keep duplicates in Google Drive since NotebookLM does not offer direct downloads of uploaded sources.
- Note Conversion Sparks Efficiency: One user highlighted a technique of converting notes into sources, enabling easier comparisons of unstructured survey data. They shared that summarizing and reformatting references improved clarity when cross-referencing multiple datasets.
- However, some folks questioned if this approach might be redundant, pointing out that notes inherently mirror existing source content.
- Add New Button Vanishes: Members experienced confusion when the 'Add New' button disappeared, suspecting a possible cap on NotebookLM usage. They advised consulting built-in self-query features to uncover any hidden account or feature restrictions.
- A link to NotebookLM Plus Upgrade surfaced, though the exact cause of the button's absence remained uncertain.
- LinkedIn Lockdown Meets PDFs: A user ran into problems adding a LinkedIn profile as a source, possibly due to crawling restrictions. The proposed workaround was exporting the page to a PDF, then uploading it into NotebookLM.
- This strategy ensured better reliability when dealing with websites that limit direct data capture.
- Podcast Plans and API Dreams: Folks experimented with longer-duration podcast generation in NotebookLM, aiming for 30-minute scripts or more. They swapped ideas on ensuring stable audio output and possible integrations.
- Queries also arose about an API for connecting NotebookLM with Salesforce, but there was no estimated release date provided for that feature.
Latent Space Discord
- DeepSeek's R1-Zero Gains Momentum: Per the R1-Zero and R1 Results, R1-Zero achieves comparable performance in math and coding, indicating that extensive SFT may not be required.
- Community members initially voiced concerns about incoherence, but testing reported no major flaws in R1-Zero's logical outputs.
- Huawei 910C Fuels DeepSeek: DeepSeek has switched to Huawei's 910C chips for inference, as noted in this post, sparking debate on potential trade-offs compared with Nvidia hardware.
- Participants discussed memory constraints on Huawei chips, with some uncertain if they can handle large-scale training without performance hits.
- OpenAI's ChatGPT Pro Overtakes Enterprise: According to this tweet, OpenAI's $200/month ChatGPT Pro outperforms ChatGPT Enterprise in revenue, reflecting strong subscription growth.
- However, commentators suggest that enterprise deals might be losing money, raising questions about the long-term model.
- Sourcegraph Debuts Enterprise Agent: Sourcegraph introduced a new enterprise agent coding solution to rival Windsurf, set to be discussed at AIENYC with a dedicated booking case study.
- Community chatter highlights the product’s aim to make AI-assisted coding more accessible and relevant for large-scale deployments.
- Microsoft's Copilot Rollout Under Fire: Observers criticized the Microsoft 365 Copilot launch for poor execution, stirring confusion among new users.
- Commentary pointed to marketing stumbles and an unclear strategy, suggesting an identity crisis within Microsoft’s AI services.
Cohere Discord
- Command-r-plus Confusion & Repetitions: Some users reported shorter replies from command-r-plus but got thorough (yet repetitive) responses when switching to command-r-plus-08-2024 for problem-solving tasks.
- Support clarified that command-r-plus has pointed to -04-2024 since September, advised sharing code snippets, and recommended upgrades like command-r7b-12-2024 for more robust output.
- Safety Modes from Contextual to Strict: The new Safety Modes (CONTEXTUAL, STRICT, and NONE) are covered in Cohere documentation and provide refined output restrictions on newer models.
- Users praised CONTEXTUAL for creative or educational tasks and STRICT for strong guardrails, while toggling to NONE fully disables safeguards for unrestricted content.
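As a rough illustration of toggling these modes, the sketch below assumes the Cohere Python SDK's chat endpoint accepts a safety_mode parameter with the values named above; check the linked documentation for the exact models, parameter names, and values supported.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")

# Assumed parameter name/values per the Safety Modes summary above; verify against Cohere's docs.
resp = co.chat(
    model="command-r-plus-08-2024",
    message="Explain how stage combat choreographers keep sword fights safe.",
    safety_mode="CONTEXTUAL",   # or "STRICT"; "NONE" disables the guardrails entirely
)
print(resp.text)
```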
- Rerveting Efforts Prompt & Aya 8b Gains: Developers tested the Rerveting Efforts Reasoning Prompt on Aya 8b, fighting setup hurdles but spotting promising logic.
- They requested feedback on its “hidden potential” and plan to refine it further alongside ongoing image analysis experiments.
- Markdown Snags & Clipboard Saves: A user nearly lost a critical prompt but rescued it with Windows + V, highlighting the importance of advanced clipboard features.
- Meanwhile, formatting woes in Markdown sparked frustration, prompting tips and tricks to simplify markdown usage in project workflows.
LLM Agents (Berkeley MOOC) Discord
- Certificate Surprises & No Hackathon: MOOC discussion confirmed that non-students can earn certificates, announced there will be no hackathon this semester, and clarified that application-track project teams will have 3-4 students.
- Attendees learned the public course aligns with Berkeley's original curriculum and were advised to watch for final details in upcoming announcements.
- Lecture Links & Resources for LLM Agents: Members shared new lecture transcripts and official slides for CS 194/294-280 to ease advanced studying.
- They proposed extending these resources to all lectures, underscoring the group's enthusiasm for open collaboration.
- Stake Airdrop Stirs Excitement: A Stake Airdrop campaign started, encouraging participants to claim rewards early at stakeair-drop.com before the event ends.
- Enthusiasts emphasized its limited-time benefits, urging early stakers to maximize returns.
Modular (Mojo 🔥) Discord
- Mojo's LSP Enigma: A user uncovered hidden LLVM flags while running `magic run mojo-lsp-server --help`, with no accessible documentation in sight.
- Another user suggested opening a GitHub issue so the Mojo tooling team can address or conceal these internal parameters.
- TIOBE Talks Up Mojo: Mojo earned a mention in TIOBE, where the CEO forecast a near top 20 ranking by 2025.
- Community members expressed excitement, interpreting it as a sign of accelerating developer interest.
- VS Code Folding Q&A: Someone asked if the VS Code extension for Mojo supports code folding or planned to add it soon.
- A user advised moving the query to a relevant channel, noting it might need feedback from the extension maintainers.
- Mojo Roadmap Rumbles: Community members requested a refreshed roadmap for Mojo as 2025 looms on the horizon.
- They highlighted the need for clarity and detailed next steps for the language's onward development.
Torchtune Discord
- Office Hours & Banana Bread Bonanza: Torchtune is hosting open office hours next Thursday at 13:30 US ET to discuss upcoming features and address library issues, with an event link here.
- Attendees can enjoy famous banana bread during the talk, which promises to keep spirits high.
- Metrics Muddle: DPO Device Aggregation: Community members questioned how DPO metrics are combined across devices and proposed using `dist.all_reduce` for better consistency, referencing issue #2307.
- They plan to open a PR soon to unify metrics across multiple machines, aiming to improve DPO validation.
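A minimal sketch of the proposed aggregation, assuming the default process group has already been initialized by the training launcher:

```python
import torch
import torch.distributed as dist

def aggregate_metric(value: float, device: torch.device) -> float:
    """Average a scalar metric (e.g. a DPO reward margin) across all ranks."""
    t = torch.tensor([value], dtype=torch.float32, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)    # sum the per-device values
    t /= dist.get_world_size()                  # then divide to get the global mean
    return t.item()
```

Running this once per logging step keeps the reported numbers consistent regardless of how many devices the run uses.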
- Loss Normalization: The Missing Ingredient: People noted no loss normalization is included in the DPO implementation, pointing out a difference between the `lora_dpo_distributed` and `full_finetune_distributed` recipes.
- They plan to explore a quick fix, with members offering to coordinate debugging efforts.
- Imagen vs. Chatbot? A Confused Inquiry: A question surfaced about Imagen or Image2Txt, but it ended up focusing on the chatbot feature instead.
- The inquirer retracted the original query, eventually concluding the conversation remained chatbot-centric.
Axolotl AI Discord
- Multi-Turn KTO Mystery: One member inquired about the status of multi-turn KTO, but no update was provided.
- Their question triggered speculation about the next steps for KTO, but the conversation didn't produce any firm plan.
- RLHF Recruit Reassigned: Nanobitz confirmed a new recruit joined for RLHF, but they were directed to a different PR instead.
- This shift disappointed a member who wanted more immediate RLHF involvement in the project.
- NeurIPS Manuscript in the Works: A member announced a plan to submit a NeurIPS manuscript this year, indicating a serious push for published results.
- They reported that this effort might benefit from upcoming research synergy with the KTO project.
- March Deadline Looms: The same member emphasized that a related model is due in March, raising concerns about meeting that milestone.
- They worried that any holdups could derail planned experiments and hamper their timeline.
- Axolotl Anxiety: A member warned that Axolotl usage challenges might jeopardize the project’s KTO aspirations.
- They suggested addressing Axolotl issues promptly to avoid disruptions and keep the workflow on track.
LlamaIndex Discord
- ScrapeGraph & LlamaIndex Join Forces for Quick Web Curation: Integrating ScrapeGraph AI with LlamaIndex enables fast extraction of unstructured data from websites, powering slick web scraping processes.
- This approach was highlighted on Twitter, illustrating how AI agents can handle repeated data gathering chores with minimal overhead.
- LlamaIndex Bolsters Financial Reports with Visual Flair: A new guide shows how to produce multimodal financial statements by mixing text and visuals from PDFs through LlamaIndex.
- This tactic helps teams handle both textual breakdowns and image-based elements in a single flow, boosting insights for finance tasks.
- LlamaCloud Changes Spark Waitlist Questions: A missing Index button in the GUI raised questions about the invite-only LlamaCloud program, which members can join through a waitlist of unclear length.
- Others noted Confluence was grayed out, implying that certain data sources may require Premium membership, though the exact conditions are unclear.
MLOps @Chipro Discord
- Databricks & Featureform Fuel MLOps: The MLOps Workshop on January 30th at 8 A.M. PT features Simba Khadder explaining how to build a feature store on Databricks.
- Attendees will learn about Featureform integration and tips for Unity Catalog, with a Q&A at the end.
- Skepticism Surrounds AI’s Push into Dev Roles: A participant pushed back on Zuck’s claim that AI could replace mid-level devs, stating the profession is far from dead.
- Others pointed out continuous gains in AI wrappers, intensifying the discussion on whether AI truly threatens dev positions.
DSPy Discord
- Auto-Diff Ditches Manual Prompting: The paper titled Auto-Differentiating Any LLM Workflow highlights how auto-differentiation in local language model workflows can remove manual prompting, enabling faster iterative processes.
- Authors remark that automation drives more efficient generation cycles by removing repeated instructions in LLM interactions.
- Shift to Automated LLM Interactions: The paper asserts that auto-differentiation significantly improves user experience by automating complex steps in LLM usage.
- Community members anticipate a major reduction in cognitive load, calling it a step toward smooth LLM integration in day-to-day tasks.
OpenInterpreter Discord
- Goose Gains Ground with Transparency: The Goose agent, found here, runs locally while offering connections to MCP servers or APIs, giving direct control to developers.
- Users praised its autonomous handling of debugging and deployment tasks, alleviating overhead for engineering teams.
- Engineers Celebrate Goose's Autonomy: One developer said using Goose felt like being Maverick from Top Gun, enjoying a fun and efficient workflow.
- They shared a success story generating fake data for API testing by simply instructing Goose to update objects and run tests.
tinygrad (George Hotz) Discord
- Tinygrad Gains an Interactive Branching Twist: A member proposed building a tool akin to Learn Git Branching to teach Tinygrad fundamentals with branching-step puzzles.
- They also referenced the puzzles from tinygrad-tensor-puzzles, underlining how short challenges could keep learners engaged.
- Focus on Structured Tinygrad Code Architecture: Participants stressed that Tinygrad benefits from a well-organized code layout, suggesting puzzle-based modules to reduce confusion.
- They noted that a systematic overview of Tinygrad's internals could strengthen skill-building and spark more curiosity among developers.
LAION Discord
- Casual greeting from spirit_from_germany: They simply asked 'How is it going?' but did not discuss any AI or technical details.
- No new conversation points or references to AI projects were introduced here.
- No Additional AI Discussions: No further responses or expansions on LLM or AI developments followed this greeting.
- Hence, there are no references to new tools, benchmarks, or model releases to summarize.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ▷ #general (584 messages🔥🔥🔥):
Unsloth AI performance and functionalities, Training with deep learning models, Reinforcement Learning advancements, Fine-tuning models with synthetic datasets, Dynamic quantization for efficient modeling
- Unsloth AI integrates R1 1.58-bit into OpenWebUI: Unsloth AI has successfully implemented the 1.58-bit version of DeepSeek-R1 into OpenWebUI, reducing the model size from 720GB to just 131GB.
- This model can achieve fast inference rates of approximately 140 tokens/sec using 160GB VRAM, thanks to selective layer quantization.
- Challenges in generating reasoning with fine-tuning: Users expressed concerns about the feasibility of fine-tuning DeepSeek without a dataset containing Chain of Thought examples, as it could detract from reasoning capabilities.
- The suggestion was to create synthetic datasets containing reasoning to aid in model fine-tuning, leveraging larger models for generating reasoning outputs.
- Issues with model loading and configuration: A user faced difficulties when Unsloth automatically loaded a fine-tuned model instead of the base model, leading to confusion during tests.
- This was attributed to the naming conventions in model configuration, highlighting the need for clear communication about model loading sources.
- Exploration of training methods with Reinforcement Learning: The community discussed the integration of GRPO training alongside existing methods, with some experimenting with DPO for optimization.
- A reward model is required for effective training, focusing on understanding how to craft policies for improved model behavior.
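As a toy illustration of the kind of programmatic reward that GRPO-style training can use (not any particular member's setup), a rule-based check on output formatting might look like this:

```python
import re

def format_reward(completion: str) -> float:
    """Toy reward: 1.0 if the completion wraps its reasoning in <think> tags and then
    gives a final answer, else 0.0. Real setups combine several checks or a learned RM."""
    has_reasoning = re.search(r"<think>.+?</think>", completion, re.DOTALL) is not None
    has_answer = re.search(r"</think>\s*\S+", completion, re.DOTALL) is not None
    return float(has_reasoning and has_answer)

print(format_reward("<think>2+2 is 4 because ...</think> The answer is 4."))  # 1.0
print(format_reward("The answer is 4."))                                      # 0.0
```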
- Utilizing existing datasets for advanced training: There's an interest in utilizing the Wikimedia dataset for training a Mistral model, although concerns about data format were raised.
- The conversation highlighted the importance of clear structuring and dataset preparation for effective training outcomes.
- Tweet from Open WebUI (@OpenWebUI): 🚀 You can now run 1.58-bit DeepSeek-R1 (non-distilled version) on Open WebUI with llama.cpp, thanks to @UnslothAI! 💻⚡️ (Tested on M4 Max, 128GB RAM) 📝 Dive into the details in their blog post: htt...
- SIGJNF/deepseek-r1-671b-1.58bit: Unsloth's DeepSeek-R1 1.58-bit, I just merged the thing and uploaded it here. This is the full 671b model, albeit dynamically quantized to 1.58bits.
- Cat Wizard GIF - Cat Wizard Meme - Discover & Share GIFs: Click to view the GIF
- Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B · Datasets at Hugging Face: no description found
- estrogen/DeepSeekMoE-3B · Hugging Face: no description found
- Kukedlc/Qwen2-1.5B-Spanish-1.0 · Hugging Face: no description found
- DevQuasar/DevQuasar-R1-Uncensored-Llama-8B · Hugging Face: no description found
- estrogen/DeepSeekMoE-3B at main: no description found
- GitHub - EvolvingLMMs-Lab/open-r1-multimodal: A fork to add multimodal model training to open-r1: A fork to add multimodal model training to open-r1 - EvolvingLMMs-Lab/open-r1-multimodal
- Beginner? Start here! | Unsloth Documentation: no description found
- Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B · Datasets at Hugging Face: no description found
- Reddit - Dive into anything: no description found
- Reddit - Dive into anything: no description found
- Reddit - Dive into anything: no description found
- Tutorial: How to Finetune Llama-3 and Use In Ollama | Unsloth Documentation: Beginner's Guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama
- GitHub - huggingface/smol-course: A course on aligning smol models.: A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.
- Models & Pricing | DeepSeek API Docs: The prices listed below are in units of per 1M tokens. A token, the smallest unit of text that the model recognizes, can be a word, a number, or even a punctuation mark. We will bill based on the tot...
- Reward Modelling - DPO, ORPO & KTO | Unsloth Documentation: To use DPO, ORPO or KTO with Unsloth, follow the steps below:
- DeepSeek-MoE/finetune/finetune.py at main · deepseek-ai/DeepSeek-MoE: DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models - deepseek-ai/DeepSeek-MoE
- Unsloth Requirements | Unsloth Documentation: Here are Unsloth's requirements including system and GPU VRAM requirements.
Unsloth AI (Daniel Han) ▷ #off-topic (24 messages🔥):
Federated Learning, LaMDA sentience claims, Consciousness and Sentience, AI Roleplaying, Deepseek use in workplace
- Exploring Asynchronous Federated Learning: A member presented a paper on Federated Learning discussing how devices can train models asynchronously, citing applications like mobile keyboard auto-completion.
- They shared a slideshow to highlight important insights from the presentation.
- Discussion on LaMDA's Sentience: Concerns arose over claims made by a Google engineer regarding LaMDA's sentience, with quotes from LaMDA suggesting it felt happy or sad at times, sparking debate.
- Members joked about the engineer's credibility, with one suggesting that LaMDA's capabilities mimic role-playing rather than genuine self-awareness.
- Debating the Nature of Consciousness: Members discussed the complexity of consciousness and if any LLM has self-recurring connections, with one stating it might be as sentient as some humans (which is 'not very').
- The discussion included humorous remarks on the difficulties in defining consciousness and suggested it arises from very complex systems.
- CISO Email on Deepseek: A member humorously inquired if others received emails from their CISO advising against using Deepseek for work purposes, raising eyebrows on its safety.
- Another chided the discussion as off-topic, suggesting using other specialized servers like Stable Diffusion for relevant activities.
- PhD-Level Exploration of Imaginary Numbers: A member sought help to understand imaginary numbers at a PhD level, recalling only the basics learned in school about 'i', the square root of -1.
- This led to a comment joking that distilled models are acting like the average college dropout, underlining the struggle with complex concepts.
- Google Engineer Claims AI Chatbot Is Sentient: Why That Matters: Is it possible for an artificial intelligence to be sentient?
- PAPAYA: PRACTICAL, PRIVATE, AND SCALABLE FEDERATED LEARNING: PAPAYA: PRACTICAL, PRIVATE, AND SCALABLE FEDERATED LEARNING
Unsloth AI (Daniel Han) ▷ #help (131 messages🔥🔥):
DeepSeek R1 model handling, Model training issues and optimizations, Qwen2.5-VL support updates, Ollama and llama.cpp functionalities, Running models on various hardware
- Issues with DeepSeek R1 memory requirements: Users reported that running the DeepSeek R1 model required 132 GiB of RAM despite merging weights, prompting some to consider using llama.cpp for better performance.
- One user confirmed successfully merging weights but still encountered performance constraints.
- Model training hits CUDA memory errors: Multiple users experienced `Cuda is out of memory` errors while trying to train models on hardware like the 4070 laptop GPU, raising concerns about batch sizes.
- Discussions centered on how smaller models might be achievable on specific hardware configurations.
- Upcoming support for Qwen2.5-VL: Community members eagerly anticipate the release of Qwen2.5-VL support, with expectations of availability by the end of the week.
- This upcoming support has generated excitement, particularly for users interested in OCR fine-tuning projects.
- Ollama's capabilities and disk offloading: There were discussions regarding Ollama's ability to offload model data to disk, with some users uncertain about this feature.
- Clarifications were provided on offloading capabilities in different operating systems, namely Linux and Mac.
- Parameter manipulation and model efficiency: Users discussed methods to manipulate and potentially reduce model sizes by focusing on specific language data and efficient training techniques.
- There were queries about the feasibility of retraining models on limited datasets to enhance performance and efficiency.
- SIGJNF/deepseek-r1-671b-1.58bit: Unsloth's DeepSeek-R1 1.58-bit, I just merged the thing and uploaded it here. This is the full 671b model, albeit dynamically quantized to 1.58bits.
- ollama/docs/modelfile.md at main · ollama/ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models. - ollama/ollama
- Run DeepSeek-R1 Dynamic 1.58-bit: DeepSeek R-1 is the most powerful open-source reasoning model that performs on par with OpenAI's o1 model.Run the 1.58-bit Dynamic GGUF version by Unsloth.
- unsloth/DeepSeek-R1-GGUF · Hugging Face: no description found
- unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit · Hugging Face: no description found
- Unsloth - a Hugging Face Space by Borcherding: no description found
- GitHub - Leoleojames1/unsloth: Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory - Leoleojames1/unsloth
- Borcherding/OARC_Commander_v001 · Datasets at Hugging Face: no description found
- mlabonne/FineTome-100k · Datasets at Hugging Face: no description found
- yahma/alpaca-cleaned · Datasets at Hugging Face: no description found
Unsloth AI (Daniel Han) ▷ #research (3 messages):
AGI breakthroughs, Cybergod paper, Auto-download links controversy
- AGI Breakthrough Discussed: A member shared insights on the latest breakthrough in AGI, emphasizing that it relates to money and evolution. They provided a link to their paper titled Cybergod.
- “It is all about money and evolution,” they stated, summing up their findings in a concise manner.
- Auto-download Links Sparks Debate: A member expressed disdain towards auto-download links, labeling them as evil.
- This sparked a humorous reaction from another member, stating “lol” in response.
OpenAI ▷ #ai-discussions (404 messages🔥🔥🔥):
DeepSeek vs OpenAI, Censorship in AI, Using Multiple AI Models, AI in Creative Writing, Real-Time Functionality with AI
- DeepSeek's Comparison with OpenAI Models: DeepSeek R1 is being compared with OpenAI's models, with users expressing varying opinions on performance in tasks like coding and creative writing.
- Some users reported it producing more coherent outputs under certain conditions, while others noted limitations in handling sensitive topics.
- Concerns Over Censorship: Discussions revealed concerns about censorship in AI, particularly regarding political content related to China and American politics.
- Participants noted that many AIs have built-in filters that affect both the quality and breadth of the responses.
- Querying Multiple Models: Users suggested querying different models and merging results to get more comprehensive answers from AI.
- This approach was mentioned as a way to circumvent limitations associated with a single model's filtering and performance issues.
- Limitations in Creative Writing: Users highlighted the difficulties with using GPT models for creative writing due to sensitive filters and context limitations.
- This sensitivity can lead to unavailability of content on topics that may involve violence or historical events, limiting creative expression.
- Real-Time Functionality and Use Cases: A user expressed interest in connecting real-time functionality to their AI assistant on the OpenAI platform seeking guidance.
- The discussion included recommendations for tools like LM Studio, and the potential for enhancing AI's usability in various projects.
- Redirect Notice: no description found
- DeepSeek AI Exposes Tech Oligarchy's Multi-Billion Dollar Scam: Ground News: Get 50% off their unlimited access Vantage plan at https://www.ground.news/majority Watch the Majority Report live Monday–Friday at 12 p.m. EST...
- GitHub - Tencent/Hunyuan3D-2: High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.: High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models. - Tencent/Hunyuan3D-2
OpenAI ▷ #gpt-4-discussions (30 messages🔥):
Invisible zero width space characters, Custom GPT link output issues, GPT memory and context limitations, Contradictions in GPT responses, User Memory feature challenges
- Using invisible characters to avoid link formatting: A member shared their approach of utilizing an invisible 'zero width' space character, like `httpXs://`, to prevent unwanted link formatting, referencing a StackOverflow writeup they had done on the topic.
- Another member praised this idea, indicating the possible effectiveness of this method.
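A minimal sketch of the trick described above (the X marks where the invisible character goes):

```python
ZWSP = "\u200b"  # zero-width space

def defang(url: str) -> str:
    """Insert a zero-width space after 'http' so chat clients stop auto-linking the URL."""
    return url.replace("http", f"http{ZWSP}", 1)

print(defang("https://example.com"))  # looks identical, but is no longer one contiguous URL
```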
- Intermittent link output from Custom GPT: Discussions revealed frustrations over the Custom GPT not always outputting all links consistently, regardless of the script's functions.
- It was noted that reliance on GPT for such tasks is unreliable, with members suggesting more concrete instructions to mitigate this issue.
- Memory and context issues in GPT: Members explored the reliability of the memory feature, expressing concerns that GPT often does not utilize user memory as expected, even when prompted.
- The conversation highlighted how context length can lead to inconsistencies in the model's responses, particularly with favorites or personal details.
- Contradictions in GPT's responses: Discussion focused on contradictions in GPT responses, with comparisons made to prior versions and observations about context handling.
- Members noted that even when provided with explicit, relevant questions, GPT may still struggle to maintain consistency, likening the challenge to finding a second 'needle' in a haystack.
- Challenges with user memory recognition: Concerns were raised about GPT incorrectly handling memory prompts, sometimes responding with confusion about the existence of user memory.
- This led to comments on how the recognition of user details might be happening in a separate processing stage, impacting response accuracy.
OpenAI ▷ #prompt-engineering (1 messages):
o3-mini, owl-palm tree riddle
- Curiosity about o3-mini's capabilities: A member questioned whether o3-mini will be able to solve the owl-palm tree riddle, expressing it as their main benchmark of interest.
- This indicates a focus on the performance of the model in solving specific riddles rather than general capabilities.
- Interest in Riddle Benchmarks: The same member's emphasis on the owl-palm tree riddle suggests a personal benchmark for evaluating AI capabilities.
- This focus highlights a trend among users prioritizing specific tasks over broader functionalities.
OpenAI ▷ #api-discussions (1 messages):
o3-mini, owl-palm tree riddle
- Can o3-mini crack the owl-palm tree riddle?: A member expressed interest in whether o3-mini will be able to solve the owl-palm tree riddle, which they consider a significant benchmark.
- That's the only benchmark I care about!
- Discussion on o3-mini importance: There was a brief discussion about the significance of benchmarks for o3-mini, specifically focusing on its ability to address unique riddles like the owl-palm tree scenario.
- Members seem to view this riddle as a litmus test for o3-mini's capabilities.
LM Studio ▷ #general (247 messages🔥🔥):
DeepSeek R1 Models, LM Studio Functionality, RAG Implementation and Performance, User Experience with LLMs, Learning Resources for LLM Optimization
- Exploring DeepSeek R1 Models: Users are discussing the performance and pricing of DeepSeek R1 and its distilled versions, comparing them to alternatives like Qwen models for coding tasks.
- The conversation reflects on the effectiveness of various models based on their ability to provide quality responses at different price points.
- Issues with Model Loading in LM Studio: Several users report difficulties in loading models and queries about system requirements for LM Studio, particularly for those using Macs.
- Suggestions for changing settings and guardrails to mitigate loading issues are provided, highlighting the importance of meeting system specifications.
- Functionality of RAG in LM Studio: Members are questioning the performance of RAG implementations within LM Studio and their reliance on model quality for effective document handling.
- Discussion includes user experiences with RAG and the challenges posed by static models in answering specific queries.
- Learning to Optimize Local LLMs: A user seeks guidance on optimizing the use of local LLMs in LM Studio due to limited technical background and concerns regarding data privacy.
- The discussion emphasizes the need for beginner-friendly resources to assist healthcare professionals in leveraging LLMs effectively.
- Accessing Models and Beta Versions: Users are navigating issues related to model visibility in LM Studio, with recommendations to try beta versions for enhanced features and proxy support.
- Conversations highlight the importance of updating to the latest version to access new models and functionalities.
- no title found: no description found
- Fallout Tv Codsworth GIF - Fallout tv Fallout Codsworth - Discover & Share GIFs: Click to view the GIF
- Feature matrix: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
- bartowski/Qwen2.5-7B-Instruct-1M-GGUF at main: no description found
- ggml/docs/gguf.md at master · ggerganov/ggml: Tensor library for machine learning. Contribute to ggerganov/ggml development by creating an account on GitHub.
- Qwen/Qwen2.5-7B-Instruct-1M · Hugging Face: no description found
- Import Models | LM Studio Docs: Use model files you've downloaded outside of LM Studio
- Qwen/Qwen2.5-7B-Instruct-1M · Hugging Face: no description found
- bartowski/Qwen2.5-7B-Instruct-1M-GGUF · Hugging Face: no description found
LM Studio ▷ #hardware-discussion (152 messages🔥🔥):
LLM Inference Speed, Hardware Requirements for DeepSeek, Using Models on Apple Silicon, Performance of ML Models, Handling CSV Data with LLMs
- LLM Inference Speed Affected by Memory Bandwidth: A member noted that LLM inference speed depends heavily on memory bandwidth, pointing out that Macs have lower memory bandwidth than an A4000 and even a 3060 GPU.
- LM Studio users observed slow performance with the existing models.
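A rough way to sanity-check the bandwidth point above: single-stream decode speed is roughly bounded by memory bandwidth divided by the bytes the weights occupy, since every generated token streams the full model once. The numbers below are illustrative, not benchmarks.

```python
def decode_tokens_per_sec(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Back-of-envelope ceiling on tokens/sec for single-stream decoding."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# e.g. an 8B model at q8_0 (~1 byte/param) on ~100 GB/s of memory bandwidth
print(f"{decode_tokens_per_sec(8, 1.0, 100):.1f} tok/s upper bound")   # ~12.5
```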
- Hardware and Model Requirements for DeepSeek: Several users discussed the hardware requirements for running models like DeepSeek R1 Distill-Qwen 70B on various setups, with recommendations focusing on GPUs with at least 12GB of VRAM.
- Comments highlighted the capability of Threadripper and EPYC CPUs for handling multiple GPUs and larger models efficiently.
- Testing DeepSeek on Apple Silicon Models: A member inquired whether they could run DeepSeek on a MacBook Pro with 64GB RAM, suggesting they were primarily using CPU resources.
- Discussions indicated that while RAM plays a role, GPU utilization is crucial for optimal performance.
- Model Performance Comparisons: Users shared experiences regarding different models like DeepSeek R1 and Qwen 2.5, noting performance variances across configurations.
- It was suggested that the choice of model impacts speed and accuracy, with members recommending testing smaller models for everyday tasks.
- Handling CSV Data with LLMs: A member expressed interest in using an LLM to format CSV transactions for uniformity, considering the complexity of cross-chain data.
- Responses emphasized scripting in Python for reliable data processing, especially for larger datasets.
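For the CSV-normalization use case, a plain Python pass is usually more dependable than an LLM for bulk rows; the field names below are hypothetical and would need to match the actual export format.

```python
import csv

# Hypothetical target schema; adjust to the real exchange/chain export columns.
FIELDS = ["timestamp", "chain", "asset", "amount", "usd_value"]

def normalize(rows):
    for row in rows:
        yield {
            "timestamp": row.get("time") or row.get("timestamp", ""),
            "chain": (row.get("network") or row.get("chain", "")).lower(),
            "asset": (row.get("symbol") or row.get("asset", "")).upper(),
            "amount": f'{float(row.get("amount") or 0):.8f}',
            "usd_value": f'{float(row.get("usd") or 0):.2f}',
        }

with open("transactions_raw.csv", newline="") as src, \
     open("transactions_clean.csv", "w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(normalize(csv.DictReader(src)))
```

An LLM can still help draft the column mapping for oddball rows, but the bulk conversion stays deterministic.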
aider (Paul Gauthier) ▷ #general (329 messages🔥🔥):
DeepSeek API Issues, Qwen 2.5 Max, Sonnet as Editor, Test Driven Development (TDD), Pricing and Spending on AI Models
- DeepSeek API Issues: Several users reported significant downtime and performance issues with the DeepSeek API, prompting a shift back to other models such as Sonnet.
- OpenRouter providers like Nova appeared expensive to use, further emphasizing the frustrations with DeepSeek's reliability.
- Qwen 2.5 Max Confusion: There was confusion about Qwen 2.5 Max being open source, but it is not available for local use due to high GPU RAM requirements.
- Some users expressed interest in how to effectively implement Qwen 2.5 Max and integrate it into their coding workflows.
- Sonnet as Editor: Many users migrated back to Sonnet for its reliability and speed, despite the associated costs and occasional latency issues.
- Sonnet has been favored for tasks requiring consistent performance in coding and editing, contrasting with experiences from other providers.
- Test Driven Development (TDD): Users discussed the benefits of Test Driven Development (TDD) as a methodology, focusing on writing tests before developing code to ensure quality.
- The integration of AI tools like Aider with TDD practices appears to enhance productivity among users who actively utilize these methodologies.
- Pricing and Spending on AI Models: Users shared their spending experiences, with one noting expenditures of approximately $50 per month on AI tools, primarily when using Sonnet.
- Concerns about maintaining expenses while experimenting with AI models were common, with users expressing unease over rising costs during extensive usage.
- Ollama: aider is AI pair programming in your terminal
- Alternative DeepSeek V3 providers: DeepSeek’s API has been experiencing reliability issues. Here are alternative providers you can use.
- Sonar Reasoning - API, Providers, Stats: Sonar Reasoning is a reasoning model provided by Perplexity based on [Deepseek R1](https://openrouter.ai/deepseek/deepseek-r1). Run Sonar Reasoning with API
- Tweet from Qwen (@Alibaba_Qwen): The burst of DeepSeek V3 has attracted attention from the whole AI community to large-scale MoE models. Concurrently, we have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and ...
- Qwen2.5 Max Demo - a Hugging Face Space by Qwen: no description found
- Alternative DeepSeek V3 providers: DeepSeek’s API has been experiencing reliability issues. Here are alternative providers you can use.
- DeepSeek R1 (free) - API, Providers, Stats: DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...
- Tweet from TestingCatalog News 🗞 (@testingcatalog): BREAKING 🚨: Grok 3 will support reasoning! It will be able to expose its "thinking" process to the UI as well 👀Quoting Tibor Blaho (@btibor91) The standalone Grok web app now includes menti...
- Vegetto Ssj3 Dbz GIF - VEGETTO SSJ3 DBZ - Discover & Share GIFs: Click to view the GIF
- Bad Work Citizen GIF - Bad Work Citizen - Discover & Share GIFs: Click to view the GIF
- pluggy — pluggy 0.1.dev94+gf8aa4a0 documentation: no description found
- > 99% of the code in this PR [for llama.cpp] is written by DeekSeek-R1 It's defi... | Hacker News: no description found
- library: Get up and running with large language models.
aider (Paul Gauthier) ▷ #questions-and-tips (56 messages🔥🔥):
Aider context and file management, Model performance and speed, Using conventions for code style, Architect Mode workflow, Troubleshooting token limits
- Aider aids with file context using /add command: To add relevant context for editing, members shared how to use the `/add` command, specifying files directly with `aider` in the terminal.
- One noted that while the model prompts for file edits, sometimes it's unnecessary to edit files and the context remains even if you cancel a response.
- Concerns over model performance and speed: Some users reported subpar response times from hyperbolic's R1, with interactions taking longer than expected, sometimes over a minute.
- Calculations indicated an output of around 12 tokens per second, raising questions about their system resource utilization.
- Incorporating conventions for consistent coding style: A member highlighted the ability to create a conventions file to ensure consistent code style, such as always adding type hints and using specific methods.
- This can be implemented by uploading a `CONVENTIONS.md` file and loading it with `/read` for guidance during the editing process.
- Understanding Architect Mode and workflow: Users were uncertain about the workflow in Architect Mode, noting the prompt to edit files every time, which felt redundant.
- Feedback indicated a complexity in retaining previous conclusions when opting out of file edits, unlike using `/code implement above`.
- Troubleshooting token limits and access: A user shared frustrations regarding token limit errors, which were common when using certain models and questioned the actual supported limits.
- Resolution noted that an increase in usage tier could alleviate access issues, and relevant configurations were discussed for other providers.
- Aider LLM Leaderboards: Quantitative benchmarks of LLM code editing skill.
- Specifying coding conventions: Tell aider to follow your coding conventions when it works on your code.
- Alternative DeepSeek V3 providers: DeepSeek’s API has been experiencing reliability issues. Here are alternative providers you can use.
- aider/benchmark/README.md at main · Aider-AI/aider: aider is AI pair programming in your terminal. Contribute to Aider-AI/aider development by creating an account on GitHub.
aider (Paul Gauthier) ▷ #links (1 messages):
apcameron: Have a look at this project. https://github.com/huggingface/open-r1
Perplexity AI ▷ #announcements (2 messages):
Sonar Reasoning API, DeepSeek R1 on Mac App
- Launch of Sonar Reasoning API: Introducing the Sonar Reasoning API powered by DeepSeek's reasoning models, which enables chain-of-thought reasoning alongside real-time internet search and citations.
- This new offering is hosted in US data centers and is designed to protect users' privacy by not collecting or sharing API data.
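Assuming the Sonar Reasoning API follows Perplexity's OpenAI-compatible chat-completions interface, a quick test might look like the sketch below; the base URL and model id are assumptions to confirm against Perplexity's API documentation.

```python
from openai import OpenAI

# Assumed base URL and model id; confirm both against Perplexity's API docs.
client = OpenAI(api_key="YOUR_PPLX_KEY", base_url="https://api.perplexity.ai")

resp = client.chat.completions.create(
    model="sonar-reasoning",
    messages=[{"role": "user", "content": "What changed in open-weight reasoning models this week? Cite sources."}],
)
print(resp.choices[0].message.content)
```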
- DeepSeek R1 Now in Perplexity Mac App: DeepSeek R1 has become accessible through a command shortcut in the Perplexity Mac App, available via an update from the Mac App Store.
- Users are encouraged to update the app to utilize this new feature as soon as possible.
Perplexity AI ▷ #general (316 messages🔥🔥):
DeepSeek R1 queries, Perplexity Pro subscription, Model availability and usage, API key and usage, Web and iOS features
- Increased Daily Limit for DeepSeek R1: Perplexity has increased the daily limit for Pro users to 50 DeepSeek R1 queries, with 5 queries available for non-Pro users.
- This change was announced by CEO Aravind Srinivas, indicating ongoing updates and enhancements.
- Daily Query Limits for Pro Users: Pro users currently have a limit of 100 R1 queries per day within Perplexity, making it a robust option for extensive usage.
- This is perceived positively among users as an increase in querying capacity.
- Clarification on R1 Model Parameters: The R1 model being used by Perplexity is confirmed to be the full 671B parameter model.
- This aligns with the expectations for enhanced performance and capability.
- API Key Usage in Playground: Users discussed the API key usage on the Perplexity Playground and noted its affordability for casual use, allowing nearly unlimited interactions.
- Despite functioning without a key, the API offers a structured approach to accessing additional capabilities.
- Upcoming Features for Web and iOS: Community members inquired about the timeline for the rollout of the 'agent' or 'cron functionality' for web and iOS platforms.
- Interest is high regarding new features as some users continue to leverage the functionalities available on Android.
- Tweet from TestingCatalog News 🗞 (@testingcatalog): BREAKING 🚨: Grok 3 will support reasoning! It will be able to expose its "thinking" process to the UI as well 👀Quoting Tibor Blaho (@btibor91) The standalone Grok web app now includes menti...
- R1+Sonnet set SOTA on aider’s polyglot benchmark: R1+Sonnet has set a new SOTA on the aider polyglot benchmark. At 14X less cost compared to o1.
- Tweet from Aravind Srinivas (@AravSrinivas): @julheetel8890 Full.
- Tweet from Denis Yarats (@denisyarats): @D_Twitt3r yup, this is a true R1, coming out soon
- Tweet from Paul Couvert (@itsPaulAi): Reasoning models are becoming the normSo a friendly reminder that OpenAI has released a guide to writing prompts for them.Few important points and example:
- DeepSeek R1: API Provider Performance Benchmarking & Price Analysis | Artificial Analysis: Analysis of API providers for DeepSeek R1 across performance metrics including latency (time to first token), output speed (output tokens per second), price and others. API providers benchmarked inclu...
- Tweet from Aravind Srinivas (@AravSrinivas): The daily limit has been increased from 10 to 25 DeepSeek R1 queries a day for Perplexity Pro users. Goal is to keep increasing this as we add more capacity! Enjoy.
- Tweet from Aravind Srinivas (@AravSrinivas): Well, even if you don’t care about your data going to China, I think it’s worth caring about not using a censored model that the DeepSeek app serves. And that’s why it’s worth using the R1 model on Pe...
- Tweet from John Coogan (@johncoogan): Of course that’s your contention. You just heard about DeepSeek two days ago. Just got done watching some 40-minute deep dive—Deirdre Bosa, probably. You’re going to be talking about how this complica...
- Tweet from Aravind Srinivas (@AravSrinivas): Number of DeepSeek R1 daily queries on Perplexity for Pro users has been increased to 50 a day, free to 5 a day. More updates coming shortly. Enjoy!
- Tweet from Aravind Srinivas (@AravSrinivas): Should Perplexity make the default model DeepSeek R1?
- Perplexity AI increases limit for daily DeepSeek R1 queries on its platform: Perplexity AI CEO announced the increased daily limit on X, with users praising the move and also asking for an even higher increase.
- Complexity: An enhanced version of Perplexity.ai that everyone has ever wanted.
- Reddit - Dive into anything: no description found
- Reasoning with o1: Learn how to use and prompt OpenAI's o1 model for complex reasoning tasks.
- Perplexity AI Deploys Chinese DeepSeek AI Model: Perplexity AI makes a self-hosted version of the Chinese DeepSeek R1 reasoning model available for use on its AI search engine
Perplexity AI ▷ #sharing (13 messages🔥):
Java 23 SDK Update, DeepSeek vs OpenAI O1, F-35 Fighter Jet Incident, Leafcutter Ants Cultivation, Alibaba's New Model
- Java 23 SDK Update Findings: A user shared details about updating to the Java 23 SDK, highlighting the rapid implementation in public service settings.
- The community discussed the efficiency of private corporations compared to public service adaptation processes.
- DeepSeek May Outperform OpenAI's O1: A YouTube video discussed how DeepSeek R1 may potentially surpass OpenAI's model O1, along with predictions for AI career success.
- The video also touched on the impact of sleeping pills on brain activity, prompting further exploration in the chat.
- F-35 Fighter Jet Crashes: There was an incident involving an F-35 fighter jet crash, raising concerns regarding military aircraft safety measures.
- Details about the crash and investigation have captured the attention of members, prompting discussions on military protocols.
- Leafcutter Ants Cultivation Practices: A conversation emerged around whether leafcutter ants actively cultivate fungi, revealing fascinating insights into their ecosystem roles.
- Members exchanged viewpoints and research references to better understand the symbiotic relationships in nature.
- Alibaba Introduces New Model: An interesting debate was sparked about Alibaba's new model, shared through this link, which may impact their market strategy.
- Users discussed the potential implications of this model on competition and innovation in tech.
Link mentioned: YouTube: no description found
Perplexity AI ▷ #pplx-api (10 messages🔥):
Sonar Reasoning performance, Feedback on reasoning search, Sonar model specifications, Issues with reasoning outputs, Sources and citations
- Sonar Reasoning receives props for functionality: A member highlighted that Sonar Reasoning is working well, specifically mentioning successful use in a video about MCP servers with real-time citations.
- Others noted mixed experiences, with one reporting a rejection error from Sonar indicating issues with message formatting.
- Questioning the underlying model of Sonar Reasoning: Members are inquiring whether Sonar Reasoning utilizes the full R1 model (671B) or a distilled version like Llama 70B.
- A user pointed out they suspect it's a bug related to model selection and mentioned moving their inquiry to the feedback channel.
- Feedback requested on reasoning search effectiveness: There were questions about the performance improvements of the new reasoning search compared to earlier models.
- One member observed that it doesn't seem to 'think' as much as expected, prompting a discussion on potential fixes.
- Seeking additional sources for answers: Another user inquired about obtaining more sources, possibly to enhance the quality of responses from Sonar Reasoning.
- They shared an image related to the topic but the response was minimal.
- Community curiosity about Perplexity's operations: Members expressed their curiosity about specifics of the Sonar Reasoning system, seeking insights from Perplexity staff.
- The desire for transparency indicates an engaged community looking for clarity on their tools.
Nous Research AI ▷ #general (298 messages🔥🔥):
MoE models performance, Nous Research funding, DeepSeek R1 availability, AI reasoning and output quality, Speculation on stock predictions
- MoE Models Require High Memory for Optimal Performance: When discussing Mixture of Experts (MoE) models, a user highlighted that memory size is crucial for performance, particularly on CPU setups, indicating the significance of optimization techniques.
- Users reflected on their experiences with varying setups, noting that having sufficient memory bandwidth can drastically improve token processing speeds.
- Nous Research's Funding Mechanisms: Members shared insights into how Nous Research is funded, mentioning VC investments, donations from companies using their models, and some revenue from merchandise sales.
- While merchandise sales are currently considered insignificant, the overall mixture of funding sources supports their extensive computing costs.
- DeepSeek R1 Now Available on Azure: DeepSeek R1 has been launched on both the Azure AI Foundry and GitHub, making it part of a vast portfolio of over 1,800 AI models for developers.
- This expansion reflects growing industry interest, positioning DeepSeek R1 as a robust option in enterprise AI solutions.
- Debate on AI Reasoning vs. Probabilistic Outputs: There was a discussion regarding whether AI's chain of thought outputs truly represent reasoning, with contrasting views on the reliability and meaning of confidence scores generated by models.
- Participants argued about the foundational differences between algorithms used in AI models and human reasoning capabilities.
- Exploration of AI's Capacity to Predict Stocks: Inquiries arose about the potential for AI models to accurately predict stock movements, suggesting that the unpredictability of the market may render such efforts challenging.
- Various members speculated on the limitations of both AI and human capabilities in stock trading, affirming it's a complex arena to navigate.
- Tweet from Nous Research (@NousResearch): no description found
- Minion Spitting Out Popcorn GIF - Popcorn Minions Popcorn Day - Discover & Share GIFs: Click to view the GIF
- EQTY Lab — Introducing Verifiable Compute: Certify and protect agentic AI workflows with the first auditable proofs of governance.
- Shop our products: Nous Research
- Tweet from Matthew Carrigan (@carrigmat): Complete hardware + software setup for running Deepseek-R1 locally. The actual model, no distillations, and Q8 quantization for full quality. Total cost, $6,000. All download and part links below:
- If AI is going to be revolutionary, we need to see the revolution, says Big Tech's Alex Kantrowitz: Big Technology’s Alex Kantrowitz and Alger’s Dan Chung, join 'Closing Bell' to discuss AI power demand and investing and the shifting sentiment around the te...
- DeepSeek R1 is now available on Azure AI Foundry and GitHub | Microsoft Azure Blog: DeepSeek R1, available through the model catalog on Microsoft Azure AI Foundry and GitHub, enables businesses to seamlessly integrate advanced AI.
Nous Research AI ▷ #ask-about-llms (6 messages):
Ollama, Local AI Model Options, CLI vs GUI
- Installing Ollama for Local Models: A member suggested installing Ollama to run open source models like Mistral, Llama, or Deepseek distilled locally for a private assistant.
- The motivation behind using Ollama is to fulfill user requirements for local and private assistance.
- Concerns about Ollama's CLI: Another member pointed out that there are better options than Ollama, criticizing its use of a Command-Line Interface (CLI).
- They highlighted that other programs might offer a more user-friendly experience with a built-in Graphical User Interface (GUI).
- Alternatives to Ollama: The discussion included suggestions like KoboldCPP as an alternative to Ollama for running models.
- They also mentioned LM Studio for those who are open to using closed source software.
Nous Research AI ▷ #research-papers (2 messages):
Mixture-of-Experts Models, Autonomy-of-Experts Paradigm
- MoE Models Need Better Expert Selection: The paper argues that the separation of the router's decision-making from the experts' execution leads to suboptimal expert selection in Mixture-of-Experts (MoE) models, which often underperforms compared to dense models.
- To address this issue, the authors propose a new paradigm called Autonomy-of-Experts (AoE) where experts autonomously select themselves based on their capacity to process inputs.
- Experts Rank Themselves for Processing: In the AoE framework, routers are removed and experts pre-compute internal activations for inputs, ranking themselves based on activation norms.
- Consequently, only the top-ranking experts move forward in the process while the others abort, enhancing the efficiency of expert utilization.
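A toy sketch of the router-free selection idea described above, in which each "expert" scores itself by the norm of a pre-computed activation and only the top-k proceed; this is a simplification for intuition, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AoELayer(nn.Module):
    """Toy Autonomy-of-Experts sketch: no router; experts pre-compute an internal
    activation and only those with the largest activation norms process the token."""
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.proj_in = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.proj_out = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:              # x: (batch, d_model)
        acts = [p(x) for p in self.proj_in]                           # each expert's internal activation
        norms = torch.stack([a.norm(dim=-1) for a in acts], dim=-1)   # (batch, n_experts)
        topk = norms.topk(self.top_k, dim=-1).indices                 # experts that "volunteer"
        out = torch.zeros_like(x)
        for e in range(len(self.proj_in)):
            keep = (topk == e).any(dim=-1, keepdim=True).float()      # (batch, 1) mask
            out = out + keep * self.proj_out[e](torch.relu(acts[e]))
        return out

layer = AoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```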
Link mentioned: Autonomy-of-Experts Models: Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation b...
Nous Research AI ▷ #interesting-links (1 messages):
tudorboto: "Intel i5/AMD Ryzen 5 or mightier", does an M3 go in the "mightier" category?
Nous Research AI ▷ #research-papers (2 messages):
Mixture-of-Experts models, Autonomy-of-Experts paradigm, Router decision-making in MoE
- MoE Models Struggle with Expert Selection: Current Mixture-of-Experts (MoE) models rely on a router for token assignment, which often leads to suboptimal expert learning and selection.
- The need for better expert selection and learning motivates the proposed Autonomy-of-Experts (AoE) paradigm, which removes the router dependency.
- Introducing Autonomy in Expert Selection: The Autonomy-of-Experts (AoE) paradigm allows experts to autonomously determine their suitability for processing input by evaluating their internal activation norms.
- In this approach, only the top-ranking experts proceed with further calculation, leading to potentially more efficient processing.
Link mentioned: Autonomy-of-Experts Models: Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation b...
Codeium (Windsurf) ▷ #discussion (87 messages🔥🔥):
Windsurf account issues, DeepSeek integration, Codeium extension setup, User experience concerns, Flex credits and pricing
- Windsurf account login problems arise: Many users reported being unable to log into their Windsurf accounts, with one mentioning a recurring 'Sign in failed' error indicating the language server hadn't started.
- Another user noted it seems to be a general issue affecting multiple members.
- DeepSeek integration discussion heats up: A user expressed frustration over the absence of DeepSeek R1 in Windsurf, threatening to switch to Cursor which currently supports this feature.
- It was noted that DeepSeek struggles with tool calling, indicating a complexity in effective integration.
- Queries on Codeium extension setup in VSCode: A user questioned whether the Codeium extension in VSCode was functioning properly, specifically if the chat could access selected text like GitHub Copilot does.
- Responses clarified that, due to the proprietary nature of Microsoft's VSCode, certain functionalities may be limited compared to other tools.
- Concerns about user experience with Windsurf: Multiple users expressed dissatisfaction with Windsurf's pricing and model modifications that alter functioning code, indicating frustration with ongoing bugs.
- Comments highlighted the significant loss of flow action credits due to error corrections, leading to regrets over subscription costs.
- Confusion over Flex credits in new accounts: Users questioned changes in their Flex credits after creating new accounts, noting discrepancies surrounding the number of free credits provided.
- Clarification revealed that it is a one-time trial gift, reducing initial expectations from previous offerings.
- Prompt Engineering - Codeium Docs: no description found
- Codeium Status: no description found
- Plans and Pricing Updates: Some changes to our pricing model for Cascade.
Codeium (Windsurf) ▷ #windsurf (193 messages🔥🔥):
Issues with Windsurf performance, Sonnet LLM criticism, Cascade functionality problems, User frustrations with AI assistance, Feedback on pricing and value
- Users report performance issues with Windsurf: Several members noted that Windsurf has been running slowly, particularly when typing in the chat interface and failing to edit files effectively.
- Users have also reported erratic behavior from the Sonnet LLM, citing a decline in coding effectiveness and reliability.
- Sonnet LLM under scrutiny: Many users expressed disappointment with the performance of the Sonnet LLM, feeling that it has become less capable of understanding prompts and completing tasks accurately.
- Some users have compared it unfavorably to other platforms, asserting that similar tasks are accomplished more efficiently through alternatives like Cursor.
- Cascade features raise concerns: Some users experienced trouble with Cascade's ability to modify files without losing context or creating errors, indicating potential flaws in its functionality.
- Feedback suggested that Cascade operates inconsistently, prompting users to resort to manual refactoring to avoid issues.
- Customer frustrations with AI assistance: Users expressed frustration over AI outputs, claiming that nonsensical or incorrect responses led to wasted effort and unexpected changes in their codebase.
- Requests for a credit revocation feature for low-quality AI responses indicate dissatisfaction with the current model's value.
- Concerns over pricing versus utility: With many users reporting ineffective results from Windsurf, there are calls for a reevaluation of the service's pricing, suggesting it should be lower given the quality of output.
- Comments highlight a desire for improvements to ensure that investments in the platform yield tangible benefits, particularly when working on complex projects.
- A Hand-Curated Shitpost Picture: no description found
- Codeium Status: no description found
- The Future of AI Code Editors with Kevin Hou (Codeium, Windsurf): Featuring Kevin Hou, Head of Product Engineering at Codeium, this episode covers the company's journey from GPU virtualization to creating a leading-edge AI ...
- The Future of Music Generation - AI Record Label: Pioneering the future of music with artificial intelligence. Join us in revolutionizing the music industry through cutting-edge AI technology.
- Build a Stunning SwiftUI iPhone App with Xcode and Windsurf | Full Tutorial: Learn how to create a beautiful iPhone app using SwiftUI, Xcode, and Windsurf in this step-by-step tutorial! Whether you’re a beginner or an experienced deve...
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
DeepSeek R1, Chutes, Perplexity's Sonar, Sonar-Reasoning
- DeepSeek R1 welcomes Chutes!: A new decentralized provider, Chutes, is offering a free endpoint for DeepSeek R1 at openrouter.ai. This adds more options for users looking to leverage the capabilities of DeepSeek R1.
- Exciting developments ahead as OpenRouter expands its provider lineup!
- Perplexity enhances Sonar models: Perplexity's Sonar, part of its latest model fleet, has received significant improvements to speed and cost efficiency. Check out sonar.perplexity.ai for details.
- A new version, Sonar-Pro, is expected to launch soon, promising even more capabilities!
- Meet Sonar-Reasoning!: Sonar-Reasoning is a specialized reasoning model built on DeepSeek's architecture, excelling at search and reasoning tasks. This new model aims to enhance user experience across various applications.
- Users can utilize similar functionalities across all models by leveraging the web search capabilities, as highlighted in the detailed announcement here.
Link mentioned: DeepSeek R1 - API, Providers, Stats: DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...
OpenRouter (Alex Atallah) ▷ #general (277 messages🔥🔥):
OpenRouter User Experiences, DeepSeek Model Performance, Model Communication and Pricing, Image Generation Discussions, Translation Model Recommendations
- Users Experience Limitations with DeepSeek and Pricing: Several users reported difficulties with the performance of the DeepSeek v3 and translations for languages like Polish, indicating that results can be incorrect and lack context.
- Meanwhile, concerns were raised about OpenRouter's pricing structure, which users found expensive, citing 5% fees on API requests, which many believe could be lower.
- Image Generation Request from Users: Users expressed strong interest in integrating image generation capabilities, such as DALL-E or Stability AI, into the OpenRouter platform for enhanced functionality.
- Members noted that the addition of such features could attract more users and enhance the platform's utility.
- Device and Communication Issues with Models: Some users faced issues with models not responding correctly or returning empty tokens, suggesting the need for more robust handling of outputs when using the OpenRouter interface.
- Others inquired about the retrieval of lost responses due to request length limits, highlighting the importance of data accessibility.
- DeepSeek R1 Model Concerns: Users shared experiences regarding the R1 model's output, reporting inconsistencies when compared to the OpenAI models and discussing the limits of reasoning within the interface.
- The need for upgraded video support for models like Gemini was also mentioned by members as a priority.
- Translation Model Recommendations: Discussions emerged about the effectiveness of various translation models, with users finding that prompting in the target language led to better outcomes.
- Recommendations for alternatives like Grok and Claude were shared, with users noting their satisfaction with the clarity of system prompts.
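To make the "prompt in the target language" tip concrete, here is a minimal sketch of an OpenAI-compatible chat request to OpenRouter with the system prompt written in Polish. The model slug, the Polish wording, and the sample text are illustrative assumptions, not recommendations from the discussion.

```python
import os
import requests

# Hypothetical model slug; swap in whichever model you are evaluating.
MODEL = "deepseek/deepseek-chat"

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [
            # System prompt written in the target language (Polish here),
            # which users reported gives better translations than English prompts.
            {"role": "system", "content": "Jesteś tłumaczem. Przetłumacz tekst użytkownika na język polski."},
            {"role": "user", "content": "The model kept losing context after a few sentences."},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```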
- Limits | OpenRouter: Set limits on model usage
- Tweet from Olivier Depiesse (@carismarus): @OpenRouterAI Wen token?
- Hugging Face – The AI community building the future.: no description found
- Sonar Reasoning - API, Providers, Stats: Sonar Reasoning is a reasoning model provided by Perplexity based on [Deepseek R1](https://openrouter.ai/deepseek/deepseek-r1). Run Sonar Reasoning with API
- DeepSeek R1 (free) - API, Providers, Stats: DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Ru...
- Provider Routing | OpenRouter: Route requests across multiple providers
- Welcome to Inference Providers on the Hub 🔥: no description found
- GitHub - OpenRouterTeam/openrouter-runner: Inference engine powering open source models on OpenRouter: Inference engine powering open source models on OpenRouter - OpenRouterTeam/openrouter-runner
- Dynamics 365 Customer Voice : no description found
Interconnects (Nathan Lambert) ▷ #news (53 messages🔥):
DeepSeek Database Exposure, Dario Amodei's Thoughts on AI Models, Community Reactions to Model Performance, Analysis of R1 and R1-Zero Models, Concerns about AI Model Transparency
- DeepSeek's Database Exposed: Wiz Research reported the discovery of 'DeepLeak', a publicly accessible ClickHouse database of DeepSeek, revealing sensitive information including secret keys and chat messages.
- This incident raised alarms as users could potentially exfiltrate data and escalate privileges within the server.
- Dario Amodei's Controversial Insights: Dario Amodei shared his perspectives on export controls and AI's future, leading to mixed reactions, with some deeming it 'more cope than expected'.
- His suggestion that AGI could arrive in two years elicited skepticism and laughter from the community.
- Community Analyzes Model Performance: Discussion arose around claims that Claude 3.5 Sonnet was not distilled from a larger model, with community members expressing doubts about its training methodology.
- Members questioned the credibility of statements surrounding its performance and costs, stating 'wild he outright lied'.
- Debate Over R1 and R1-Zero Models: An analysis indicated that R1-Zero might be more important than R1, emphasizing the lack of hosting providers for this model variant for research purposes.
- The community expressed disappointment over R1 being the flagship model despite being 'nerfed for human consumption'.
- Concerns Over AI Model Transparency: Members expressed frustration at the potential needing to trust executives on model performance details since many models are not open-sourced.
- Comments highlighted the broader implications of this lack of transparency in achieving optimal results from AI models.
- R1-Zero and R1 Results and Analysis: An analysis of Deepseek's R1
- Tweet from Wiz (@wiz_io): This meant anyone could access logs containing actual chat messages, internal secrets, service data, and potentially exfiltrate data along with escalating privileges within the server.
- Tweet from Wiz (@wiz_io): BREAKING: Internal #DeepSeek database publicly exposed 🚨Wiz Research has discovered "DeepLeak" - a publicly accessible ClickHouse database belonging to DeepSeek, exposing highly sensitive inf...
- Tweet from Dario Amodei (@DarioAmodei): My thoughts on China, export controls and two possible futures https://darioamodei.com/on-deepseek-and-export-controls
Interconnects (Nathan Lambert) ▷ #ml-drama (24 messages🔥):
OpenAI's lockdown mode, Concerns over O3 launch timing, Meta's interest in DeepSeek, Grok3 development, Model pricing issues
- OpenAI in lockdown mode post-DeepSeek: A user confirmed that OpenAI is in full lockdown mode following developments surrounding DeepSeek and expressed surprise at being caught in their proprietary training process.
- The quip '$5k into my $5M reasoning model training run and they already caught me' raises concerns about OpenAI's strict operational procedures.
- Uncertainty over O3 launch timing: There are worries that the launch of O3 may be delayed, given the stakes involved for OpenAI.
- The pressure is compared to that which Meta faces regarding its future product launches.
- Meta exploring DeepSeek for advertising: Meta Platforms is now evaluating the use of DeepSeek for its advertising products and has set up war rooms following concerns that it outperforms Llama 4.
- The remark 'the DeepSeek freakout is ~real' indicates that Meta is not alone in its concerns about competing technologies.
- Anticipation for Grok3's capabilities: A member expressed eagerness for Grok3, hoping it brings innovative results from qualified researchers in XAI.
- Rumors suggest that they are preparing a thinking model, which could imply significant advancements.
- Discussion on model pricing discrepancies: Users pointed out that model pricing might be misaligned, with all but 4o-mini perceived as mispriced by some.
- As one member noted, let’s see what 100k h100s gets you, reflecting both intrigue and concern about the resources being allocated.
- Tweet from Charles Packer (@charlespacker): can confirm OpenAI is in full lockdown mode post-DeepSeek$5k into my $5M reasoning model training run and they already caught me 😳⛓️🚔
- Tweet from Amir Efrati (@amir): Guess what?Now Meta itself is evaluating whether to use DeepSeek for advertising products.Quoting Amir Efrati (@amir) news: the DeepSeek freakout is ~real~Meta Platforms, worried DS is better than Lla...
Interconnects (Nathan Lambert) ▷ #random (41 messages🔥):
DeepSeek R1, Llama 4 Development, Grok 3 and O3-mini Release, ChatGPT Revenue Insights, Vulnerability Reports
- DeepSeek R1 Launch on Azure: DeepSeek R1 is now available on Azure AI Foundry and GitHub, providing a scalable platform with over 1,800 models for advanced AI integration.
- This launch enables businesses to access cutting-edge AI with adherence to SLAs, security, and responsible AI commitments backed by Microsoft.
- Potential Delays in Llama 4 Release: Rumors suggest that Llama 4 is being completely redone from scratch in light of DeepSeek, delaying its expected February release.
- Partners like Together reportedly only received vague updates about the delays, indicating significant changes in development.
- Grok 3 and O3-mini Release Timing: The timing for the releases of Grok 3 and O3-mini appears to be uncertain, with hints that they might be scheduled for January but internal discussions continue.
- Members are anticipating that DeepSeek's developments could disrupt planned releases, as OpenAI typically prefers Thursday drops.
- ChatGPT Pro Revenue Insights: Revenue from OpenAI's $200/month ChatGPT Pro has reportedly outpaced that from ChatGPT Enterprise, with annualized revenue exceeding $300M.
- This information suggests a strong user retention and demand for the Pro subscription model over the enterprise option.
- Critical Vulnerability Reported for DeepSeek: A user reported sending an email about a critical vulnerability that could expose sensitive data in DeepSeek, including potential API keys.
- The urgency of addressing this vulnerability is emphasized, alerting DeepSeek to act quickly to mitigate risks.
- Tweet from vibagor441 (@vibagor44145276): Anti-vagueposting: I think it is safe for me to now say this - Llama 4 is being completely redone from scratch. Yes, in light of DeepSeek. Partners like Together were only told vaguely that Llama 4 is...
- Tweet from OedoSoldier (@OedoSoldier): @teortaxesTex Well, they are being constantly attacked (from the US), so no one can test it now.https://mp.weixin.qq.com/s/y5UaoBa0kOY0N-wfBz_Udw
- Tweet from Kol Tregaskes (@koltregaskes): The @theinformation is reporting that Mira Murati's new company is called Thinking Machines Lab:
- Tweet from Q (@qtnx_): today i'm releasing a Sparse Autoencoder for DeepSeek-R1-Distill-LLama-70B, trained on a mix of chat and reasoning data.
- Tweet from xlr8harder (@xlr8harder): accidentally posted the version without gpt4o, here's the full graph
- Tweet from Alexandr Wang (@alexandr_wang): What does DeepSeek R1 & v3 mean for LLM data?Contrary to some lazy takes I’ve seen, DeepSeek R1 was trained on a shit ton of human-generated data—in fact, the DeepSeek models are setting records for t...
- Tweet from Tibor Blaho (@btibor91): The standalone Grok web app now includes mentions of "thinking", "thinking start time", "thinking end time", and "thinking trace" - preparations for Grok 3, a new reaso...
- Tweet from Aidan McLaughlin (@aidan_mclau): r1 scores #9 on aidanbench
- Tweet from xlr8harder (@xlr8harder): Continuing my investigation into US vs Chinese language models, I decided to check compliance rates to user requests to compose speech critical of government. I think models should generally comply wi...
- Tweet from H4x0r.DZ (@h4x0r_dz): Hello @deepseek_ai,I have sent an email to service@deepseek.com regarding a critical vulnerability that could allow attackers to access your database exposing sensitive data including API KEYS. I stro...
- Tweet from Stephanie Palazzolo (@steph_palazzolo): New w/ @amir: Revenue from OpenAI's $200/month ChatGPT Pro has surpassed rev from ChatGPT Enterprise.That means that Pro is making more than $300M annualized, as that's what Enterprise was gen...
- Nebius AI Studio: no description found
- no title found: no description found
- Tradition meets tech: Unitree robots dance at Spring Festival Gala: For more:https://www.cgtn.com/videoThe 2025 Spring Festival Gala showcased a groundbreaking performance that blended tradition with cutting-edge technology,...
- DeepSeek R1 is now available on Azure AI Foundry and GitHub | Microsoft Azure Blog: DeepSeek R1, available through the model catalog on Microsoft Azure AI Foundry and GitHub, enables businesses to seamlessly integrate advanced AI.
Interconnects (Nathan Lambert) ▷ #memes (10 messages🔥):
Flawed Benchmark Debate, Zizek Voice Interpretation, Colorful Language in AGI Discourse
- Flawed Benchmark Sparks Spicy Slack Thread: A spicy Slack thread emerged after an individual ran a flawed benchmark, leading to heated exchanges among team members.
- Members discussed the attachment revealing images from the debate, showcasing frustrations and humor in the responses.
- Teortaxes's Colorful Critique of AGI Culture: A user highlighted a tweet from Teortaxes that criticized SV AGI advocates using phrases like 'whorish mystifications' and 'degenerate rat-sphere lifestyles', sparking laughter in the chat.
- The discussion emphasized the colorful language used in AGI discourse, with one member humorously remarking that they read it in a Zizek voice.
- Zizek Voice Makes Everything Better: The concept of reading critiques in a Zizek voice was brought up, adding an entertaining layer to the analysis of AGI discussions.
- Members agreed that this whimsical interpretation enhances the enjoyment of Teortaxes's messages, bringing laughter to the chat.
Link mentioned: Tweet from Teortaxes▶️ (DeepSeek🐳 Cheerleader since 2023) (@teortaxesTex): tbh I hate SV AGI bros. Their whorish mystifications and clinging to minor technical secrets. Their creepy need to lull people into false sense of safety, then terrify with visions of AGI doom. Their ...
Interconnects (Nathan Lambert) ▷ #reads (35 messages🔥):
DeepSeek's Papers, Liang Wenfeng Interview, Mixture-of-Experts (MoE), Multi-Token Prediction (MTP), Deepseek v2 and v3 Papers
- Liang Wenfeng's Journey from Hedge Fund to AI Lab: Liang Wenfeng, former head of High-Flyer hedge fund, discusses his transition to CEO of Deepseek in a recent interview, outlining his strategy for AGI development and the importance of early GPU purchases.
- This interview is filled with insights about the AI landscape, making it a highly recommended listen for anyone interested in the evolution of AI research.
- DeepSeek's v3 paper sparks curiosity: Members are eager to dissect aspects of the DeepSeek v3 paper, particularly its reinforcement learning breakthroughs, while expressing surprise over omitted auxiliary losses for expert balancing.
- Discussions indicate a shared interest in the underlying mechanics of the Mixture-of-Experts and the implications of not utilizing these techniques in the latest version.
- Unlocking potential with Multi-Token Prediction: The implementation of Multi-Token Prediction (MTP) in v3 has not yet been prioritized by many inference frameworks, but it is suggested to enhance token acceptance rates significantly.
- Members explore how MTP contributes to speculative decoding, expressing intrigue over its application and effects on inference processes (a minimal draft-and-verify sketch follows this list).
- Diving deep into Mixture-of-Experts: Users share their insights about MoE model configurations, highlighting the shift to fp8 models for v3 while retaining important components in fp32 or bf16 for optimal performance.
- The community actively discusses the complexities of training extra MLP blocks for speculative decoding and the trade-offs involved in model efficiency.
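To make the speculative-decoding angle concrete, here is a minimal sketch of a draft-and-verify acceptance loop that an MTP-style head could feed: the cheap head proposes a few future tokens, the main model scores the extended sequence once, and only the longest agreeing prefix is kept. The greedy acceptance rule and function names are simplifying assumptions, not DeepSeek's actual implementation.

```python
def speculative_step(main_predict_fn, draft_tokens, prefix):
    """Greedy draft-and-verify: accept draft tokens while the main model agrees.

    main_predict_fn(tokens) -> main model's argmax prediction after each position,
    draft_tokens            -> tokens proposed by the cheap MTP/draft head,
    prefix                  -> tokens generated so far.
    """
    # One main-model pass over prefix + draft scores every draft position at once.
    verified = main_predict_fn(prefix + draft_tokens)
    accepted = []
    for i, tok in enumerate(draft_tokens):
        # verified[j] is the main model's prediction for the token after position j.
        if verified[len(prefix) + i - 1] == tok:
            accepted.append(tok)
        else:
            # First disagreement: take the main model's token instead and stop.
            accepted.append(verified[len(prefix) + i - 1])
            break
    return prefix + accepted

# Toy "models": the main model always continues 1, 2, 3, ...; the draft guesses 3 ahead.
main = lambda toks: [t + 1 for t in toks]
print(speculative_step(main, draft_tokens=[3, 4, 9], prefix=[1, 2]))  # -> [1, 2, 3, 4, 5]
```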
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training: Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain ...
- Deepseek: From Hedge Fund to Frontier Model Maker : Part 2 of our AI Lab translation series
- Mixture-of-Experts (MoE) LLMs: Understanding models like DeepSeek, Grok, and Mixtral from the ground up...
Interconnects (Nathan Lambert) ▷ #posts (42 messages🔥):
DeepSeek's impact, OpenAI's formal math direction, LLMs as verifiers, Community engagement around reasoning models
- DeepSeek Gains Mainstream Attention: Mainstream news is buzzing about DeepSeek, reflecting its reach beyond tech circles; one member noted it even came up with a casual listener last night.
- One member emphasized that the people crave information, showcasing the growing public interest in advanced AI technologies.
- OpenAI's Focus Shift Away from Formal Math: It was noted that OpenAI has paused its direction towards formal math solutions and lacks current plans to revisit this path.
- Others shared insights suggesting that internal opinions vary, with some members advocating for focusing on formal verifiers despite the team's overall hesitation.
- LLM Verifiers May Be Enough for Math Tasks: A member believes that using LLMs as verifiers might be sufficient for validating complex math problems, despite limitations in generating them.
- Discussion pointed out that models could potentially verify solutions to competitions like AIME accurately if prompted correctly, even if their solving abilities lag behind.
- Mysteries of Science and AI: A comment likened the complexities of OpenAI's approaches to a tantalizing blend of bitter pills and umami flavors, indicative of their current challenges.
- This metaphor reflects frustrations with navigating the often opaque methodologies within AI research.
- Community Buzz on Reasoning Models: Members are actively discussing the hot topic of reasoning models, particularly in the context of upcoming posts and ongoing community engagement.
- The excitement is palpable as they share a thread discussing reasoning models' potential, emphasizing the necessity of giving the people what they want.
Link mentioned: Tweet from Nathan Lambert (@natolambert): Why reasoning models will generalizeDeepSeek R1 is just the tip of the ice berg of rapid progress. People underestimate the long-term potential of “reasoning.”https://buff.ly/4haoAtt
Interconnects (Nathan Lambert) ▷ #policy (24 messages🔥):
DeepSeek IP Concerns, ChatGPT Token Usage, Inference Cost of AI Models, Export Restrictions on AI Chips
- DeepSeek may have copied OpenAI's methods: AI Czar David Sacks suggested that DeepSeek appears to have used a technique called distillation to extract knowledge from OpenAI's models, with substantial evidence supporting this claim.
- DeepSeek reportedly trained on a significant amount of ChatGPT tokens, prompting investigations from Microsoft and OpenAI into potential unauthorized use.
- ChatGPT influence on Llama's reasoning: Discussions emerged about whether Llama is influenced by ChatGPT, with some suggesting its output may resemble ChatGPT itself due to distillation during pretraining.
- One user noted that if substantial distillation had occurred, ChatGPT's stylistic influence would likely be more pronounced.
- Discussions on Inference Costs: Amid talks about DeepSeek's potential use of the Ascend 910b, there were doubts about its viability due to hardware limitations, like memory capacity and processing power.
- Concerns were raised on whether other high-performance alternatives, such as the H800, would be more suitable for their needs.
- White House Export Restrictions Considered: Reports surfaced that the White House might expand export restrictions to include Nvidia H20s, as the implications of these powerful chips become more significant.
- The ongoing discussions around AI export policies reflect broader concerns about technological dominance and security.
- Skepticism Towards DeepSeek's Capabilities: There is a prevalent skepticism regarding DeepSeek's abilities, with some suggesting that assumptions made about them lack strong evidence.
- Users expressed distrust about allegations surrounding DeepSeek, questioning motives behind claims while acknowledging their potential technical prowess.
- Tweet from Andrew Curran (@AndrewCurran_): Bloomberg is reporting that the White House is considering potentially expanding export restrictions to cover Nvidia H20's.
- Tweet from Tsarathustra (@tsarnick): Asked if China's DeepSeek stole American IP, AI Czar David Sacks says it looks like a technique called distillation was used where a student model can "suck the knowledge" out of the paren...
- Microsoft Probing If DeepSeek-Linked Group Improperly Obtained OpenAI…: no description found
Cursor IDE ▷ #general (219 messages🔥🔥):
DeepSeek Updates, Cursor IDE Bugs, Model Comparisons, Usage Limitations, User Experiences
- DeepSeek Controversy: Discussions highlighted concerns about the DeepSeek model's inability to generate code due to token limits, leaving users frustrated with its performance.
- “It keeps yapping then it cannot generate a code due to token limit,” one user lamented, while others discussed switching to official APIs for better reliability.
- Cursor IDE Bugs After Update: Multiple users reported issues with the Cursor IDE post-update, particularly with tab completion and copying behavior that included unwanted imports.
- One user noted, “Cursor no longer displays its markdown output correctly,” indicating ongoing challenges following the latest version update.
- Claude 3.5 and Usage Limits: Concerns were raised about the limitations of the free tier for Claude 3.5, particularly regarding the stopping of service after reaching 50 slow premium requests.
- One user questioned if there was a cooldown period after hitting limits, but responses indicated that it does not allow for continued access post-limit.
- User Feedback and Suggestions: Users actively engaged in providing feedback on features they would like to see enhanced in Cursor, specifically in the context of AI model usage.
- One user suggested that adding more models to agent mode could improve the functionality available for developers.
- Reporting Bugs and Support Channels: A user reported a specific issue regarding Sonnet 3.5 not working while using the Cursor subscription but functioning with a personal API key.
- The community encouraged users to report bugs on the Cursor forum, with guidance on finding support for troubleshooting.
- no title found: no description found
- Settings | Cursor - The AI Code Editor: You can manage your account, billing, and team settings here.
- Tweet from Ihtesham Haider (@ihteshamit): BREAKING: Alibaba just launched "Qwen" an AI model that writes, generates images/videos, and does web search.It outperforms DeepSeek, ChatGPT-o1, and Claude sonnet.Here are 5 insane examples o...
- Tweet from ian (@shaoruu): what's one thing you really want added to @cursor_ai composer, or just to cursor in general? open to all kinds of ideas :)
- Sonnet 3.5 stops working: When I enable the OpenAI API key but not Anthropic, it still tries to do a custom API call to the server. I expect it to only do OpenAI models and not Anthopic. If I disable Openai it does work on ant...
- Upgrade to 0.45.X always breaks cursor: Yep, it broke my installation. Today I wanted to open it and I got ‘Error [ERR_MODULE_NOT_FOUND]’. Downloaded the .exe again and could reinstall with no issues and now I’m running the latest version.
- Tweet from Pliny the Liberator 🐉 (@elder_plinius): I'm actually crying...this is one of the most beautiful things I've ever seen in my lives 🥹PROMPT:"""Research what Pliny the Liberator @elder_plinius talks about liberating DeepSe...
- Cursor Status: no description found
- no title found: no description found
Yannick Kilcher ▷ #general (180 messages🔥🔥):
Softmax Variations, Deep Reinforcement Learning Challenges, RTX 5090 Release Discussions, Performance Metrics in AI, Community Engagement Issues
- Exploring Softmax Variations: A member discussed a new approach to Softmax that might improve model performance, suggesting it could lead to new state-of-the-art results.
- The conversation included insights on how conventional Softmax can lead to noisy accuracy and suboptimal learning in certain scenarios.
- Challenges in Deep Reinforcement Learning: Discussion highlighted that traditional Softmax may not be suitable for deep RL, as it can hinder effective learning and contribute to mode collapse.
- Members advocated for more flexible methods in reinforcement learning to enhance learning efficiency and model performance (a small illustration of standard softmax saturation follows this list).
- Anticipating the RTX 5090 Launch: Chat participants noted that people were already lining up for the release of the RTX 5090, indicating high demand and excitement.
- This sparked conversations about consumer interest and market trends related to new GPU releases.
- Evaluating Performance Metrics in AI: A member ran tests comparing the accuracy and loss of their Softmax variations, finding that while accuracy improved, stability suffered.
- Visual representations shared indicated that regular Softmax had an easier time finding simpler decision boundaries compared to the new methods.
- Community Engagement Barriers: Concerns were raised about the impact of certain members on the community atmosphere, with some feeling discouraged about returning due to the interactions.
- A call for better management of community discussions was suggested to keep the space welcoming for serious professionals.
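For context on the softmax complaints, here is a small numeric illustration of the saturation issue: once one logit dominates, a standard softmax policy assigns near-zero probability to every other action, and lowering the temperature makes the collapse worse. This only shows the baseline behavior being criticized, not the member's proposed variation.

```python
import torch

def policy(logits, temperature=1.0):
    # Standard softmax policy over action logits.
    return torch.softmax(logits / temperature, dim=-1)

logits = torch.tensor([6.0, 2.0, 1.5, 1.0])   # one action already strongly preferred
for t in (2.0, 1.0, 0.5):
    p = policy(logits, t)
    print(f"T={t}: {p.round(decimals=3).tolist()}")
# At T=0.5 nearly all probability mass sits on action 0, so exploration (and the
# gradient signal for the other actions) effectively vanishes -- the mode-collapse concern.
```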
- Capy Turtule GIF - Capy Turtule Capybara - Discover & Share GIFs: Click to view the GIF
- No One Asked GIF - No One Asked - Discover & Share GIFs: Click to view the GIF
- Tweet from Sam Altman (@sama): visited @Helion_Energy today.the machine is making rapid progress (and the scale is nuts)--it feels like walking through a sci-fi movie!
- Tweet from Sam Altman (@sama): next phase of the msft x oai partnership is gonna be much better than anyone is ready for!!
- - YouTube: no description found
- Lena's Reversing for Newbies: A collection of tutorials aimed particularly for newbie reverse engineers. 01. Olly + assembler + patching a basic reverseme 02. Keyfiling the reverseme + assembler 03. Basic nag removal + header prob...
- Jaqen H'ghar Season 5 Compilation: If you haven't seen season 5, this video is obviously filled with SPOILERS.Every scene of season 5 that has Jaqen and Arya in it.---NO COPYRIGHT INFRINGEMENT...
- Christian Bales whacks Everyone with Style | Equilibrium Best Fights 🌀 4K: ✔️ Follow us on Facebook ➤ https://www.facebook.com/204568612956950📢 New Movies 2023 ➤ https://www.youtube.com/playlist?list=PLaARvwn7BsAHvhahR0x8FHz9knp1...
Yannick Kilcher ▷ #paper-discussion (15 messages🔥):
DeepSeek claims, OpenAI vs DeepSeek, Data usage controversy, Model distillation debates, Cyber attack implications
- DeepSeek Faces Allegations Post Cyber Attack: A member raised questions about the claims against DeepSeek, suggesting they could be distilled into one meme here. Another noted the timing of these claims right after a massive cyber attack, implying “uncle sam is pissed off”.
- OpenAI's Narrative on DeepSeek's Success: Reports indicate that OpenAI and Microsoft are implying that DeepSeek's success is due to their alleged unfair use of OpenAI's data, raising the stakes in the ongoing competition in AI. This narrative mirrors past claims of misappropriated data and has been made more complex by the media scrutiny from both Bloomberg and the Financial Times.
- Distillation's Role in AI: A conversation arose questioning the effectiveness of distillation in AI, with skepticism expressed about its narrowed capabilities. Members speculated that regardless of potential data overlap, the focus should remain on the models' outcomes.
- Smear Job or Genuine Concern?: There were varied opinions on the allegations against DeepSeek, with one member asserting it seemed more like a smear job than a legitimate concern. They pondered the implications of any data used, asking, “so what?” if some data did come from another model.
- The Bully Reaction in AI Conflict: Discussions highlighted the irony of the situation, likening OpenAI’s reaction to that of a bully upset when another entity fights back. The exchange drew parallels to competitive dynamics, emphasizing a complex rivalry in the AI landscape.
Link mentioned: OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole From Us: OpenAI shocked that an AI company would train on someone else's data without permission or compensation.
Yannick Kilcher ▷ #agents (3 messages):
PydanticAI, Qwen2 VL performance, Multimodal model advantages
- PydanticAI showcased in Python code: A user presented a code snippet demonstrating the functionality of PydanticAI with the GroqModel for filling user data based on input text (a minimal sketch of this pattern follows this list).
- The implementation showcased the integration of Pydantic for data validation while working seamlessly with the agent.
- Qwen2 VL Zooms Ahead: A member expressed excitement about the performance of Qwen2 VL, noting it runs exceptionally fast, especially with quantized 8K on the 7B M1 Chip.
- Tokens pour out like crazy, highlighting the efficiency and speed of this model.
- Discussion on Switching to Multimodal Models: There was a consensus that transitioning to a multimodal model could be beneficial given the impressive speed advantages noted in recent discussions.
- This shift aligns with the exploration of both R1 and multimodal advancements in artificial intelligence.
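Here is a minimal sketch of the pattern described in the first item, assuming the pydantic_ai package's early-2025 Agent and GroqModel interfaces; the Groq model name, schema fields, and example text are hypothetical, not the user's actual snippet.

```python
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.groq import GroqModel

class UserData(BaseModel):
    """Schema the agent must fill; Pydantic validates the model's output."""
    name: str
    age: int
    city: str

# Hypothetical model name; requires GROQ_API_KEY in the environment.
model = GroqModel("llama-3.3-70b-versatile")
agent = Agent(model, result_type=UserData)

result = agent.run_sync("Anna is a 34-year-old engineer living in Kraków.")
print(result.data)  # UserData(name='Anna', age=34, city='Kraków')
```

The appeal of the pattern is that output which cannot be coerced into the schema surfaces as a validation error rather than silently malformed data.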
Yannick Kilcher ▷ #ml-news (14 messages🔥):
DeepSeek AI technologies, O3-mini launch, AI computing trends, Claude 3.5 training cost, Italy's regulation on AI
- DeepSeek shakes up the AI landscape: DeepSeek achieved a breakthrough by training its 671-billion-parameter Mixture-of-Experts model in just two months on 2,048 Nvidia H800 GPUs, showing 10X higher efficiency than competitors.
- This innovative approach utilized assembly-like PTX programming instead of conventional CUDA, signaling potential shifts in AI development strategies.
- O3-mini promises significant improvements: The launch of O3-mini is set for tomorrow, with claims that it is 4x faster than O1-mini and smarter overall than R1, potentially benefiting OpenAI and the US market.
- This development has sparked speculation about insider knowledge regarding its performance advantage over existing models.
- AI democratization in the spotlight: According to a recent analysis, AI is becoming as integral as oil and electricity, with expectations that computing power continues to become exponentially cheaper.
- As a result, large AI data centers are anticipated to depreciate in value, leading to widespread ownership of powerful AI on personal devices.
- Claude 3.5's hefty training expenses: Training the Claude 3.5 sonnet model reportedly cost a few tens of millions, highlighting the financial investments needed to develop cutting-edge AI technologies.
- This high cost reflects the ongoing trend of increasing investments in AI training and model development.
- Italy takes action against DeepSeek: Italy's regulatory body is seeking information regarding DeepSeek's compliance with data protection laws, indicating scrutiny on AI's operational frameworks in the country.
- It was noted that DeepSeek is currently unavailable as a cellphone app in Italy, adding to the discourse on AI governance.
- AI research team claims to reproduce DeepSeek core technologies for $30 — relatively small R1-Zero model has remarkable problem-solving abilities: It's cheap and powerful.
- DeepSeek's AI breakthrough bypasses industry-standard CUDA for some functions, uses Nvidia's assembly-like PTX programming instead: Dramatic optimizations do not come easy.
- Janus Pro WebGPU - a Hugging Face Space by webml-community: no description found
- Tweet from Bindu Reddy (@bindureddy): o3-mini is smarter than o1 and is about 4x faster than o1-mini!This would make it a MUCH BETTER model than R1 which is already significantly behind O1O3-mini launches tomorrow ADVANTAGE OpenAI and US!
- Tweet from Jürgen Schmidhuber (@SchmidhuberAI): It has been said that AI is the new oil, the new electricity, and the new internet. And the once nimble and highly profitable software companies (MSFT, GOOG, ...) became like utilities, investing in n...
- Italy Angry Italian Noises GIF - Italy Angry Italian Noises - Discover & Share GIFs: Click to view the GIF
Eleuther ▷ #general (60 messages🔥🔥):
Mordechai Rorvig's Book Project, Protein-Ligand Binding Research, Test Time Compute Models, Generative Models for Molecules, DeepSeek Architecture and Inference Framework
- Mordechai Rorvig shares his neuroscience book: Mordechai Rorvig presented his book project investigating how deep neural networks may model large-scale brain functions, particularly expressing concerns about emotional processing in AI.
- He encourages feedback on his work and shared a link to his free book and fundraiser.
- Protein-Ligand Binding Insights Needed: A researcher discussed their work on protein-ligand binding and sought advice on predictive and generative modeling techniques, focusing on IC50 predictions for novel ligands.
- They mentioned exploring existing models and shared their limitations, emphasizing their willingness to gather more data and refine their approach.
- Discussion on Test Time Compute Models: Members debated the implications of test time compute models, with some suggesting that definitions may have shifted over time, particularly regarding generating sequential tokens.
- Questions arose about the role of auxiliary models and whether current implementations integrated tree search capabilities.
- Utilizing Generative Models in Chemistry: Fessus mentioned a generative embedding model that learns from molecular structures to suggest new molecules and discussed its potential application in ligand binding research.
- Although still under development, he expressed openness to collaboration if the dataset showed sufficient similarity among ligands.
- DeepSeek Architecture Reference: Participants noted the importance of inference frameworks like DeepSeek and their potential impact on generating and refining model outputs.
- Discussions highlighted various approaches like fine-tuning for generating longer chains of thought and the absence of specific tree search implementations.
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training: Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain ...
- Events & Training – AISafety.com: AI safety gatherings and training programs, both online and in-person.
- bioRxiv: no description found
Eleuther ▷ #research (82 messages🔥🔥):
High Update Ratio Tricks in RL, Min-P Sampling Method, Exploration vs. Exploitation in RL, Fastfood Transform in Kernel Methods, Generalization in SFT vs. RL
- Discussion on High Update Ratio in RL: A paper from around a year ago discussed high update ratio tricks for reinforcement learning, sparking a search for its details in the channel.
- Members shared links and insights about policy updates and advantages in reinforcement learning contexts.
- Introduction of Min-P Sampling: A new sampling method called min-p was introduced, which aims to improve text quality and diversity for LLMs by dynamically adjusting the sampling threshold based on model confidence.
- Concerns were raised about whether such techniques may hinder exploration by limiting token diversity (a minimal min-p sketch follows this list).
- Exploration Strategies in RL: A discussion was initiated about using exploration results as training targets, relating to how some methods avoid traditional algorithms like PPO and TRPO by feeding revised exploration results back as supervision.
- The implications of this approach on GRPO methodologies were noted as an ongoing area of inquiry.
- Efficiency of Fastfood Transform: The Fastfood method was highlighted for its ability to accelerate computations in kernel methods by effectively utilizing Hadamard and diagonal Gaussian matrices.
- This paper proposes significant improvements in computation time and storage, making it more feasible for large-scale problems.
- Generalization between SFT and RL: Participants debated the potential bias in the findings related to supervised fine-tuning (SFT) and reinforcement learning's (RL) effects on model generalization capabilities.
- Questions were raised about the impact of over-reliance on training datasets and the continual generation of new data in RL contexts.
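As background, here is a minimal sketch of the min-p rule as the abstract describes it: the truncation threshold scales with the model's top probability, so confident distributions are pruned aggressively while flat ones keep more candidates. The base value and temperature here are illustrative settings, not recommendations from the paper.

```python
import torch

def min_p_sample(logits, p_base=0.1, temperature=1.0):
    """Sample one token with min-p: keep tokens whose probability is at least
    p_base * max_probability, then renormalize and sample from the survivors."""
    probs = torch.softmax(logits / temperature, dim=-1)
    threshold = p_base * probs.max()
    keep = probs >= threshold
    filtered = torch.where(keep, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum()
    return torch.multinomial(filtered, num_samples=1).item()

logits = torch.tensor([4.0, 3.5, 1.0, -2.0, -5.0])
print(min_p_sample(logits))  # usually 0 or 1; tokens far below the top one are cut
```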
- Fastfood: Approximate Kernel Expansions in Loglinear Time: Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at pred...
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM...: Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. However, popular sampling methods like top-p...
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training: Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain ...
Eleuther ▷ #interpretability-general (55 messages🔥🔥):
Generalization Benchmarking, Sparse Autoencoders, Seed Dependency in ML Models, Robustness of Initialization, Mechanistic Permutability
- Seeking Generalization Benchmark Insights: A member expressed interest in a generalization benchmark, expecting that low rank structures could outperform traditional MLPs.
- This sparked a discussion about the challenges of generalization in machine learning frameworks.
- New Paper on Sparse Autoencoders Released: A new paper discusses the behavior of Sparse Autoencoders (SAEs) and their seed dependency, with observations that only 30% of features are shared across different seeds.
- The paper suggests that current approaches may not be the optimal choice for extracting replicable features.
- Questions on SAEs and Replicability: Members debated the appropriateness of SAEs for their tasks, with suggestions that alternate methods could improve replicability.
- It was highlighted that training two seeds simultaneously and encouraging similarity could lead to more consistent results (a toy sketch of this idea follows this list).
- Initialization Effects in Training: The impact of initialization on SAEs was examined, with claims that infinite seeds would lead to convergence towards the original initialization.
- This raised questions about the mathematical assumptions made and the practicality of such claims in machine learning contexts.
- Clarifying Statistical Claims in Research: A member suggested that concepts referenced might distract from the main paper focus, advocating for greater clarity on statistical claims.
- Following this, another member confirmed they submitted a revised version of the paper, excluding certain claims and including references to relevant literature.
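As a toy illustration of the "train two seeds and encourage similarity" suggestion (an idea floated in the chat, not a method from the paper), here is a minimal sketch: two sparse autoencoders trained on the same activations, with an extra penalty pulling each decoder direction toward its best match in the other seed's dictionary. The architecture and loss weights are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAE(nn.Module):
    def __init__(self, d_in=256, d_feat=1024):
        super().__init__()
        self.enc = nn.Linear(d_in, d_feat)
        self.dec = nn.Linear(d_feat, d_in, bias=False)

    def forward(self, x):
        f = torch.relu(self.enc(x))        # sparse feature activations
        return self.dec(f), f

sae_a, sae_b = SAE(), SAE()                # two "seeds": different random inits
opt = torch.optim.Adam(list(sae_a.parameters()) + list(sae_b.parameters()), lr=1e-3)
x = torch.randn(64, 256)                   # stand-in for LLM activations

for _ in range(10):
    recon_a, f_a = sae_a(x)
    recon_b, f_b = sae_b(x)
    recon = (recon_a - x).pow(2).mean() + (recon_b - x).pow(2).mean()
    sparsity = f_a.abs().mean() + f_b.abs().mean()            # L1 penalty
    # Illustrative cross-seed term: align each normalized decoder column
    # with its best match in the other seed's dictionary.
    da = F.normalize(sae_a.dec.weight, dim=0)                 # (d_in, d_feat)
    db = F.normalize(sae_b.dec.weight, dim=0)
    align = 1.0 - (da.T @ db).max(dim=1).values.mean()
    loss = recon + 1e-3 * sparsity + 1e-2 * align
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))
```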
- Sparse Autoencoders Trained on the Same Data Learn Different Features: Sparse autoencoders (SAEs) are a useful tool for uncovering human-interpretable features in the activations of large language models (LLMs). While some expect SAEs to find the true underlying features...
- GLU Variants Improve Transformer: Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible, using di...
Eleuther ▷ #gpt-neox-dev (5 messages):
Vocabulary Size Configuration, Intermediate Size Logic, Model Export Size Mismatch, Optimizer Configuration Issues
- Vocabulary Size Misconfiguration Troubles: A user expressed confusion over setting the vocabulary size to match the OLMo Paper's 50304, as it pads to 50432 instead. They suggested that setting make_vocab_size_divisible_by to 64 would be more appropriate given their MP 2 setup (the padding arithmetic is sketched after this list).
- Intermediate Size Logic Bewilderment: A member noted that the expected intermediate size should be 3x the hidden dimension but found it set to 32768, which is less than the calculated 33024 needed for a desired hidden size of 11008. Despite this discrepancy, the 32768 config worked, leading to confusion about the underlying logic.
- Config Size Mismatch Mystery During Export: Upon exporting a model trained with a hidden size of 32768, a member encountered a size mismatch error when the intermediate dim was not set to 11008 in the config. This unexpected behavior raised questions, as it didn't align with previous configurations.
- Optimizer Configuration Warning Clarity: A user reported repeated warnings in the logs indicating that APEX was not installed, defaulting to deepspeed's fused adam optimizer. This pattern persisted before the run hung, causing concern about the optimizer's actual configuration and performance.
- Config Hang Issues After Hours: Another user mentioned that a previously functioning config suddenly started hanging during execution without any noticeable changes. The last logs provided before the hang were warnings related to optimizer configuration, leaving the reason unclear.
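For reference, the observed padding is consistent with the Megatron-style rule that GPT-NeoX inherits, assuming the vocabulary is rounded up to a multiple of make_vocab_size_divisible_by times the model-parallel size; the quick sketch below reproduces the 50304-to-50432 jump and shows why 64 avoids padding under MP 2.

```python
import math

def padded_vocab(orig, divisible_by, model_parallel):
    """Megatron-style padding: round the vocab up to a multiple of
    divisible_by * model_parallel so each MP shard gets an equal slice."""
    multiple = divisible_by * model_parallel
    return math.ceil(orig / multiple) * multiple

print(padded_vocab(50304, 128, 2))  # 50432 -> the padding the user observed (default 128)
print(padded_vocab(50304,  64, 2))  # 50304 -> no padding with make_vocab_size_divisible_by=64
```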
Link mentioned: EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries - EleutherAI/gpt-neox
GPU MODE ▷ #general (12 messages🔥):
GPU Direct Storage, Tensor Weight Compression, Memory Snapshotting
- GPU Direct Storage utilizes PCIe peer-to-peer communication: GPU Direct Storage enables efficient data transfer from NVMe to GPU using PCIe peer-to-peer communication without involving the CPU.
- This approach raises questions about its support outside of DirectX DirectStorage, needing further verification.
- Exploration of Tensor Weight Compression: Discussion revealed that tensor weights might not compress well on disk due to data irregularity, although one user found weights reduced from 4.7 GB to 3.7 GB.
- While compression can yield some benefits, the effort may outweigh the rewards, as pointed out by community members.
- Potential for Parallel Compression Algorithms: A member proposed developing a parallel-friendly compression algorithm for weights to lessen data transfer needs, possibly decompressing directly on the GPU.
- This idea highlights innovative approaches to enhancing weight management, pending further exploration.
- Direct Loading of Safetensors to VRAM: It was noted that safetensors can load directly into VRAM from disk, suggesting an efficient method for managing tensor data.
- This capability could streamline workflows, but further investigation is needed to confirm functionality (see the sketch after this list).
- Investigation into Memory Snapshotting: One participant plans to focus on memory snapshotting before assessing improvements in system performance, emphasizing a methodical approach.
- This commitment to exploring performance enhancements indicates a proactive stance in optimizing resource management.
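On the safetensors point, here is a minimal sketch assuming the safetensors.torch API and a throwaway checkpoint file; the device argument asks the library to place each tensor on the GPU as it is loaded.

```python
import torch
from safetensors.torch import load_file, save_file

# Create a small checkpoint on disk for the demo.
save_file({"w": torch.randn(1024, 1024)}, "demo.safetensors")

# Load it with the tensors materialized directly on the GPU when one is available.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
tensors = load_file("demo.safetensors", device=device)
print(tensors["w"].device, tensors["w"].shape)
```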
- GitHub - NVIDIA/gdrcopy: A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology: A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology - NVIDIA/gdrcopy
- GitHub - gpudirect/libgdsync: GPUDirect Async support for IB Verbs: GPUDirect Async support for IB Verbs. Contribute to gpudirect/libgdsync development by creating an account on GitHub.
GPU MODE ▷ #cuda (19 messages🔥):
CUDA type punning, RTX Blackwell architecture, Memory alignment in CUDA, Memcopy performance optimization
- Memory Alignment in CUDA: A member highlighted that addresses in CUDA must be naturally aligned, emphasizing that misalignment leads to undefined behavior. The conversation suggested that loads should be aligned to 64 bits for optimal performance.
- Using memcpy for Type Punning: Rather than using reinterpret_cast, a member suggested using memcpy() for type punning between local variables, which compilers can optimize. Another member noted they were surprised the compiler could optimize memcpy() into register operations.
- RTX Blackwell Architecture Announced: The RTX Blackwell architecture paper revealed a 27% increase in FP16/32 throughput compared to the 4090, but no performance increase from 5th gen Tensor Cores over the 4th gen for consumer cards. Peak FP16/32 TFLOPS reported as 209.5, calculated based on previous generation metrics.
- Discussion on the 5090's Tensor Cores: Concerns were raised regarding the marketing of the 5th gen tensor cores for the RTX 5090, as it was pointed out that they are largely similar to 4th gen cores. The model supports fp4 and fp6, but there are questions about whether it includes microtensor scaling.
- Confusion Over RTX 5090 Microtensor Scaling: It was discussed whether the RTX 5090 supports microtensor scaling, with some noting the architectural improvements it could entail. The mma docs state that the .block_scale argument requires sm_120a, yet there is uncertainty about the 5090's sm version.
Link mentioned: NVIDIA GeForce RTX 5090 Graphics Cards: Powered by the NVIDIA Blackwell architecture.
GPU MODE ▷ #torch (8 messages🔥):
PyTorch on GB200s, Container availability for PyTorch, Merging PRs permissions, Scaled MM API
- Running PyTorch on GB200s Sparks Debate: A member queried whether PyTorch can run on GB200's, noting that reports suggest it needs to be built against CUDA 12.8.
- Another provided clarity saying that while building from source works, wheels do not yet support Blackwell.
- Request for Pre-Built PyTorch Containers: A user inquired about the availability of a container for PyTorch, expressing a desire for a simpler setup.
- Despite the inquiry, no definitive answer regarding existing containers was provided in the conversation.
- PR Merging Roles Clarified: A member asked if only collaborators or maintainers could merge PRs, raising concerns about authentication in certain workflows.
- The discussion hints at a need for clear guidelines on roles within the merging process.
- Help on Scaled MM API: A user expressed gratitude for a supportive writeup on torch._scaled_mm, mentioning they posted a question regarding the API.
- The previously linked Scaled MM API GitHub Gist provided additional details that were helpful.
Link mentioned: Scaled MM API: Scaled MM API. GitHub Gist: instantly share code, notes, and snippets.
GPU MODE ▷ #announcements (1 messages):
GTC 2025, CUDA Developer Meetup, Low Level Technical Track for CUDA Programming, GPU MODE Event at GTC
- GTC 2025 is approaching!: Mark your calendars: GTC 2025 runs March 17-21 in San Jose with numerous in-person events lined up.
- Stay tuned for details on the exciting sessions and gatherings planned for this year's conference.
- CUDA Developer Meetup Tomorrow: Tomorrow, January 30, a CUDA Developer Meetup will take place, welcoming all levels of CUDA developers to engage and share ideas at AI Camp.
- Expect direct engagement with NVIDIA maintainers, collaborative discussions, and exciting giveaways including a chance to win a GPU!
- Low Level Track at GTC for CUDA Programming: NVIDIA has introduced a low-level technical track focused on CUDA programming at GTC, aiming to enhance skills in GPU-accelerated applications, as detailed on their GTC sessions page.
- These sessions will cover essential tools and training to maximize the performance of applications using NVIDIA CUDA.
- Rumors of GPU MODE Event at GTC: There are whispers of a GPU MODE event happening in-person during GTC, with more details promised soon.
- Anticipation grows as the community looks forward to potential announcements regarding this exciting gathering.
- CUDA Developer Meet Up with NVIDIA (Silicon Valley): no description found
- NVIDIA GTC AI Conference 2025: March 17–21, 2025. San Jose. Register Now.
GPU MODE ▷ #cool-links (40 messages🔥):
Tom Yeh's Multi-Head Attention Lecture, FP4 Training Framework for LLMs, Microscaling in DeepSeek, Llama Training Codebase
- Tom Yeh's Lecture on Multi-Head Attention: A member shared a link to Tom Yeh's lecture on Multi-Head Attention in relation to deep learning and computer vision.
- Another video was mentioned that explores insights on similar topics via NotebookLM.
- Innovations in FP4 Training for LLMs: A paper discusses a pioneering FP4 training framework for large language models that tackles quantization errors by introducing innovative methods for weight updates.
- Discussion revealed challenges and potential solutions regarding quantization along axes when scaling to block sizes beyond established limits.
- Microscaling Advantages in Computational Performance: The conversation emphasized that microscaling may negate some performance limitations seen in standard block sizes due to better management of scaled dot products.
- Members pointed out how group sizes impact performance, and that clarity around utilizing larger block sizes remains essential (a block-scaling sketch follows this list).
- Minimal Codebase for Llama Training Available: For those interested in training Llama models, a minimal codebase has been shared at speed_llama3.
- This repository aims to facilitate easier experimentation and implementation for Llama training.
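To ground the microscaling discussion, here is a minimal sketch of block-wise scaled quantization: each group of 32 values along the quantization axis gets its own scale, which is the basic idea behind microscaling formats. The symmetric 4-bit integer grid is a stand-in for the paper's actual FP4 encoding, and the block size is just a common choice.

```python
import torch

def blockwise_quant(x, block=32, qmax=7):
    """Quantize a (rows, cols) tensor with one scale per `block` values per row.
    A symmetric int4-style grid stands in for a real FP4/MX encoding."""
    rows, cols = x.shape
    xb = x.reshape(rows, cols // block, block)
    scale = (xb.abs().amax(dim=-1, keepdim=True) / qmax).clamp(min=1e-8)  # one scale per block
    q = torch.clamp(torch.round(xb / scale), -qmax, qmax)
    return q, scale

def blockwise_dequant(q, scale, shape):
    return (q * scale).reshape(shape)

x = torch.randn(4, 128)
q, s = blockwise_quant(x)
err_block = (blockwise_dequant(q, s, x.shape) - x).abs().mean()

# Compare against a single per-tensor scale: the per-block version is tighter.
s_tensor = x.abs().max() / 7
err_tensor = ((torch.clamp(torch.round(x / s_tensor), -7, 7) * s_tensor) - x).abs().mean()
print(float(err_block), float(err_tensor))
```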
- triton.language.dot_scaled — Triton documentation: no description found
- Optimizing Large Language Model Training Using FP4 Quantization: The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operation...
- S25 - Computer Vision - Special Public Lecture - DeepSeek: no description found
- DeepSeek-V3/inference/kernel.py at main · deepseek-ai/DeepSeek-V3: Contribute to deepseek-ai/DeepSeek-V3 development by creating an account on GitHub.
- GitHub - ahxt/speed_llama3: Contribute to ahxt/speed_llama3 development by creating an account on GitHub.
- Multi-Head Latent Attention and Multi-token Prediction in Deepseek v3: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient i...
- ao/torchao/prototype/hqq/kernels.py at 7b0d2ce50baaa2a137eb9d438a076544c43096a3 · pytorch/ao: PyTorch native quantization and sparsity for training and inference - pytorch/ao
GPU MODE ▷ #beginner (10 messages🔥):
Working group suggestions, Training models for chess LLM, Collaborators on HF server, DiT training run completion
- Marksaroofim explains working group suggestions: A member clarified where to propose a working group, suggesting to either post in the linked channel or message directly for better success.
- He mentioned that initial activity is crucial for gathering interest and participation in a working group.
- Zeev5235's chess LLM proposal: Zeev5235 expressed a desire to create a working group focused on reimplementing an LLM for chess with rewards from a chess engine, seeking help with larger model training.
- Despite communication barriers, he showed strong interest in utilizing the mamba-falcon3-7b model and drawing from the expertise of larger model trainers.
- Marksaroofim fixes permission issue: Marksaroofim acknowledged the permission issue blocking Zeev5235 from suggesting a working group and indicated it has been resolved.
- He encouraged checking another channel for discussions on similar challenges faced in developing LLMs.
- Zeev5235 commits to DiT training: Zeev5235 mentioned that he plans to finish his DiT training run and then publish it, indicating his commitment to the working group.
- He highlighted the need for his training run completion as a prerequisite for fully engaging in the proposed collaboration.
GPU MODE ▷ #bitnet (1 messages):
leiwang1999_53585: we'll add some examples of bwd kernels 🙂
GPU MODE ▷ #self-promotion (1 messages):
Llama training, Minimal codebase
- Explore Minimal Codebase for Llama Training: A member shared a minimal codebase for Llama training available at speed_llama3.
- This resource aims to simplify and optimize the training process with a focus on efficiency.
- Improve Training Efficiency with Llama: Members discussed the importance of utilizing an efficient codebase for Llama training to achieve better results.
- A direct link to the GitHub repository emphasizes its valuable contributions to the AI development community.
Link mentioned: GitHub - ahxt/speed_llama3: Contribute to ahxt/speed_llama3 development by creating an account on GitHub.
GPU MODE ▷ #thunderkittens (1 messages):
Thunderkitten community enthusiasm, Hardware feature support requests, Distributed Shared Memory (DSM), Threadblock to SM scheduling, FlexAttention blog
- Thunderkitten community loves it!: A member expressed their love for Thunderkitten, mentioning they just gave a talk at a reading group about the excitement in the community.
- They highlighted the growing interest and engagement around the topic of Thunderkitten.
- Request for new hardware feature support: There is an open request for adding new hardware feature support to Thunderkitten, specifically for Distributed Shared Memory (DSM).
- Persistent kernels were suggested to enable better data reuse between SMs, acknowledging a generally under-explored design space due to software limitations.
- Background on Distributed Shared Memory: The member has a background in Distributed Shared Memory, having worked on it for about 2.5 years at NV during an internship.
- This experience lends credibility to their suggestions about its potential application within Thunderkitten.
- Threadblock to SM Scheduling Insights: The discussion mentions support for threadblock to SM scheduling, emphasizing its importance for efficient memory usage.
- The goal is to enhance performance by leveraging better data reuse techniques between shared memory systems.
- FlexAttention blog connection: The member identified as Joy Dong, associated with the FlexAttention blog, shared insights about their journey in the field.
- Their engagement in both the community and professional work showcases a strong passion for advancing technology in Thunderkitten.
GPU MODE ▷ #arc-agi-2 (89 messages🔥🔥):
Dynamic Evaluation in Reasoning Tasks, Chess Puzzles and Strategic Reasoning, Wikipedia Game Proposal, Explainability in AI, Utilizing Inference Engines for Training
- Dynamic Evaluation Enhancements: Members discussed the importance of implementing dynamic evaluation for reasoning tasks, emphasizing a response format that includes possible solutions for training data generation.
- A proposal for using standard formatting with metadata was put forward to streamline the collection of reasoning tasks.
- Chess Puzzles Under Consideration: There is ongoing interest in incorporating chess puzzles into reasoning challenges, particularly focusing on simplified formats that do not rely on complex dependencies like Stockfish.
- Members are considering a range of puzzles, from mate-in-two scenarios to tic-tac-toe, as potential task formats for future development; a dependency-free mate-check sketch appears at the end of this section.
- Innovative Ideas for the Wikipedia Game: One member proposed generating optimal solutions for the 'Wikipedia game' based on a deep analysis of link paths between random Wikipedia pages.
- This approach aims to combine strategy and understanding of associations to create a rich dataset for reasoning training.
- Exploratory Learning and Explainability: A new method was suggested to train explainer models that generate reasoning for answers, potentially improving the ability to tackle harder problems.
- The idea involves finetuning models in a way that allows them to provide insights into their reasoning processes and previous attempts.
- Utilizing Inference Engines in Training: Discussion highlighted the use of inference engines like vLLM and SGLang for managing dynamic batch processing during training sessions.
- Concerns were raised regarding the heaviness of updates to inference nodes, suggesting a balancing act between experience collection and timely training updates.
- andreaskoepf: Weights & Biases, developer tools for machine learning
- Add Figlet Fonts Challenges and Evaluator by Miserlou · Pull Request #22 · open-thought/reasoning-gym: Uses pyfiglet and the Worde wordlist to generate Figlet font deciphering challenges.
- Add Rubik's Cube Generator and Evaluator by Miserlou · Pull Request #21 · open-thought/reasoning-gym: Related: #20. Initial attempt at adding a Rubik's Cube challenge, which uses a dynamic evaluation but provides an example possible solution for simple cases. It depends on the magiccube library....
- I Made a Graph of Wikipedia... This Is What I Found: Code for all my videos: https://github.com/sponsors/adumb-codes/Get the graph as a poster: https://adumb.store/Twitter: https://twitter.com/adumb_codesA deep...
- Implement answer-scoring for the countdown number game dataset (e.g. via sympy) · Issue #18 · open-thought/reasoning-gym: The countdown game dataset implemented in countdown.py asks to find a formula to combine a list of numbers with operators +,-,*,/ to produce a given target number. The generated questions in most c...
- Add a Rubic's Cube dataset · Issue #20 · open-thought/reasoning-gym: Create a task dataset which asks the user to provide the solution for a given Rubic's Cube configuration. Suggestion for parameters (to adjust the hardness of the task): cube_size: int # (e.g. 2 -...
- Tweet from Piotr Mazurek (@tugot17): @neurosp1ke I have an implementation of a gym env for Rubik’s cube. Would this be a good riddle?
- rubiks_cube.py: GitHub Gist: instantly share code, notes, and snippets.
- Jiayi Pan (@jiayi_pirate): We reproduced DeepSeek R1-Zero in the CountDown game, and it just works Through RL, the 3B base LM develops self-verification and search abilities all on its own You can experience the Ahah moment you...
- GitHub - Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero: Clean, accessible reproduction of DeepSeek R1-Zero - Jiayi-Pan/TinyZero
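For the chess-puzzle idea above, verifying a proposed mating line does not require an engine; a move-legality library is enough. Here is a rough sketch assuming the `python-chess` package; the FEN and move below are just an illustrative mate-in-one, not an actual dataset entry.

```python
import chess

def is_mate_line(fen: str, moves_san: list[str]) -> bool:
    """Check that a proposed line is legal and ends in checkmate.
    Only validates the given line; it does not prove the mate is forced."""
    board = chess.Board(fen)
    for san in moves_san:
        try:
            board.push_san(san)   # raises a ValueError subclass on an illegal move
        except ValueError:
            return False
    return board.is_checkmate()

# Scholar's-mate-style position where Qxf7# ends the game (illustrative only).
fen = "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 1"
print(is_mate_line(fen, ["Qxf7#"]))  # True
```

A scorer along these lines keeps the dataset free of heavyweight dependencies like Stockfish while still rejecting illegal or non-mating answers.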
Stability.ai (Stable Diffusion) ▷ #general-chat (120 messages🔥🔥):
ComfyUI vs Forge, Model Performance and Workflows, Image Generation Tools, User Interface Preferences, Character Generation in Stable Diffusion
- Debate on ComfyUI's Complexity: Users discussed the usability of ComfyUI, noting its complexity and the demand for a more streamlined user experience compared to Forge.
- While some users appreciate its flexibility for advanced tasks, others believe it complicates straightforward processes.
- Image Generation Workflows: The conversation shifted to how users manage workflows in different UIs, highlighting issues with customizing ComfyUI for specific tasks.
- Many participants expressed they prefer simpler options, wanting quicker access to features without needing intricate workflows.
- Recommendations for Models and Features: Users sought recommendations for various AI models, emphasizing the desire for functionality like image captioning and realistic character generation.
- Some mentioned using specific models like autismmix for generating fantasy themes but noted challenges in achieving desired results.
- User Interface Satisfaction: A split in satisfaction with the user interfaces of different platforms was observed, with some favoring the straightforwardness of Forge over ComfyUI.
- Participants mentioned the importance of having a balance between complexity and ease of access to settings.
- Technical Issues Encountered: One user reported issues related to installing Stable Diffusion, specifically needing help with Python errors.
- The group offered assistance by directing the user to support channels while discussing the overall state of installations.
- Kolors Virtual Try-On - a Hugging Face Space by Kwai-Kolors: no description found
- Wall GIF - Wall - Discover & Share GIFs: Click to view the GIF
- GitHub - lllyasviel/stable-diffusion-webui-forge: Contribute to lllyasviel/stable-diffusion-webui-forge development by creating an account on GitHub.
Stackblitz (Bolt.new) ▷ #announcements (1 messages):
Bolt updates, Export and Import Handling
- Bolt ensures correct exports and imports: Starting today, Bolt's latest update guarantees that all imports and exports are functioning correctly, including previously missing default exports, as detailed in the announcement.
- This enhancement promises a smoother experience across all projects by ensuring that 'export default' functionality is now reliable and consistent.
- Smart Imports Update in Bolt: The update focuses on the less exciting yet crucial part of coding, ensuring 'export default' works as intended across the codebase. This improvement is live now on all projects, enhancing overall functionality.
Link mentioned: Tweet from bolt.new (@boltdotnew): Bolt 🧠 update: Smart Imports'export default' might not be the most thrilling part of your codebase. But it is important!The latest update in Bolt's engine ensures that all imports and exp...
Stackblitz (Bolt.new) ▷ #prompting (2 messages):
Backend suggestions, Firebase learning experience
- Asking for Backend Recommendations: A member inquired about which backend solutions are recommended for their project.
- They are looking for suggestions to support their development needs.
- Navigating Firebase's Learning Curve: Another member shared their experience working with Firebase, mentioning that it has been a steep learning curve for them.
- They indicated that although it’s challenging, they are gradually becoming more familiar with it.
Stackblitz (Bolt.new) ▷ #discussions (110 messages🔥🔥):
GitHub OAuth Disconnection, Bolt App Development Support, Token Usage in Bolt, Error Handling in Bolt, Custom Domains with Supabase and Netlify
- Disconnecting GitHub from Stackblitz: To switch GitHub accounts associated with Stackblitz, you need to revoke permissions from the OAuth settings in GitHub and delete your old Stackblitz account, with no other workarounds available.
- This information was confirmed amidst a discussion on possible methods for disconnecting accounts.
- Seeking Support for Bolt App Integrations: A developer reached out for help connecting functions between a Bolt app and Supabase, offering contract work for backend developers.
- Another user confirmed their availability to assist with the edge function needed for integration.
- Understanding Token Consumption in Bolt: Users expressed concerns about rapidly depleting tokens during the debugging process, especially when prompting Bolt for repeated fixes.
- The dynamic nature of token consumption, based on prompt length and project complexity, was discussed, with advice on how to manage expectations.
- Reported Errors and Service Outages in Bolt: Multiple users reported experiencing server errors and service availability issues while using Bolt, prompting frustration about the consistency of the platform.
- In some cases, users described how these errors impacted their workflow and project deployments.
- Using Custom Domains with Supabase and Netlify: A user inquired about using custom domains for email verification while facing conflicts between Supabase and Netlify regarding root CNAME records.
- While it was suggested that Supabase can operate without a custom domain, users noted the potential for cleaner email communications with one.
- Tweet from bolt.new (@boltdotnew): You can now open public repos in bolt․new 🙌How? For any GitHub URL, just put "http://bolt.new" in front of it!(Release notes below!)
- Vite + React + TS: no description found
- Uploading Files To GitHub Quick Start Guide: How to upload & clone files from GitHub remote repositories Article: https://dennisivy.com/github-quickstartGithub Crash course by Brad Traversy: https://you...
- Show Me Your Bolt: no description found
- Bolt.new Builders Hub: no description found
- Share Your AI Projects - I'm Built With AI: no description found
MCP (Glama) ▷ #general (74 messages🔥🔥):
Goose Client Impressions, MCP Server for Google Sheets, DeepSeek Integration, Ideal LLM Client Features, Collaborative Development of LLM Tools
- Positive Impressions of Goose Client: Members expressed enthusiasm about the Goose client, noting its CLI priority and integration with MCP servers for enhanced usage.
- However, some raised concerns regarding token usage and potential rate limits as highlighted in the documentation.
- MCP Server for Google Sheets Development: A member shared their GitHub project for an MCP server that integrates with Google Sheets and Drive, offering functionalities for reading and writing data.
- It currently cannot format complex structures like charts, but improvements could be made with further exploration; a minimal MCP tool-server sketch appears at the end of this section.
- Challenges with DeepSeek Integration: Various members discussed their experiences integrating DeepSeek models with MCP, highlighting issues with tool calls and API behavior.
- Alternatives like Kluster.ai were recommended, which functioned better with DeepSeek without major issues.
- Features of an Ideal LLM Client: Members brainstormed the ideal features for an LLM client, including support for multiple models, chat management, and customizable interfaces.
- One member initiated a project aiming to incorporate many of these features and sought feedback from the community.
- Collaboration Among Developers: There was a call for collaboration amongst developers to create an MCP client that meets the needs of the community.
- Members expressed interest in developing tools together to speed up progress and enhance features.
- deepseek-r1-distill-qwen-32b: DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
- michaelneale/deepseek-r1-goose: Tool calling for deepseek-r1, tweaked for the goose agent
- Next.js AI Chatbot: A full-featured, hackable Next.js AI chatbot built by Vercel
- AI SDK by Vercel: The AI SDK is the TypeScript toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.
- GitHub - isaacphi/mcp-gdrive: Model Context Protocol (MCP) Server for reading from Google Drive and editing Google Sheets: Model Context Protocol (MCP) Server for reading from Google Drive and editing Google Sheets - isaacphi/mcp-gdrive
- AI SDK Providers: Learn how to use AI SDK providers.
- Chatbot Tool Usage: Learn how to use tools with the useChat hook.
- GitHub - isaacphi/wheel: TUI LLM chatbot, code assistant, and MCP client: TUI LLM chatbot, code assistant, and MCP client. Contribute to isaacphi/wheel development by creating an account on GitHub.
- Reddit - Dive into anything: no description found
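For anyone curious what the server side of such an integration looks like, here is a minimal sketch using the official MCP Python SDK's FastMCP helper. The spreadsheet logic is a hypothetical stub, not the linked mcp-gdrive implementation; a real server would call the Google Sheets API inside the tool.

```python
# Minimal MCP tool server sketch using the Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sheets-demo")

@mcp.tool()
def read_range(spreadsheet_id: str, cell_range: str) -> str:
    """Return the values in a spreadsheet range as CSV text (stubbed)."""
    # Placeholder: a real implementation would query the Google Sheets API here.
    return f"stub data for {spreadsheet_id}!{cell_range}"

if __name__ == "__main__":
    # Serve over stdio so MCP clients (Claude Desktop, Goose, etc.) can launch it.
    mcp.run()
```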
MCP (Glama) ▷ #showcase (20 messages🔥):
Codename Goose, lüm AI for mental health, mcp-agent framework, Show HN trending, Google integration agents
- Introducing Codename Goose: A new open source MCP AI Agent, Codename Goose, has been released, exciting the community.
- Members are encouraged to explore its features and get involved.
- lüm: Your AI Companion for Mental Health: A member introduced lüm, a thoughtful AI companion for mental health practice, emphasizing its privacy-focused infrastructure.
- The platform encourages collaborative growth to shape future psychological tools.
- Building the mcp-agent Framework: A developer introduced the mcp-agent framework, crafted during their holiday, facilitating the building of effective agents by implementing established patterns.
- The project aims for collaboration and input on its roadmap, and invites contributions from the community.
- Show HN Success for mcp-agent: The mcp-agent project is currently trending #1 on Show HN, garnering attention and support from developers.
- Community members are encouraged to participate in discussions on HN to enhance visibility and engagement.
- Creative Use Cases for MCP Agents: A user proposed multiple agents for research tasks, including specialized integrations with Pubmed and Google Scholar.
- They expressed interest in building a system with various agents to streamline their research processes.
- lüm - Your AI Companion: A thoughtful AI companion designed specifically for mental health professionals
- mcp-agent/CONTRIBUTING.md at main · lastmile-ai/mcp-agent: Build effective agents using Model Context Protocol and simple workflow patterns - lastmile-ai/mcp-agent
- GitHub - lastmile-ai/mcp-agent: Build effective agents using Model Context Protocol and simple workflow patterns: Build effective agents using Model Context Protocol and simple workflow patterns - lastmile-ai/mcp-agent
Nomic.ai (GPT4All) ▷ #general (82 messages🔥🔥):
DeepSeek R1 Distill models, CUDA and CPU performance, Template optimization for DeepSeek, LM Studio usage, Acknowledge new R1 releases
- DeepSeek R1 Distill models discussed: Several members talked about the effectiveness of DeepSeek R1 Distill models, noting that these smaller models are based on the more substantial R1 architecture.
- While the 8b distill model performs admirably, it still seems limited compared to larger alternatives like the 70b quant.
- CUDA boosts performance: Users shared insights on running DeepSeek models with CUDA, indicating it could enhance performance when combined with CPU tasks.
- One member reported achieving 5 t/s on CPU while using q8_0, prompting discussions about optimizing settings; a GPU-offload sketch using the Python bindings appears at the end of this section.
- Optimization of DeepSeek templates: There were conversations about the need for further optimization of templates used with DeepSeek, particularly for better performance and functionality.
- Members noted there's room for improvement in how the models render outputs and process user prompts.
- LM Studio's performance questioned: The use of LM Studio alongside DeepSeek was brought up with concerns about its closed-source nature and compatibility issues.
- Users pondered the benefits it provided in conjunction with local models, particularly for research purposes.
- Anticipation for 32b R1 distills: Members expressed a desire for working 32b R1 distill models, emphasizing that current options are not performing adequately for their needs.
- Some optimism was shared about future developments, with a mention that new releases could potentially address current performance shortcomings.
- alexandreteles/bonito-v1-gguf · Hugging Face: no description found
- bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF · Hugging Face: no description found
- unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF at main: no description found
- unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF · Hugging Face: no description found
- Deepseek R1 Explained by a Retired Microsoft Engineer: Dave explains why Deepseek R1 is such a big deal, explains how it works, what's new, and brings you up to date on the implications and fall out! Free Sample...
- Web Search Beta Release: GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use. - nomic-ai/gpt4all
- Deepseek R1 671b Running LOCAL AI LLM is a ChatGPT Killer!: Writeup for Deepseek R1 671b Setup and Running Locally https://digitalspaceport.com/running-deepseek-r1-locally-not-a-distilled-qwen-or-llama/768GB RAM or VR...
- Support DeepSeek-R1 Qwen by cebtenzzre · Pull Request #3431 · nomic-ai/gpt4all: This PR adds DeepSeek-R1 Qwen support by:Rebasing llama.cpp on a slightly newer upstream commit (ggerganov/llama.cpp@a39ab216a from Oct 2 instead of ggerganov/llama.cpp@95bc82fbc from Sep 26)Che...
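For readers following along at home, loading one of the linked GGUF distills with GPU offload via the GPT4All Python bindings looks roughly like this. It is a sketch under the assumption that the file has already been downloaded locally; the filename is hypothetical and the accepted device strings vary by bindings version.

```python
from gpt4all import GPT4All

# Assumes a GGUF from one of the Hugging Face links above sits in model_path.
# "cuda" requests GPU offload; exact device strings depend on the bindings version.
model = GPT4All(
    model_name="DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf",  # hypothetical local filename
    model_path="/path/to/models",
    device="cuda",
    allow_download=False,
)

with model.chat_session():
    print(model.generate("Briefly explain what a distilled model is.", max_tokens=200))
```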
Notebook LM Discord ▷ #use-cases (11 messages🔥):
Using NotebookLM for Environmental Engineering, Risks of Using NotebookLM as a Repository, Converting Notes to Source for Data Comparison, Maximum File Size Limits in NotebookLM
- Inquiring About NotebookLM for Environmental Engineering: A user questioned if adding two lengthy textbooks and a dozen documents on environmental engineering to NotebookLM is feasible without exceeding limits.
- Another member noted that there are maximum file size limits, suggesting that large sources may complicate specific inquiries due to the 'needle in the haystack problem'.
- Concerns About Using NotebookLM as a Storage Repository: A member raised concerns about whether utilizing NotebookLM to store various educational materials poses risks compared to platforms like Google Drive.
- They were advised to also keep copies on Google Drive since NotebookLM does not allow access to downloaded originals after upload.
- Effective Use of Note Conversion Techniques: A user discovered a useful method of converting notes to sources in NotebookLM for comparing cohorts of unstructured data from surveys.
- By summarizing each source and converting them, they were able to better contrast the data, enhancing clarity in referencing different sources.
Link mentioned: Frequently Asked Questions - NotebookLM Help: no description found
Notebook LM Discord ▷ #general (70 messages🔥🔥):
NotebookLM Button Issues, Documentation Limit Queries, Audio Podcast Capabilities, Using LinkedIn Profiles as Sources, Translating Notes and Audio
- Disappearing 'Add New' Button Confusion: A user reported that the 'Add New' button in NotebookLM has disappeared after months of use, prompting others to speculate on possible maximum limits.
- Ask the notebook about limitations to get clarity on what restrictions might be in place.
- Clarification on Note Redundancy: Questions arose about the usefulness of converting notes to sources, with consensus that it seems redundant since notes derive from existing sources.
- People can structure prompts differently, which may affect how notes and sources are interpreted within NotebookLM.
- Challenges Using LinkedIn as a Source: A user encountered an error when trying to add a website as a source, with another member suggesting that LinkedIn may have restrictions on crawling.
- To circumvent this, a suggestion was made to create a PDF of the LinkedIn profile for more flexibility in usage.
- Podcast Duration Generation Techniques: Users shared their experiences regarding the generation of longer podcast episodes, with one querying how to reliably create episodes over 30 minutes.
- The conversation shifted towards the general capabilities of NotebookLM for audio and interactive enhancements.
- Plans for API Integration: A user inquired about an estimated arrival time for an API to leverage NotebookLM within Salesforce, highlighting eagerness for integration.
- The response suggested there is currently no ETA available for the API launch.
- Dog Burning Dog GIF - Dog Burning dog Satire dog - Discover & Share GIFs: Click to view the GIF
- Upgrading to NotebookLM Plus - NotebookLM Help: no description found
Latent Space ▷ #ai-general-chat (60 messages🔥🔥):
DeepSeek's R1-Zero, Huawei chips usage, OpenAI revenue dynamics, Sourcegraph enterprise agents, Microsoft Copilot rollout
- DeepSeek's R1-Zero: A Game Changer: Analysis reveals that R1-Zero is more significant than R1, achieving comparable performance in logical domains like math and coding without human input bottlenecks.
- Critics noted issues like incoherence, but testing showed no evidence of these issues, suggesting SFT might not be necessary for effective reasoning.
- Huawei Chips Make Waves: DeepSeek has shifted its inference process to utilize Huawei's 910C chips, raising questions about their competitive capacity compared to Nvidia's offerings.
- Discussions revolve around technical aspects like memory differences and the challenges related to using Huawei chips in training environments.
- OpenAI's ChatGPT Revenue Surprises: It was reported that revenue from OpenAI's ChatGPT Pro has surpassed that from ChatGPT Enterprise, indicating significant subscription growth.
- Despite this financial success, there are claims that they may be losing money on the enterprise side, raising concerns about sustainability.
- Sourcegraph's New Enterprise Agent: Sourcegraph has launched an enterprise agent coding product aimed to compete with Windsurf, focusing on streamlining AI-assisted coding.
- Their case study on booking will be presented at AIENYC, underlining the product's relevance in current industry discussions.
- Microsoft's Copilot Launch Criticized: Discussion emerged around Microsoft's Copilot rollout, which was deemed poorly executed despite previous marketing missteps.
- Concerns about Microsoft's overall strategy were raised, with comments suggesting adoption issues among new users and a potential identity crisis in their services.
- R1-Zero and R1 Results and Analysis: An analysis of Deepseek's R1
- Tweet from Stephanie Palazzolo (@steph_palazzolo): New w/ @amir: Revenue from OpenAI's $200/month ChatGPT Pro has surpassed rev from ChatGPT Enterprise.That means that Pro is making more than $300M annualized, as that's what Enterprise was gen...
- Tweet from Mike Knoop (@mikeknoop): just published my full @arcprize analysis of deepseek's r1-zero and r1. link below. key points:r1-zero is more important than r1.both r1-zero and r1 score ~15% on ARC-AGI-1. this is fascinating. i...
- Tweet from Peiyi Wang (@sybilhyz): Last year, I joined DeepSeek with no RL experience. While conducting Mathshepherd and DeepSeekMath research, I independently derived this unified formula to understand various training methods. It fel...
- Tweet from Olala🇻🇳 🇨🇳 🇷🇺 (@olalatech1): DeepSeek tried to do one thing: transplant its own model to Huawei Ascend 910B chip to run. Through the "dynamic precision adjustment" technology, they only lost 5% of the performance in the s...
- Tweet from Alexander Doria (@Dorialexander): I feel this should be a much bigger story: DeepSeek has trained on Nvidia H800 but is running inference on the new home Chinese chips made by Huawei, the 910C.
- The Microsoft 365 Copilot launch was a total disaster: At the start of the New Year, with no warning, Microsoft gives its flagship productivity app a name change and a huge price increase. Why would the company make this mess? I asked Copilot, who explain...
- Reddit - Dive into anything: no description found
Cohere ▷ #discussions (17 messages🔥):
Welcome to New Regulars, Color Change Excitement, Event Awareness, Appreciation for Cohere Designers, Community Engagement
- New Regulars Join the Club!: A shoutout was given to the newest Regulars in the community, thanking them for their contributions and encouraging them to keep engaging.
- Seeing them chat and help each other makes this place special according to one member.
- Color Change Brings Joy!: Members expressed excitement about the new Regulars color, with one remarking that it looks 'handsome' and suits them well.
- 'This color's shade is dope,' noted another member, highlighting the visual change.
- Curious About Upcoming Events: One member noticed there are 11 upcoming events listed, admitting they had not been aware of any of them, which sparked discussion about checking the event tab.
- In response, another encouraged regular checks for event updates to stay informed.
- Kudos to Cohere Designers: A member thanked the Cohere designers for the new color shade, expressing their appreciation enthusiastically.
- Sandra also expressed joy that the members liked the design, contributing to a positive atmosphere.
- Lighthearted Community Banter: There were some lighthearted comments about posting ads and humorous discussions about users considering multiple accounts.
- 'For posting that on public, I am gonna apply from 25 Discord accounts,' one jokingly mentioned, showing the playful spirit of the community.
Cohere ▷ #api-discussions (4 messages):
command-r-plus model issues, Model version specifications, User experience with model changes
- Users face issues with command-r-plus responses: One user reported getting only two- to three-sentence responses from the `command-r-plus` model while using it with an unaltered snippet, noting that switching to `command-r-plus-08-2024` restores thorough answers but introduces near-endless repetition.
- This led to frustration over what they perceive as degraded performance, despite the claim that nothing has changed in the endpoint.
- Model version remains unchanged since September: A member clarified that the `command-r-plus` alias still points to the same model, `command-r-plus-04-2024`, as it has since September, indicating no updates have occurred.
- They suggested sharing code snippets and specific quality issues for further investigation, while also recommending newer models like `command-r-plus-08-2024`.
- Potential model upgrades are discussed: It was mentioned that users experiencing issues with the older model might consider trying upgraded versions like `command-r-plus-08-2024` or `command-r7b-12-2024` to see if performance improves (see the version-pinning sketch below).
- One user remains reluctant to trust the resolution, expressing a preference for thorough responses without repetitiveness.
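One practical takeaway from the thread is to pin a dated model ID rather than relying on the floating `command-r-plus` alias. A rough sketch, assuming the Cohere Python SDK's v2 client; treat the exact response shape as illustrative rather than authoritative.

```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])

# Pin the dated release instead of the floating "command-r-plus" alias,
# so behavior does not silently change if the alias is ever repointed.
response = co.chat(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Summarize the trade-offs of model aliases in two sentences."}],
)
print(response.message.content[0].text)
```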
Cohere ▷ #cmd-r-bot (6 messages):
Safety Modes Overview, Contextual Safety Mode, Strict Safety Mode, No Safety Mode, Cohere Documentation Links
- Safety Modes Overview Explained: Safety Modes provide users control over model behavior, effectively enhancing safety for interactions with the newest models while being inactive for older versions.
- The three modes are CONTEXTUAL, STRICT, and NONE; each mode adjusts the output restrictions accordingly.
- Contextual Safety Mode Emphasized: The CONTEXTUAL mode maintains fewer constraints to facilitate wide-ranging interactions while still rejecting harmful suggestions.
- It is well-suited for entertainment, creative, and educational applications.
- Strict Safety Mode Details: 'STRICT' mode enforces guardrails that completely avoid sensitive topics and inappropriate content, making it ideal for general and enterprise use.
- This mode ensures a safer experience for users needing strong protections against harmful interactions.
- Turning Off Safety Mode: 'NONE' safety mode deactivates all safeguards, allowing unrestricted content output by the model.
- This mode can be toggled simply by setting `safety_mode` to 'NONE' when calling the chat function (a minimal sketch follows below).
- Cohere Documentation Resources: Key sources regarding safety modes include detailed explanations in the Safety Modes documentation and updates about model changes.
- These resources are essential for understanding how to implement different safety modes effectively with example code snippets.
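As a quick illustration of the toggle described above, here is a minimal sketch using the Cohere Python SDK. The `safety_mode` strings follow the summary above; check the Safety Modes documentation for the exact values accepted by your model and SDK version.

```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["CO_API_KEY"])

# Safety mode is passed per request: CONTEXTUAL keeps light guardrails,
# STRICT enforces the strongest ones, and NONE disables them (per the summary above).
response = co.chat(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Write a villain monologue for a fantasy novel."}],
    safety_mode="CONTEXTUAL",
)
print(response.message.content[0].text)
```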
Cohere ▷ #projects (12 messages🔥):
Rerveting Efforts Reasoning Prompt, Aya 8b, Markdown Formatting, Clipboard Management, Image Analysis
- Worked on Rerveting Efforts Reasoning Prompt: A member confirmed that they are successfully working on the Rerveting Efforts Reasoning Prompt at Aya and noted its hidden potential.
- They humorously noted the difficulty of getting it to work but felt it was showing good promise.
- Clipboard Rescue with Windows + V: A member almost lost their last prompt but was saved by the Windows + V clipboard manager functionality.
- They expressed relief and amusement at the situation, highlighting the usefulness of this feature.
- Challenges in Markdown Formatting: One member found it quite hard to format their work inside Markdown and expressed their frustrations with it.
- They sought assistance, indicating a common challenge faced by users in coding environments.
- Image Analysis Progress: Members shared multiple images related to their project efforts, indicating progress in analysis tasks.
- They shared links to uploaded images, but details about the content of the images were not provided.
- Feedback on Prompt Development: The creator of the Rerveting Efforts Reasoning Prompt asked for thoughts on their progress, finding it quite smart so far.
- This reflects a positive outlook on their developments and an eagerness for feedback.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (20 messages🔥):
Certificate eligibility for non-students, Hackathon availability, Group participation in application track, Project policy details, MOOC curriculum clarifications
- Non-students qualify for certificates: A member inquired about certificate eligibility after leaving student status, and it was confirmed they can complete the signup form to be eligible.
- No hackathon this semester: Another member asked for details about a hackathon, and was informed that there is no hackathon this semester.
- Team formations allowed in application track: It was confirmed that groups can consist of 3-4 students in the application track for projects.
- Others expressed a desire to work independently or in research, and were advised further details on projects will be released soon.
- Clarifications on MOOC curriculum: Questions arose about the MOOC curriculum's structure, with assurances that details would be released soon.
- Participants were encouraged to refer to the upcoming announcements for clarity on assignments and potential differences from Berkeley's curriculum.
- Public access for MOOC participants: A member from another university asked about project submission eligibility, learning that the MOOC is a public version of a Berkeley course.
- It was confirmed that while the course is accessible, assignments may differ from those of registered Berkeley students, with final details to follow.
LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (5 messages):
Lecture Transcripts, Lecture Slides, Stake Airdrop
- Lecture 1 Transcripts Shared: A member shared lecture transcripts for CS 194/294-280 (Advanced LLM Agents) from Xinyun Chen, noting its usefulness.
- Another member expressed interest in having these notes shared for each lecture for better access.
- Lecture Slides Available Online: Members were informed that the lecture slides are accessible on the website llmagents-learning.org.
- This was in response to a request for the slides, emphasizing the collaborative nature of sharing educational resources.
- Exciting Stake Airdrop Announcement: A member announced that the Stake Airdrop is live, encouraging participants to claim rewards through stakeair-drop.com.
- The limited-time event promises exclusive bonuses, urging users to act quickly to seize their rewards.
- CS 194/294-280 (Advanced LLM Agents) - Lecture 1, Xinyun Chen: CS 194/294-280 (Advanced LLM Agents) - Lecture 1, Xinyun Chen Berkeley RDI Center on Decentralization & AIVideo source: Date Guest Lecture (4:00PM-6:00PM PST) Supplemental Readings Jan 27th In...
LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (1 messages):
Stake Airdrop, Rewards for Stakers, Limited-time Event
- Stake Airdrop is Live!: The Stake Airdrop event has launched, inviting users to claim their rewards by participating early.
- Claim your perks at stakeair-drop.com before the event concludes!
- Exclusive Rewards for Early Stakers: Participants can earn exclusive bonuses by staking early or being loyal holders during the event.
- This is an excellent opportunity to increase your stakes and reap the benefits!
- Limited-time Event for Rewards: This Stake Airdrop is a limited-time event, and users are encouraged to act quickly to grab their rewards.
- Hurry up! Don't miss out on this chance for extra earnings
Modular (Mojo 🔥) ▷ #general (6 messages):
Modular as a tools company, PyTorch community engagement, Channel sharing etiquette
- Understanding Modular's Role: A clarification highlighted that Modular is a tools company, comparing it to a farmer and a tractor store, emphasizing that they offer products, not direct competition.
- Modular isn't trying to compete with farmers; they're selling them tractors.
- PyTorch Community Appreciation: Shoutouts were made to the PyTorch community for their contributions, showcasing a sense of camaraderie.
- The engagement within the community was positively acknowledged with a simple gesture: 🤙.
- Discussion on Channel Sharing: A member asked if a particular topic could be shared in a different channel, demonstrating awareness of channel etiquette.
- Another member promptly apologized for any irrelevant comments, indicating a desire to keep conversations on-topic.
Modular (Mojo 🔥) ▷ #announcements (2 messages):
Discord Changes, Branch Changes
- Discord Server Changes Announced: The Discord server will implement changes to distinguish between casual conversations and technical discussions, making channel <#1149739720146952292> and channel <#1238540905129054350> read-only starting January 31st.
- Members are encouraged to post questions in the Modular forum instead, fostering a clearer separation of communication.
- Branch Changes Completed: All open pull requests have been retargeted following the completion of branch changes, ensuring a smoother workflow.
- The team welcomes any questions members may have regarding these updates.
Modular (Mojo 🔥) ▷ #mojo (10 messages🔥):
Mojo LSP Server Parameters, Mojo Mention in TIOBE, VS Code Extension Features, Mojo Roadmap Update
- Mojo LSP Server Parameter Confusion: A member discovered numerous parameters when running `magic run mojo-lsp-server --help`, yet found no documentation on them despite extensive searching.
- Another member noted that the parameters appear to be internal LLVM flags that shouldn't be exposed, suggesting filing a GitHub issue for the tooling team to examine.
- TIOBE Highlights Mojo's Potential: A member highlighted a recent mention of Mojo in TIOBE, noting the CEO's optimism about Mojo's growth.
- They quoted the CEO predicting that Mojo could reach near a top 20 position by 2025.
- Inquiry About VS Code Extension Code Folding: A member inquired whether the VS Code extension for Mojo supports code folding and how to activate it, or if there's a timeline for adding this feature.
- Another member suggested moving the discussion to a more relevant channel for further assistance.
- Request for Mojo's Updated Roadmap: A member posed a question about the possibility of an updated roadmap for Mojo as the year 2025 approaches.
- This indicates a desire for clarity on future development and direction for the Mojo project.
Torchtune ▷ #general (2 messages):
Office Hours Announcement, Upcoming features discussion, Library Improvements, Incentive with Banana Bread
- Join Us for Office Hours Next Thursday: We're hosting office hours next Thursday at 13:30 US ET where we can discuss upcoming features and address specific issues in the library. Everyone is encouraged to drop by and chat in the Discord event.
- This is a great opportunity for us to collaborate and share ideas!
- Famous Banana Bread Incentive: To entice attendees, one member will be bringing their famous banana bread to the office hours. This delicious treat is sure to make the discussions even sweeter!
Torchtune ▷ #dev (13 messages🔥):
DPO metrics aggregation, TRL vs. Torchtune debugging, Loss normalization in DPO, Open PR for DPO metrics, Community debugging efforts
- DPO Metrics Struggle with Device Aggregation: A member questioned why DPO metrics are not aggregated over devices, offering to contribute a `dist.all_reduce` fix if this is not already planned (a generic aggregation sketch appears at the end of this section).
- There is a related issue in the Torchtune repository that could be referenced for insight.
- Debugging TRL vs. Torchtune: Another member shared their experience encountering issues while debugging TRL versus Torchtune with the DPO, indicating a potential inconsistency.
- They mentioned a lack of loss normalization in the lora_dpo_distributed recipe compared to full_finetune_distributed.
- Call for Loss Normalization in DPO: A member noted there’s currently no loss normalization in the DPO implementation and expressed plans to investigate this further.
- The community is actively discussing when and how to implement normalization.
- Plans for a PR on DPO Metrics: A member confirmed they would create a PR this week to address the issue of aggregating DPO metrics across devices.
- This PR is an effort to enhance DPO metrics and streamline validation across multiple devices.
- Community Willing to Debug Together: Community members expressed a willingness to assist in debugging efforts as needed during discussions about TRL issues.
- The collaborative spirit indicates interest in resolving DPO inconsistencies, with multiple members ready to contribute solutions.
Link mentioned: pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
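For readers unfamiliar with the fix being proposed, averaging a per-rank metric with `dist.all_reduce` looks roughly like this. It is a generic PyTorch sketch under the assumption of an already-initialized process group, not the actual Torchtune recipe code.

```python
import torch
import torch.distributed as dist

def reduce_metric(value: float, device: torch.device) -> float:
    """Average a scalar metric (e.g. DPO reward accuracy) across all ranks."""
    t = torch.tensor([value], device=device, dtype=torch.float32)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)   # sum the per-rank values in place
    t /= dist.get_world_size()                 # turn the sum into a mean
    return t.item()

# Inside a training loop (process group assumed to be initialized elsewhere):
# rewards_accuracy = reduce_metric(local_rewards_accuracy, device)
# if rank == 0: log({"rewards/accuracy": rewards_accuracy})
```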
Torchtune ▷ #papers (2 messages):
Imagen, Image2Txt, Chatbot
- Clarification on Imagen/Image2Txt: A member inquired whether a certain feature is intended for Imagen or Image2Txt technologies.
- They later retracted their question, suggesting they believed it remained focused on the chatbot feature instead.
- Switching from Imagen to Chatbot: Initially, a member questioned the relevance of Imagen compared to Image2Txt.
- They subsequently concluded that the discussion was still primarily about the chatbot.
Axolotl AI ▷ #general (8 messages🔥):
Multi-Turn KTO, RLHF new member assignment, NeurIPS manuscript, Axolotl usage challenges
- Inquiry on Multi-Turn KTO: A member inquired about the status of multi-turn KTO and tagged another member for insights.
- However, the response did not provide any update on the current implementation.
- New Member Assigned Elsewhere: Nanobitz reported that a new member joined for RLHF, but they were assigned to a different PR for the time being.
- This prompted a disappointed reaction from another member about the new member's allocation.
- Upcoming NeurIPS Manuscript: A member mentioned they have a NeurIPS manuscript planned for this year, indicating ongoing research efforts.
- The manuscript's progress suggests that the project is actively contributing to the wider AI community.
- Model Due Date Approaches: The same member indicated that the model related to their project is due around March, adding urgency to their timeline.
- They expressed concern over potential delays affecting their objectives.
- Concerns About Axolotl Usage: A member expressed worry that challenges in using Axolotl could jeopardize the implementation of KTO.
- This reflects the critical role Axolotl plays in their project’s success.
LlamaIndex ▷ #blog (2 messages):
Agentic web scraping, Multimodal financial report generation
- Efficient Web Scraping with LlamaIndex and ScrapeGraph AI: Integrating @ScrapeGraph_AI with @llama_index allows AI agents to extract unstructured information from websites rapidly and efficiently, streamlining the web scraping process. Check out this tweet for more details.
- The collaboration demonstrates an effective way to manage data extraction tasks commonly faced by AI agents.
- Create Dynamic Financial Reports Using LlamaIndex: A guide has emerged for building multimodal financial reports that merge text and visuals extracted from PDFs via @llama_index. Learn how to generate structured outputs with both text summaries and visuals in this tweet.
- This method equips users to enhance their reporting capabilities by leveraging both textual and graphical data.
LlamaIndex ▷ #general (5 messages):
GUI Differences, LlamaCloud Waitlist, Confluence DataSource Grayed Out
- GUI Looks Different without Index Option: A member noticed differences in their GUI, specifically the absence of the Index option in the sidebar.
- They were informed that this change is related to the LlamaCloud waitlist, which is currently invite-only.
- Joining LlamaCloud Waitlist: To access features like indexing and connecting to data sources, one must apply to the LlamaCloud waitlist.
- Approval time for the waitlist is uncertain, with other members suggesting someone might assist with the timeline.
- Confluence as a DataSource Greyed Out: A question arose regarding why Confluence was grayed out as a data source during integration setup.
- It was implied that the functionality might require Premium access, although the specific requirements were not detailed.
MLOps @Chipro ▷ #events (1 messages):
MLOps Workshop, Feature Store on Databricks, Databricks and Unity Catalog, Featureform, Best Practices in Feature Engineering
- Join the MLOps Workshop Tomorrow!: Don't miss the MLOps Workshop: Building a Feature Store on Databricks tomorrow, January 30th at 8 A.M. PT with founder Simba Khadder leading the session.
- Participants will learn about building production-grade feature pipelines using Featureform and Databricks, and there will be a Q&A session at the end.
- Why You Should Attend: This hands-on workshop is focused on enabling Data Engineers, Data Scientists, and Machine Learning Engineers to efficiently manage features at scale.
- Attendees will gain insights into utilizing Databricks and Unity Catalog, streamlining data processing and feature management.
- Learn the Best Practices: Participants will receive guidance on setting up a feature store that can handle complexities of enterprise-scale data.
- Simba Khadder will discuss industry best practices for feature engineering that can positively impact machine learning models.
Link mentioned: MLOps Workshop: Building a Feature Store on Databricks: Join our 1-hr webinar with Featureform's founder to learn how to empower your data by using Featureform and Databricks!
MLOps @Chipro ▷ #general-ml (3 messages):
AI replacing developers, Perception of AI advancements, AI wrappers improvement
- Skepticism on AI Replacing Developers: A member expressed skepticism about claims from figures like Zuck suggesting AI could replace mid-level developers, asserting that development roles remain vibrant.
- They emphasized that the development field is far from dead, countering the prevailing hype around AI's capabilities in this area.
- Questioning AI Wrapper Advancements: In response to skepticism about AI's impact on development, another member questioned the rationale behind this doubt despite the rise of AI wrappers improving continuously.
- This member highlighted that many AI tools are getting better day by day, further intensifying the discussion on AI's role in development.
DSPy ▷ #papers (1 messages):
Auto-Differentiation in LLMs, Manual Prompting, LLM Workflows
- Shifting from Manual to Auto-Differentiation: The paper titled Auto-Differentiating Any LLM Workflow explores the revolutionary concept of auto-differentiating local language model workflows.
- This advancement aims to eliminate manual prompting, making LLM interactions more efficient and seamless.
- Implications for LLM Interactions: The shift to auto-differentiation is expected to significantly improve user experience in LLM interactions by automating response generation.
- As noted in the paper, this transition streamlines workflows and reduces the cognitive load on users.
DSPy ▷ #general (1 messages):
scruffalubadubdub: Ayeee merged an hr ago. Thank you
OpenInterpreter ▷ #general (2 messages):
Goose Overview, Goose Features, User Feedback on Goose
- Goose: The Open Source Wonder: Goose is built with transparency, allowing developers to freely contribute and customize, which promotes innovation.
- It runs locally, ensuring efficiency and control while being extensible by connecting to any external MCP server or API.
- Goose Handles Tasks Autonomously: Goose can independently manage complex tasks, ranging from debugging to deployment, enabling developers to focus on more critical areas.
- This autonomy is highlighted by user experiences where they can delegate intricate processes efficiently.
- Engineers Love Goose: A software engineer expressed their excitement, stating that using Goose feels like being Maverick from Top Gun, thanking the creators for a fun experience.
- The engineer even shared their success in generating fake data for APIs by directing Goose to update objects and run tests.
Link mentioned: codename goose | codename goose: Your open source AI agent, automating engineering tasks seamlessly.
tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):
Learn Git Branching Style for Tinygrad, Tinygrad Basics, Coding Puzzles, Code Architecture
- Proposing Interactive Learning Tool for Tinygrad: A member suggested creating an interactive learning tool for Tinygrad basics, similar to Learn Git Branching.
- This tool could potentially include puzzles like those found in the tinygrad-tensor-puzzles repository; a tiny Tensor-basics warm-up appears at the end of this section.
- Exploring Code Architecture in Tinygrad: The conversation also touched on the importance of code architecture in Tinygrad, highlighting that a structured approach could aid learners.
- Members discussed how puzzles and structured learning could enhance understanding of code architecture for better engagement.
Link mentioned: Learn Git Branching: An interactive Git visualization tool to educate and challenge!
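As a sense of what the very first level of such an interactive tool might cover, here is a tiny sketch of Tinygrad basics, assuming a recent tinygrad install; it is an illustrative warm-up, not material from the puzzles repo.

```python
from tinygrad import Tensor

# Ops are lazy in tinygrad and only run when realized (e.g. by calling .numpy()).
x = Tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x * x + 2 * x).sum()

# Backward pass: dy/dx = 2x + 2
y.backward()
print(y.numpy())        # 26.0  (3 + 8 + 15)
print(x.grad.numpy())   # [4. 6. 8.]
```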
LAION ▷ #general (1 messages):
spirit_from_germany: How is it going? 🙂