AI News (MOVED TO news.smol.ai!)

Archives
December 2, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


a quiet day is all you need.

AI News for 11/29/2024-12/2/2024. We checked 7 subreddits, 433 Twitters and 29 Discords (198 channels, and 4766 messages) for you. Estimated reading time saved (at 200wpm): 563 minutes. You can now tag @smol_ai for AINews discussions!

Nothing big but lots of little notables:

  • Lilian Weng released a Reward Hacking survey
  • Pydantic launched their agent framework
  • Supabase launched v2 of their assistant
  • ChatGPT cannot say David Mayer

and teases (no product release):

  • Browser Company teased their second browser
  • World Labs launched image-to-3d-world
  • The NotebookLM team left Google
  • Cognition was on the cover of Forbes

The Table of Contents and Channel Summaries have been moved to the web version of this email.


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

Theme 1. Language and Video Models: Innovations and Optimization

  • Nvidia Puzzle: Distillation-Based NAS for LLMs: @_akhaliq shared Nvidia's presentation on Puzzle, a distillation-based neural architecture search for inference-optimized Large Language Models. This approach aims to improve efficiency and performance in model deployment.
    • Community discussion of its effectiveness and applications shows excitement around this optimization technique.
  • IC-Light V2 Model Release: @_akhaliq discussed alternative models of IC-Light V2 designed for varied illumination scenarios, along with a demo showcasing its potential applications.
  • Trajectory Attention and Timestep Embedding for Video Models: @_akhaliq introduced Trajectory Attention for fine-grained video motion control, alongside Timestep Embedding as a caching mechanism for video diffusion models. These techniques offer advancements in video motion precision and efficiency.

Theme 2. AI Outreach and Collaborations

  • Amazon and Anthropic Partnership: @DeepLearningAI reported Amazon's increased investment, bringing their total commitment to Anthropic to $8 billion—a significant boost for the startup's growth and AI capabilities.
  • AI Fellowship and Safety Research: @AnthropicAI is starting a fellowship program, planning to provide funding and mentorship to engineers and researchers to transition into AI safety research. Fellows will collaborate with established researchers on projects addressing adversarial robustness, scalable oversight, and more.
  • Google's Expansion in AI: @osanseviero announced joining Google to work on the Gemini API, open models, and collaboration spaces like Colab and AI Studio, indicative of Google's push for broader AI integration.

Theme 3. Domain Names and Online Identity

  • Debating .com Dominance: @adcock_brett argues against the necessity of .com domains for credibility, advocating instead for investing in product and branding over securing premium domain names.
    • Further discussions emphasize the relevance and impact of alternative domain extensions like .io, .ai, and .co for tech and startup environments.

Theme 4. Advances in Reasoning and AI Agents

  • Reverse Thinking in LLMs Strengthens Reasoning: @iScienceLuvr shared insights on "Reverse Thinking" in Language Models, improving performance by training LLMs to start from solutions and reason backwards, demonstrating a 13.53% improvement over standard methods.
  • New Agent Frameworks with Pydantic: @omarsar0 announced the launch of a PydanticAI agent framework, emphasizing a type-safe, model-agnostic approach for building production-grade applications with structured response validation and support for streamed responses.

Theme 5. Machine Learning Humor and Light-Hearted Engagements

  • Creative Strategies in AI: @goodside humorously strategizes about assignments designed to complicate the use of ChatGPT, notably mentioning the name "David Mayer" as a potential keyword to perplex AI users.
    • Memes like "Giving homework as images" explore playful engagements with students.
  • Refreshing Perspectives on AI Practices: @swyx encourages creative and expressive prose in AI-driven content, advocating against a monotonous style and emphasizing variability and human elements in written communication.
  • Exploring AI's Impact on Culture and Engagement: @karpathy often shares insights into how AI influences and transforms cultural engagements, adding joy and humor to discussions around AI and its societal impact.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Chinese Models Dominate: QwQ-32B & DeepSeek Outperform GPT-4

  • QwQ vs o1, etc - illustration (Score: 117, Comments: 68): A visual comparison shows performance metrics between QwQ and other models across four technical benchmarks: GPQA, AIME, MATH-500, and LiveCodeBench, with a reference to an earlier comparison between Qwen 2.5 vs Llama 3.1. The benchmarks evaluate graduate-level scientific knowledge (GPQA with 34% baseline accuracy for non-experts and 65% for PhD experts), advanced mathematical problem-solving (AIME), comprehensive mathematics (MATH-500), and real-time coding abilities (LiveCodeBench).
    • QwQ 32B 8bit demonstrated exceptional reasoning capabilities by correctly solving all prompts from the "GPT-4 can't reason" paper, with extensive internal dialogue taking up to 30 minutes on problems like the Wason Selection Task.
    • Users discovered that Ollama's default 2k context size can be limiting for QwQ's reasoning tokens, with recommendations to use Exllamav2 or Koboldcpp for better performance and VRAM utilization. The model can be paired with Qwen2.5-coder-0.5B or 2.5-0.5-Instruct as draft models for speculative decoding.
    • The model exhibits multilingual reasoning capabilities, switching between English, Chinese, Russian, and Arabic during its chain of thought process. As noted by Karpathy, this behavior suggests proper RL implementation.
  • Open-weights AI models are BAD says OpenAI CEO Sam Altman. Because DeepSeek and Qwen 2.5? did what OpenAi supposed to do! (Score: 502, Comments: 205): DeepSeek and Qwen 2.5 open-source AI models from China have demonstrated capabilities that rival OpenAI's closed models, leading to public discourse about model accessibility. In response, Sam Altman expressed concerns about open-weights models in an interview with Shannon Bream, emphasizing the strategic importance of maintaining US leadership in AI development over China.
    • OpenAI's perceived stagnation and reliance on scaling/compute power is being criticized, with users noting their $157 billion valuation seems unjustified given emerging competition. The company appears to be losing their competitive advantage or "moat" as open-source models catch up.
    • Users point out the irony of Sam Altman's previous safety concerns about open-weights models, as better open-source alternatives have emerged without causing the predicted harm. Multiple comments referenced his earlier emails with Elon Musk promising openness, contrasting with his current stance.
    • Technical discussion highlights that while OpenAI's Advanced Voice Mode remains unique, competing solutions are emerging through combinations of Whisper, LLM, and TTS technologies. Users debate whether OpenAI's lead is due to genuine innovation or primarily marketing and compute resources.

Theme 2. JPEG Compression for LLM Weights: Novel Research Direction

  • Thoughts? JPEG compress your LLM weights (Score: 142, Comments: 64): JPEG compression techniques could be applied to Large Language Model weight storage, though no specific implementation details or results were provided in this post. The proposal draws parallels between image compression and neural network parameter compression, suggesting potential storage optimization methods.
    • Community skepticism focused on the impracticality of matrix reordering, with experts explaining that reordering both rows and columns would break matrix multiplication properties. Multiple users pointed out that neural network weights behave more like random noise than structured image data.
    • Technical discussions revealed that attempts to implement similar compression techniques yielded minimal results, with one user reporting only a "few percentage points reduction" in weight spread using simulated annealing. A user shared experience converting tensors to 16-bit grayscale PNG files, which worked losslessly but failed with JPEG compression.
    • Several experts recommended sticking with established quantization methods like AWQ or GPTQ instead, noting that LLM weights lack the spatial patterns that make JPEG compression effective. Discussion highlighted that weights don't follow regular statistical distributions that could be exploited by traditional compression algorithms.
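The "weights behave like random noise" point can be illustrated with a quick experiment: general-purpose lossless compression barely shrinks high-entropy bytes, while repetitive data collapses. A minimal sketch (random bytes are used as a stand-in for weight data — an assumption for illustration, not actual LLM weights):

```python
import os
import zlib

# Stand-in for weight bytes: high-entropy random data.
noise = os.urandom(1 << 16)              # 64 KiB of random bytes
structured = b"0123456789abcdef" * 4096  # 64 KiB of repetitive data

ratio_noise = len(zlib.compress(noise, 9)) / len(noise)
ratio_structured = len(zlib.compress(structured, 9)) / len(structured)

print(f"noise:      {ratio_noise:.2f}")       # close to (or slightly above) 1.0
print(f"structured: {ratio_structured:.2f}")  # a tiny fraction of the original
```

The same intuition applies to JPEG specifically: its DCT-based coding exploits spatial smoothness that photographs have and weight matrices generally lack, which is why quantization schemes designed for weight distributions (AWQ, GPTQ) fare better.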

Theme 3. Qwen 2.5 Powers Hugging Face's Text-to-SQL Feature

  • Hugging Face added Text to SQL on all 250K+ Public Datasets - powered by Qwen 2.5 Coder 32B 🔥 (Score: 98, Comments: 11): Hugging Face integrated Text-to-SQL capabilities across their 250,000+ public datasets, implementing Qwen 2.5 Coder 32B as the underlying model. The feature enables direct natural language queries to be converted into SQL statements for database interactions.
    • Hugging Face team member confirms the feature uses DuckDB WASM for in-browser SQL query execution alongside Qwen 2.5 32B Coder for query generation, and welcomes user feedback for improvements.
    • Users express enthusiasm about the tool's potential to help those less experienced with SQL, with one noting it addresses a significant pain point in dataset interaction.
    • The announcement generated playful responses about the included confetti animation and the potential to rely less on direct SQL knowledge.

Theme 4. Fox News Targets Open Source AI as National Security Threat

  • Open-Source AI = National Security: The Cry for Regulation Intensifies (Score: 101, Comments: 70): Fox News aired a segment claiming open-source AI models pose risks to US national security, though no specific details or evidence were provided in the coverage. The narrative adds to growing media discussions about potential regulation of open-source AI development, though without substantive technical analysis.
    • Chinese AI models like Deepseek R1 and Qwen are reportedly ahead of US open-source models like Meta's Llama. Multiple users point out that China's top models are not based on Llama, contradicting the narrative about open-source helping Chinese development.
    • Users criticize the push for regulation as an attempt to enforce AI monopolies and corporate control. The community suggests that restricting US open-source development would effectively hand the entire open model sector to China, who is already releasing top-tier open models.
    • The discussion emphasizes that open-source technology has historically proven more secure than closed-source alternatives over the past 40 years. Users argue that preventing open development would harm innovation and collaboration while benefiting large tech companies like Microsoft, OpenAI, and Anthropic.

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Theme 1. StreamDiffusion Powers Live AI Visuals in Concert Performances

  • Bring Me The Horizon using real time img2img? (Score: 337, Comments: 62): Bring Me The Horizon concert featured real-time img2img AI visual effects during their live performance. The post inquires about the technical workflow enabling real-time AI image generation and transformation during a live concert setting.
    • StreamDiffusion appears to be the leading solution for real-time AI visual effects, achieving up to 90 FPS on an RTX 4090. A demonstration package created by user tebjan for vvvv showcases implementations with examples available on Instagram and Google Photos.
    • The visual consistency is maintained through a clever technique where the video feed is larger than the displayed crop, allowing objects to remain in the generation frame even when they appear to leave the visible screen. Multiple users reported seeing similar effects at Download Festival with Avenged Sevenfold.
    • Community reception is mixed, with significant criticism of the temporal consistency issues and overall aesthetic quality. A technical malfunction at Download Festival highlighted limitations when A7X's show lost power but the AI effects continued running without context.

Theme 2. Haiku vs ChatGPT: Free Tier Comparison Shows ChatGPT Lead

  • Haiku is terrible. (Score: 233, Comments: 114): A user expresses disappointment with Claude Haiku, finding it significantly inferior to ChatGPT's free tier despite attempts to continue using it, ultimately returning to ChatGPT after previously using Claude/Sonnet. The user, residing in a third world country, cites prohibitive subscription costs as the main barrier to accessing premium AI models like Sonnet, hoping for future accessibility of these models.
    • Regional pricing is a significant issue for Claude accessibility, with users noting that in countries like Venezuela, the subscription cost equals 2 months of minimum wage income. Some users suggest workarounds like creating multiple Google accounts for Poe or using Google AI Studio which offers 1 million tokens per minute free tier.
    • Users report that Haiku performs poorly compared to both ChatGPT's free tier and local models like Llama or Qwen. ChatGPT is currently considered the best value in both free and paid tiers, though some suggest DeepSeek (with 50 daily uses) as an alternative.
    • Sonnet's recent limitations (50 messages per week) have frustrated users, with many reporting needing to significantly reduce project file sizes and refine prompts. Some users attribute this to Anthropic's pivot to a B2B focus, especially after Amazon's multibillion-dollar investment.

Theme 3. World Labs' $230M AI Startup Launches 3D Scene Generation

  • First demo from World Labs - $230m Startup Led by Fei Fei Li. Step inside images and interact with them! (Score: 209, Comments: 43): World Labs, led by Fei Fei Li, introduced a system for converting images into interactive 3D scenes. The startup, which raised $230 million in funding, enables users to step inside and interact with generated 3D environments from 2D images.
    • Technical analysis reveals the system likely uses Gaussian splats for rendering, evidenced by translucent ovals in vegetation and references in their threeviewer_worker.js file. The technology appears to be 2.5D with limited movement to avoid artifacts.
    • The project can be accessed via WorldLabs.ai, with a realtime renderer for modern devices and a fallback version with pre-rendered videos for older mobile devices. Scene generation likely takes 5+ minutes with realtime rendering afterward.
    • Discussion around the $230 million funding sparked debate about investment value, with some defending it as frontier tech development while others questioned the cost for what they view as advanced HDRI generation. Several users noted potential VR applications and metaverse implications.

Theme 4. AI Surpassing Human Benchmarks Sparks Testing Debate

  • AI has rapidly surpassed humans at most benchmarks and new tests are needed to find remaining human advantages (Score: 281, Comments: 146): AI systems have outperformed human baselines across most standard evaluation benchmarks, making it difficult to accurately measure remaining areas of human cognitive advantage. The rapid pace of AI benchmark saturation suggests a need for developing new types of tests that can better identify and quantify uniquely human capabilities.
    • LLMs show limitations in complex code synthesis tasks and the ARC Challenge, with users noting that AI performance on benchmarks like SAT questions may be influenced by training on existing test data rather than true comprehension.
    • Users highlight real-world performance gaps, sharing examples where prompt engineering took significantly longer than manual work, with one user describing a case where their boss spent 2 days attempting what they completed in 30 minutes.
    • Discussion emphasizes societal implications, with concerns about job displacement in the next 2-3 years and the need for workers to develop "Plan B" career strategies, while others point out that tools like Wolfram Alpha haven't replaced specialized professions despite superior mathematical capabilities.

AI Discord Recap

A summary of Summaries of Summaries by O1-preview

Theme 1. Pushing the Limits: New AI Training and Optimization Breakthroughs

  • Nous DisTrO Takes Decentralized Training by Storm: Nous Research kicked off decentralized pre-training of a 15B language model using DisTrO, leveraging hardware from partners like Oracle and Lambda Labs. They matched centralized training metrics, with their DeMo optimizer reducing inter-accelerator communication.
  • Homemade CUDA Kernel Beats cuBLAS on H100: A custom H100 CUDA matmul kernel outperformed cuBLAS by 7% for N=4096, showcasing that sometimes rolling your own code pays off.
  • FP8 Training Gets Easier: No More Dynamic Scaling!: A new method enables out-of-the-box FP8 training without dynamic scaling, using the unit-scaling library. Low-precision training just got simpler.

Theme 2. AI Tools Get Smarter: Updates You Can't Miss

  • Aider v0.66.0 Writes Most of Its Own Code!: The latest Aider release adds PDF support for Sonnet and Gemini models and introduces AI-triggered code edits with AI! comments. Impressively, 82% of the code was written by Aider itself.
  • Cursor IDE Update Ruffles Feathers, But Agent Feature Shines: Cursor removed the long context option, frustrating users. However, the new agent feature is being praised as a "senior developer" assistant, making coding smoother, especially on smaller projects.
  • OpenRouter Lets Users Steer Development with Feature Voting: OpenRouter launched a Feature Requests Voting system, inviting users to vote on new features and drive community-driven development.

Theme 3. Stumbling Blocks in AI Model Integration and Training

  • Fine-Tuning Qwen 2.5? Don't Forget the Special Sauce!: Users emphasized the need to use Qwen's specific ChatML template for fine-tuning Qwen 2.5, cautioning against default options to avoid hiccups.
  • Stable Diffusion vs. Lora Models: The Integration Headache: Despite following all the steps, users struggled to get Lora models working in Stable Diffusion, pointing to possible bugs or overlooked steps in the integration process.
  • CUDA Errors Cramping Your Style? Try Quantization Magic: Users facing CUDA errors and VRAM limitations when loading large models suggested switching to smaller quantization formats or alternative cloud providers with better GPU support.
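The Qwen template point above refers to ChatML. A minimal sketch of how a fine-tuning example can be rendered in that format (the special-token strings match Qwen 2.5's released chat template; the helper function itself is a hypothetical illustration, not part of any library):

```python
# Hypothetical helper that renders one turn in Qwen's ChatML format.
def chatml(role: str, content: str) -> str:
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

prompt = (
    chatml("system", "You are a helpful assistant.")
    + chatml("user", "What is 2 + 2?")
    + "<|im_start|>assistant\n"   # generation starts after this header
)
print(prompt)
```

In practice, `tokenizer.apply_chat_template` on the Qwen tokenizer produces this layout for you; the sketch just shows why a default (non-ChatML) template would mismatch the tokens the model was trained on.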

Theme 4. AI Model Performance: Comparing Apples and Oranges

  • Claude Chats; ChatGPT Lectures: Pick Your Poison: Users compared Claude and ChatGPT, noting that Claude offers relatable conversation, while ChatGPT delivers in-depth philosophical insights, making it better suited for structured discussions.
  • Google's Gemini Models Playing Hard to Get: OpenRouter users grumbled about rate limiting with Google's experimental models like Gemini Pro 1.5, suspecting that Google's tight restrictions are causing connectivity woes.
  • GPT-4 Can't See Your Images, And Users Aren't Happy: Frustrations mounted as GPT-4 repeatedly failed to process images, returning errors like "I currently can't view images directly", hindering tasks like generating accurate image captions.

Theme 5. Fine-Tuning the Future: Efficient AI is In

  • Equivariant Networks Prove Their Worth in Data Efficiency: Research showed that equivariant networks improve data efficiency in rigid-body interactions, outperforming non-equivariant models, especially when data is limited.
  • ThunderKittens Could Use Some Auto Optimization Love: An auto optimizer was proposed for ThunderKittens to maximize its write-once-run-many-times potential, inspired by similar DSL experiences.
  • Mixed Precision Inference: Precision Checking Gets Tricky: Developers delving into mixed precision inference with vLLM discussed challenges in verifying kernel execution precision, noting limitations in current profiling tools.

PART 1: High level Discord summaries

Cursor IDE Discord

  • Cursor IDE Update Issues: Users have reported issues with the latest Cursor changelog, specifically the Composer not applying changes and the missing 'Apply' button, causing functionality frustrations.
    • Additionally, several users noted the removal or inconsistent performance of long context usage in chat since the recent update.
  • Composer vs Chat Mode Comparison: In Cursor IDE, users are contrasting Composer mode, which directly modifies files, with Chat mode that offers inline changes, discussing their limitations and functionality differences.
    • There's a demand for improved integration between the two modes, such as efficiently transferring discussions from Chat to Composer.
  • Windsurf vs Cursor IDE: Users are exploring Windsurf as a potential competitor to Cursor IDE, noting its effective handling of terminal output and codebase search.
    • While Windsurf shows promise, Cursor maintains strengths in specific workflows; experiences between the two vary among users.
  • API Key Limitations in Cursor IDE: Discussions highlight limitations in Cursor's API usage, with some users opting for their own API keys to gain more flexibility.
    • The community is seeking improved management of API call limits and enhanced context gathering capabilities for active projects.
  • Context Management in Cursor: Users have expressed dissatisfaction with the current context handling in Cursor IDE, particularly concerning limitations with Claude.
    • The community is advocating for better context management features and consistency to improve their coding workflows.


OpenAI Discord

  • Anthropic's MCP Framework Unleashes Claude as API: Anthropic released the new MCP framework, enabling Claude to run servers and effectively transforming the Claude app into an API.
    • This development allows Claude to create, read, and edit files locally, sparking excitement among users about real-time interaction with tools like VSCode.
  • Gemini's Response Constraints Compared to ChatGPT: Gemini often refuses innocent questions for perceived moral reasons, whereas ChatGPT is seen as more lenient in its responses.
    • Users humorously highlighted instances where Gemini declined to discuss artificial intelligence, avoiding engagement in sensitive topics.
  • Claude 3.5 Sonnet Emerges as Image Captioning Alternative: Due to persistent issues with OpenAI's vision capabilities, users recommend switching to Claude 3.5 Sonnet for image captioning tasks.
    • Community members noted that Claude 3.5 Sonnet offers more reliable functionality, helping users avoid project delays.
  • Speech-to-Text Feature Integration for ChatGPT on Windows: A user inquired about implementing a speech-to-text feature for ChatGPT on Windows, with suggestions to use the built-in Windows accessibility feature by pressing Windows + H.
    • This approach provides a real-time solution for converting speech to text while interacting with ChatGPT.
  • Structured Output Errors Linked to 'Strict' Misplacement: Users reported encountering random 'object' wrappers when using structured outputs, which was traced back to incorrect placement of the 'strict' setting.
    • After extensive debugging, it was confirmed that misplacing 'strict' led to the persistent structured output errors.
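For reference, in OpenAI's structured-outputs request format, `strict` belongs alongside `name` and `schema` inside the `json_schema` object — not inside the JSON Schema itself, where it may be silently ignored. A sketch of the correct placement (field names follow OpenAI's documented format; the schema contents are illustrative):

```python
# Correct shape for OpenAI structured outputs: `strict` sits at the
# json_schema level, next to `name` and `schema`.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "answer",
        "strict": True,          # correct placement
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
            "additionalProperties": False,
        },
    },
}
print(response_format["json_schema"]["strict"])  # True
```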


aider (Paul Gauthier) Discord

  • QwQ Model Configurations Negotiated: Users debated deploying the QwQ model in architect mode alongside a standard model for code commands, seeking clarity on interchangeability.
    • Aider supports model definitions across projects, boosting flexibility (see Advanced model settings).
  • DeepSeek-R1 Sets New Benchmarks: DeepSeek-R1 achieved exemplary results on the AIME & MATH benchmarks, underlining its open-source availability and real-time reasoning.
    • Community members hope for DeepSeek to release model weights for integration in ensemble frameworks with QwQ.
  • Optimizing Aider's Local Model Settings: Members collaborated on configuring .aider.model.metadata.json and .aider.model.settings.yml files to define local models within Aider.
    • Choosing the edit format to 'whole' or 'diff' significantly affects response structuring and editing efficiency.
  • OpenRouter Challenges Impact Aider: Participants identified issues with OpenRouter affecting model detection and functionality when using local servers.
    • Concerns were raised about spoofed implementations potentially altering model outputs and behaviors.
  • Ensemble Frameworks with QwQ and DeepSeek: A user expressed intent to integrate QwQ and DeepSeek models within ensemble frameworks to enhance reasoning capabilities.
    • This approach aims to leverage the strengths of both models for improved performance.
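On the edit-format point above, a minimal sketch of what a `.aider.model.settings.yml` entry can look like (keys follow aider's advanced model settings documentation; the model name is a placeholder):

```yaml
# .aider.model.settings.yml -- one list entry per model
- name: openai/my-local-model   # placeholder model name
  edit_format: whole            # or "diff"; changes how edits are returned
  use_repo_map: true
```

"whole" asks the model to return complete files, which is simpler for weaker local models; "diff" returns search/replace edits, which is cheaper in tokens for larger files.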


Unsloth AI (Daniel Han) Discord

  • Fine-Tuning Considerations in Unsloth: Users debated the merits of instruct versus non-instruct fine-tuning, recommending base models for datasets with over 1k records and suggesting experimenting with instruct models for datasets around 70k records.
    • Guidance was provided to refer to Unsloth Documentation for dataset formatting rules, emphasizing compliance for effective fine-tuning.
  • Data Privacy Measures in Unsloth: Unsloth was confirmed to maintain data privacy by not transferring data externally during fine-tuning, relying on the user's chosen platform like Google Colab.
    • This assurance addressed concerns regarding compliance with strict data privacy policies among users handling sensitive information.
  • RAG Compute Cost Challenges: Discussions highlighted that retrieval-augmented generation (RAG) can lead to high compute costs due to extensive context length requirements, as outlined in Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs.
    • Users are navigating the balance between performance and efficiency, especially for knowledge-intensive tasks, as supported by findings where RAG surpasses fine-tuning.
  • LLama 3.1 OOM Error Solutions: Experiencing out of memory (OOM) errors during continual pretraining of LLama 3.1 8B model led to suggestions for using a bigger GPU, reducing the dataset size, or decreasing the batch size.
    • These strategies aim to mitigate memory issues and ensure smoother training processes for large-scale models.
  • Latent Paraphraser Architecture Enhancements: A latent paraphraser was explained as a modification to the transformer architecture, adding a layer to redistribute probabilities over tokens.
    • This enhancement improves input grounding and reduces noise by minimizing unseen tokens during processing.


Perplexity AI Discord

  • Perplexity Pro Holiday Discount: The Perplexity Team announced a 75% off promotion for the first month of Perplexity Pro until Monday, December 2 at 11:59pm PT, enabling new users to access advanced features including enhanced search and file uploads.
    • This offer also includes one-click shopping and free shipping through Buy with Pro, aimed at streamlining the shopping experience for users during the holiday season.
  • Integration of Perplexity with Claude: Users inquired about integrating Perplexity within Claude using the new MCP feature, similar to its functionality with Brave and GitHub, to enhance performance by utilizing Claude's Project Knowledge.
    • Additionally, there were questions regarding the possibility of integrating Google within Claude, highlighting user interest in leveraging search functionalities.
  • Perplexity Image Generation Features: The platform's image generation capabilities were discussed, with confirmation that it is available via computer online without additional charges.
    • Users explored the extent of these features, considering their accessibility and potential applications in various projects.
  • RBAC vs ABAC Access Control Models: A member sought clarification on the difference between RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control) systems.
    • This discussion underscores the need for understanding access control models in technological implementations.
  • Custom Instructions in Claude Spaces: Issues were raised about the effectiveness of custom instructions for Claude spaces, which appear to conflict with existing 'introduce yourself' prompts.
    • Users are seeking guidance on how these instructions should interact and whether they can be effectively combined.
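The RBAC/ABAC distinction above can be sketched in a few lines: RBAC asks "what role does the user hold?", while ABAC evaluates attributes of the user and resource against a policy. A hypothetical toy illustration (the attribute names are made up for the example):

```python
# RBAC: access is granted purely by role membership.
def rbac_allows(user_roles: set, required_role: str) -> bool:
    return required_role in user_roles

# ABAC: access is computed from attributes of user and resource.
def abac_allows(user: dict, resource: dict) -> bool:
    return (user["department"] == resource["department"]
            and user["clearance"] >= resource["min_clearance"])

print(rbac_allows({"editor", "viewer"}, "editor"))            # True
print(abac_allows({"department": "hr", "clearance": 3},
                  {"department": "hr", "min_clearance": 2}))  # True
```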


LM Studio Discord

  • HF Search Issue Resolved: The HF search not working issue has been resolved, much to the relief of users.
    • An image was attached to commemorate the fix, indicating a positive update for the community.
  • LM Studio AIDE Integration Succeeds: Users successfully integrated the LM Studio endpoint to the AIDE sidecar, enabling a fully local code editor experience.
    • This integration enhances functionality for those seeking a local development environment.
  • Llama 3.1 Models Accessibility: A user inquired about accessing the base model of Llama 3.1 8B in LM Studio, noting that only instruction-tuned variants seem available.
    • Community members pointed to the huggingface repository as a potential source for the base model.
  • Arc A770 Underperforms Compared to 7800 XT: A member shared that their Arc A770 achieved only 11 t/s for Qwen2.5-14b q4_0, significantly lower than the 40 t/s achieved by a 7800 XT.
    • They noted q4_k_m is unusable and found the SYCL backend only negligibly faster.
  • Seasonic PSU Longevity Praised: A member mentioned their Seasonic PSU outlived other PC components despite having to replace PSUs every couple of years due to dust.
    • They described their experience as amazingly satisfactory with the PSU's performance.


Eleuther Discord

  • De-escalation of Resource Contention: Members highlighted concerns about the de-escalation of resource contention and its impact on unregulated internet growth, questioning the effectiveness of AI-powered privacy solutions. They emphasized the importance of identifying warning signs of rogue AI attacks to protect vulnerable devices.
    • The discussion stressed the need for community leadership in AI protection to mitigate the risks associated with resource contention and unauthorized AI activities.
  • Poincare Ball Embedding Explained: Embedding data into a Poincare ball ensures that points with higher degrees reside closer to the origin, preserving adjacency while transitioning to regions with less curvature. This method facilitates the representation of complex hierarchical structures.
    • A member pointed out the conceptual challenge of the Poincare ball's edge, noting that it represents a point at infinity where points cannot physically reside, which sparked further technical discussion.
  • Equivariant Networks Gain Efficiency: A recent paper found that equivariant networks enhance data efficiency compared to non-equivariant networks across various model sizes and compute budgets. The study demonstrated that equivariant models consistently outperform their non-equivariant counterparts.
    • Empirical results indicated that while non-equivariant models can match the performance of equivariant ones with sufficient training, equivariant networks offer superior efficiency without requiring extensive compute resources.
  • Understanding HF Tokenizers in Eval Harness: There’s confusion about whether the eval harness tokenizes sequences with add_special_tokens=True or False, particularly regarding the handling of EOS tokens during generation tasks. Members clarified that typically, only BOS tokens are added when building custom tokenizers.
    • Discussions revealed that manually managing the EOS token in the training loop is a practical approach to avoid compatibility issues across different frameworks utilizing HF models.
  • TaskSet Empowers Optimizer Training: The TaskSet dataset, containing over a thousand diverse tasks, is instrumental for training and evaluating optimizers in meta-learning contexts. This dataset enables significant efficiency improvements over traditional random search methods.
    • Although recognizing that TaskSet is somewhat outdated, members acknowledged it as the best available option for building large datasets of learning curves despite financial constraints in AutoML research.
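The adjacency-preserving property discussed above comes straight from the Poincaré metric, which blows up near the ball's boundary. As a minimal sketch (the standard distance formula, not code from the discussion), the same Euclidean step covers far more hyperbolic distance near the edge than near the origin:

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit Poincare ball:
    d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    nu2 = sum(a * a for a in u)
    nv2 = sum(b * b for b in v)
    return math.acosh(1 + 2 * diff2 / ((1 - nu2) * (1 - nv2)))

# Near the origin the space is almost Euclidean...
near = poincare_distance((0.0, 0.0), (0.1, 0.0))
# ...while the same 0.1 Euclidean step near the boundary spans a much
# larger hyperbolic distance, which is why high-degree nodes embed
# close to the origin.
far = poincare_distance((0.89, 0.0), (0.99, 0.0))
print(near, far)
```

This also makes concrete why the boundary is a point at infinity: as a point's norm approaches 1, its distance to everything else diverges.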


OpenRouter (Alex Atallah) Discord

  • Feature Requests Voting: Members are urged to vote for their top feature requests here to prioritize upcoming developments.
    • For any unlisted requests, users can submit them in <#1107397803266818229>, enabling a wider array of community-driven feature inputs.
  • Pixtral Large Performance: Pixtral Large is praised for its excellent performance and a massive free tier, facilitating easy access via console.mistral.ai.
    • A user reported switching from Hermes 405b to Pixtral, noting its effectiveness with unchanged prompts.
  • Model Identification Confusion: Discussions highlighted that models do not inherently recognize their identities and often hallucinate details from training data.
    • This led to lingering confusion among users about model identifications despite clarifications.
  • Generation Cost Estimation: A user inquired about rates for the /api/v1/generation endpoint and methods to accurately estimate generation costs.
    • Suggestions included utilizing Helicone for tracking, emphasizing that the generation endpoint is essential for precise cost assessment.
  • Custom Provider Keys Access: Developers are pushing for access to custom provider keys, reflecting a strong community demand for this feature. One member noted, 'Thank you for all the great work!' while requesting access.
    • Several users, including monomethylhydrazine and kit18, expressed the need to use their own keys for specific providers, highlighting a community consensus on this functionality.
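For the cost-estimation question, the arithmetic itself is simple once token counts are known: multiply native prompt and completion token counts by the model's per-token prices. A hedged sketch follows; the prices and function name here are illustrative assumptions, and real prices must be read from OpenRouter's model listing:

```python
# Illustrative per-token prices (USD per token) -- NOT real OpenRouter
# prices; fetch actual pricing from the model listing for your model.
PROMPT_PRICE = 3e-6        # $3 per 1M prompt tokens (assumed)
COMPLETION_PRICE = 15e-6   # $15 per 1M completion tokens (assumed)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate a generation's cost from its native token counts."""
    return prompt_tokens * PROMPT_PRICE + completion_tokens * COMPLETION_PRICE

cost = estimate_cost(1200, 350)
print(f"${cost:.6f}")
```

The generation endpoint matters because it reports the native token counts the provider actually billed, which can differ from counts produced by a local tokenizer.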


GPU MODE Discord

  • Triton Metaprogramming and Source Build: A metaprogramming proposal for Triton aiming to address existing limitations has generated community interest, though some members requested clearer semantics and example inclusions.
    • Additionally, building Triton from source on WSL2 required increasing memory to 26GB to prevent out-of-memory errors, and members discussed offline compilation dependencies in Ubuntu Docker containers.
  • ThunderKittens and ThunderMittens Unification: Discussions around ThunderKittens and ThunderMittens highlighted the role of tile abstraction in unifying the frameworks for tensor core compatibility, with emphasis on register usage control.
    • Members also inquired about existing API contracts between the two, and expressed interest in an auto optimizer for ThunderKittens to enhance its write-once, run-many-times system.
  • BitNet b1.58 with RedPajama and Dolma Datasets: The release of BitNet b1.58 models, trained on the RedPajama dataset with 100B tokens, demonstrated promising PPL and zero-shot accuracy results.
    • Furthermore, the OLMo-Bitnet-1B model, trained on 60B tokens from the Dolma dataset, underscores the research-centric approach with detailed training hyperparameters available in their documentation.
  • Diffusion Models Technical Overview: Recent discussions on diffusion models emphasized their dominance in generating perceptual signals, citing improved mode coverage and faster sampling as key advantages.
    • Implementation of classifier-free diffusion guidance was highlighted for enhancing conditional diffusion model outputs in systems like OpenAI’s DALL·E 2 and Google’s Imagen, with noise schedule design elements being pivotal for performance.
  • Open Japanese LLM Leaderboard Launch: The introduction of the Open Japanese LLM Leaderboard aims to evaluate Japanese LLMs across 20+ datasets and tasks in collaboration with Hugging Face.
    • This initiative addresses the lag in Japanese LLM performance compared to English, garnering interest from Japanese HPC engineers focused on native language advancements.
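The classifier-free guidance step mentioned above is a one-line combination of two noise predictions, one conditional and one unconditional, extrapolated by a guidance weight. A minimal numpy sketch (array shapes and the weight value are illustrative):

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the conditional one by guidance weight w.
    w = 0 -> unconditional, w = 1 -> purely conditional, w > 1 -> the
    conditioning signal is amplified (the regime DALL-E 2 / Imagen use).
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.zeros(4)
eps_c = np.ones(4)
print(cfg_combine(eps_u, eps_c, 7.5))  # each entry is 7.5
```

Because the same network produces both predictions (conditioning is simply dropped for the unconditional pass), no separate classifier is trained, hence the name.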


Nous Research AI Discord

  • Hermes 3 Advances with O1 Style Integration: A discussion in #general highlighted inquiries about Hermes 3, suggesting connections to the former O1 style.
    • This reflects ongoing interest in Hermes' latest developments and its evolution within the community.
  • Mistral Platform Faces Model Selection Hurdles: Members voiced concerns regarding the Mistral AI platform's recent change to default to a single model selection option.
    • The limitation on image generation capabilities has caused confusion and impacted user experience.
  • Truth Terminal Merges AI with Crypto Narratives: Insights were shared about Truth Terminal creating its own religion through a semi-autonomous AI within the crypto space.
    • This unique blend underscores the intersection of AI alignment discussions and the AI and crypto communities.
  • Low-bit Quantization Benefits Undertrained LLMs: Research indicates that low-bit quantization results in less degradation for larger, undertrained LLMs compared to smaller, extensively trained models, as detailed in this paper.
    • The findings emphasize the importance of aligning quantization strategies with model size and training token requirements.
  • Ternary Quantization Limited, FP4 Emerges as Efficient: Observations reveal that ternary quantization (BitNet) only improves results for undertrained networks, questioning its broad applicability.
    • Consequently, the community is leaning towards FP4 as the preferred numeric weight representation for current model architectures.
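For readers unfamiliar with what "ternary quantization" means here, BitNet b1.58 describes an absmean scheme: scale weights by their mean absolute value, round, and clip to {-1, 0, +1}. The sketch below is an illustration of that scheme, not the official BitNet implementation:

```python
import numpy as np

def absmean_ternary(w: np.ndarray):
    """Absmean ternary quantization sketch (BitNet b1.58-style): scale by
    the mean absolute weight, round, clip to the ternary set {-1, 0, +1}.
    Returns the ternary codes plus the scale needed to dequantize."""
    gamma = np.mean(np.abs(w)) + 1e-8   # avoid division by zero
    q = np.clip(np.round(w / gamma), -1, 1)
    return q, gamma

w = np.array([0.9, -0.05, 0.4, -1.2])
q, gamma = absmean_ternary(w)
print(q, gamma)  # codes drawn from {-1, 0, 1}, one fp scale per tensor
```

The undertraining argument is that large, undertrained models have slack in their weights that such coarse codes can absorb, while well-trained smaller models do not, which is what pushes the community toward richer formats like FP4.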


Modular (Mojo 🔥) Discord

  • Confusion Over Mojo Origins vs Rust Lifetimes: A user asked how Mojo's Origins relate to Rust's lifetimes, noting that both aim to solve memory-safety problems yet appear to be fundamentally different mechanisms.
    • While inspired by Rust, Mojo's design is intentionally distinct, aiming for different compiler behaviors and goals.
  • Mojo Origins Maintain Memory Control: Mojo's Origin denotes a memory chunk; when a pointer is parameterized by an origin, it indicates it points within that memory, extending variable lifetimes as necessary.
    • Origins facilitate aliasing guarantees and can produce compile-time errors if a pointer remains alive while its target is not.
  • Understanding Origins Requires Patience: Understanding Mojo Origins from a compiler perspective is challenging, especially as they are not finalized, leading to potentially shifting details.
    • A user expressed willingness to wait for more clarity on the topic rather than asking more questions prematurely.
  • Namespace Challenges with Spaces in Variable Names: A question arose about the possibility of using spaces in variable names, like var xe đạp = 'abc', highlighting a lack of support across programming languages.
    • Allowing spaces complicates parser implementation significantly, making it impractical.


Notebook LM Discord Discord

  • Notebook LM Podcast Feature Creates Audio in 30 Minutes: A user praised Notebook LM's ability to create an audio podcast in just 30 minutes using documents about their German little league baseball program, including its historic World Series qualification. The podcast episode showcases the seamless integration of AI-generated content.
    • This demonstrates how Notebook LM can efficiently generate multimedia content, enhancing project workflows for users.
  • NotebookLM Enhances High-Fantasy Worldbuilding: A user shared their experience of using NotebookLM for worldbuilding a high-fantasy novel, highlighting the model's capability to provide context-aware responses.
    • The AI's reasoning skills led to new insights and mechanics for their magic system based on existing rules.
  • GenFM Challenges NotebookLM in AI Podcasting: A member shared a video titled 'GenFM, Now Playing on ElevenReader: Smart Podcasts Produced by Generative AI', highlighting competition in the AI space.
    • Despite GenFM's entry, another member noted that NotebookLM still provides deeper interactive experiences.
  • RAX's Bold Times Square Billboard Takeover: RAX, a cyberpunk raccoon, commandeered Times Square billboards to advocate for mindful consumption with the message: 'DON'T BUY EVERYTHING YOU SEE.' A YouTube video discusses the event emphasizing the need to question consumer culture.
    • This digital performance sparked discussions on consumerism within the community.
  • FDP Plans Coalition Breakup in Germany: The FDP is planning to break up the coalition government led by Chancellor Olaf Scholz, outlining a strategy to frame their exit as necessary for political progress.
    • Internal documents provide key narratives and timelines to ensure the German public receives a clear choice in upcoming elections.


Latent Space Discord

  • Perplexity's Clever Black Friday Campaign: Perplexity launched a clever Black Friday campaign that aligns with recent marketing trends leveraging AI capabilities.
    • This initiative has garnered attention for its strategic integration of AI in marketing strategies.
  • Humans Outperform AI in Pattern Recognition: Consensus among members indicates that while AIs compute faster, humans excel at recognizing global patterns in complex problems, often reacting with phrases like 'hang on a sec, this isn't right'.
    • This ability to identify overarching inconsistencies sets humans apart from AI systems that may fixate on specific local issues.
  • Generative AI Investment in Enterprises: A recent report highlights that AI spending surged to $13.8 billion in 2024, signifying a shift from experimental use to core business strategies.
    • Despite the increase in investment, over a third of decision-makers are still developing effective methods for integrating generative AI into their operations.
  • Freysa AI Agent Challenge Funds Released: An AI challenge led to the Freysa agent transferring $47,000 through a cleverly crafted prompt that bypassed strict transfer instructions.
    • This event underscores the complexities of prompt engineering for AI manipulation within financial transactions and showcases transparent, open-source setups.
  • Technology Adoption and Investment Trends: Participants compared current LLM trends to historical technological shifts, noting parallels in excitement and potential market corrections.
    • The ongoing discussion raises concerns about the sustainability and future profitability of AI technologies, echoing patterns seen in industries like aviation.


Stability.ai (Stable Diffusion) Discord

  • ControlNet for SD 3.5 Quality Issues: A member reported that ControlNet for SD 3.5 produces artifact-free, high-quality renders only at 1024x1024 resolution.
    • Another member attributed the issues to lack of familiarity and encouraged experimenting to better understand ControlNet's functionality.
  • Stable Diffusion Hardware Performance: A user inquired about performance benchmarks for Stable Diffusion, mentioning an achievement of approximately 5 IT/s.
    • Community members actively shared their hardware capabilities, reflecting keen interest in optimizing setups for Stable Diffusion.
  • LoRA Model Request for AI Art: A user requested information about a LoRA half girl model to create characters merging two different female designs.
    • This request highlights ongoing experimentation and creativity in character development within AI-generated art.
  • Content Creator Thanksgiving Wishes: A member extended Happy Thanksgiving wishes to the Stability.ai team and fellow creators.
    • This gesture underscores the camaraderie and collaborative spirit among content creators in the AI space.


tinygrad (George Hotz) Discord

  • TinyFPGA's Potential Memory Architecture: Members discussed the design of TinyFPGA, contemplating how to mimic a typical memory hierarchy while noting that existing options like Block RAM and DDR3 are insufficient.
    • Ideas were proposed for a 'first pass' memory to localize constants near ALUs, potentially enhancing performance significantly.
  • Challenges in Traditional Memory Models: Discussions highlighted that heuristic eviction policies may become obsolete as the focus shifts towards more efficient memory hierarchies in future TinyFPGA designs.
    • Speculations were made about the future of trained parameters, with mentions of tensors potentially replacing them.
  • Exa Laboratories Sustainable Chip Designs: A conversation on Exa Laboratories emphasized their mission to create reconfigurable chips that outperform traditional GPU/TPU in speed and energy efficiency for specific AI needs.
    • Skepticism was expressed regarding their viability, pointing out the challenges small companies face in chip development, especially with ambitious timelines.
  • Tenstorrent's Biologically Plausible Training Algorithms: George Hotz mentioned Tenstorrent as a serious player investing in training algorithms that mimic biological processes to achieve greater efficiency.
    • Potential changes include hierarchical memory models and real-time optimizations reminiscent of brain function principles in computing.
  • VIZ Tool in tinygrad: A member posted a detailed tutorial explaining the VIZ tool, available here, enhancing understanding of its capabilities within tinygrad.
    • George Hotz acknowledged the VIZ tool in a tweet, stating that VIZ=1 is a significant improvement over LLVM/MLIR, highlighting its advantages.


Cohere Discord

  • Aya Project Contributions Guidance: A member sought guidance on contributing part-time to the Aya project for Cohere.
    • Another member suggested joining the Aya server to connect with the community directly.
  • Thanksgiving Celebrations and Meal Sharing: Members shared Happy Thanksgiving messages and images of their meals, including one member's impressive plate of food.
    • Another member humorously commented on trying to eat healthy, noting that it wasn't as tasty as it could be.
  • Food Sharing and Dungeness Crab: Members exchanged comments and images of their hearty meals, with one joking that their meal was more like dessert.
    • A humorous remark followed about having eaten a plate of Dungeness crab beforehand, enhancing the food sharing atmosphere.


DSPy Discord

  • dspy.asyncify support concerns: A member inquired about using dspy.asyncify, specifically its use of threads and the availability of pure async support due to issues with celery workers.
    • Another user echoed the desire for pure async support to address the existing celery worker issues.
  • dspy demo behavior with assertions: Concerns were raised about dspy not using demos in the final prompt when assertions are activated.
    • A member clarified that demonstrations in retry mode depend on whether compilation occurred before or after activating assertions.
  • Welcome Shaun to the guild: Shaun joined the server, greeted everyone, and expressed excitement about ongoing projects.
    • The community welcomed Shaun, fostering an inclusive environment.
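The thread-vs-async concern above is easy to make concrete: a thread-based asyncify still ties up a worker thread per call, which is what clashes with celery-style workers. The sketch below is a generic illustration of that pattern using the standard library, not dspy.asyncify's actual implementation, and slow_predict is a hypothetical stand-in for a synchronous module call:

```python
import asyncio
import time

def slow_predict(question: str) -> str:
    """Hypothetical stand-in for a synchronous DSPy module call."""
    time.sleep(0.1)  # simulate blocking LM latency
    return f"answer to: {question}"

def asyncify(fn):
    """Minimal thread-based asyncify: the sync call still blocks a worker
    thread under the hood, which is why some users want a pure-async path
    instead of thread offloading."""
    async def wrapper(*args, **kwargs):
        return await asyncio.to_thread(fn, *args, **kwargs)
    return wrapper

async def main():
    apredict = asyncify(slow_predict)
    # The two blocking calls overlap on worker threads instead of serializing.
    return await asyncio.gather(apredict("a"), apredict("b"))

results = asyncio.run(main())
print(results)
```

A pure-async implementation would instead await a non-blocking HTTP client all the way down, consuming no worker threads while requests are in flight.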


Torchtune Discord

  • DPO Aligns Across Repositories with LoRA-DPO: The DPO Trainer from Hugging Face shows that while the code differs, the DPO technique remains consistent across repositories like LoRA-DPO.
    • This consistency ensures that implementations maintain alignment, facilitating easier integration and comparison between different DPO approaches.
  • Feasibility of Full-parameter DPO: Implementing full-parameter DPO is achievable and may enhance post-training alignment compared to LoRA-DPO.
    • The community recommends leveraging adaptations from the existing full PPO implementation to guide this process.
  • Introducing dpo_full_finetune_single_device PR: A new PR adds full finetuning DPO for distributed setups, serving as a solid foundation for single device implementation.
    • Details can be accessed through the full DPO PR, which outlines the proposed changes and enhancements.
  • Torchtune to Support Full-finetuning DPO: Upcoming updates in Torchtune will support full-finetuning DPO, necessitating modifications to load a separate reference model.
    • These changes involve altering initial calls to the reference model to improve functionality and integration within the existing framework.
  • Higher Memory Usage in FFT DPO: FFT DPO will consume significantly more memory than LoRA due to the necessity of storing gradients and maintaining a complete model copy.
    • If LoRA DPO does not meet performance requirements, the tradeoff in memory usage for adopting full-finetuning DPO may be justified.
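The memory tradeoff in the bullets above follows directly from the shape of the DPO objective, which needs log-probabilities from both the policy and a frozen reference model. A minimal per-example sketch of the standard loss (scalar log-probs stand in for summed sequence log-probs):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from summed log-probs under the policy (pi_*)
    and the frozen reference model (ref_*):
        L = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    The ref_* terms are why full-finetuning DPO must load a second model
    copy, and hence why FFT DPO costs more memory than LoRA-DPO."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen answer more than the reference does -> low loss.
low = dpo_loss(-10.0, -14.0, -12.0, -12.0)
# Policy prefers the rejected answer -> high loss.
high = dpo_loss(-14.0, -10.0, -12.0, -12.0)
print(low, high)
```

With LoRA, the reference model can be recovered by disabling the adapters, which is why LoRA-DPO avoids the second full copy.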


LLM Agents (Berkeley MOOC) Discord

  • Quiz 11 Still Not Open?: A member expressed confusion about the status of Quiz 11, questioning why it isn't available yet.
    • Is there an expected date for when it will be open?
  • Inquiry on OpenAI Credits: A user inquired about the status of their OpenAI credits, mentioning they filled out the form last week.
    • They expressed urgency, stating they are in need of support for their project development.
  • MOOC Completion and Certificate Eligibility: A member asked if starting the MOOC now would still allow them to receive the certificate after completion.
    • They were also curious if it's feasible to finish all requirements within the remaining time.


OpenInterpreter Discord

  • Open Interpreter Dashboard Development: A member announced they're developing an Open Interpreter inspired project focused on creating an open-source dashboard to be released this year.
    • The project emphasizes being a fun little project without any profit motive.
  • Community Support for Dashboard Project: Another member congratulated the project creator, expressing enthusiasm with 'Nice work! Well done 🚀'.
    • This exchange highlighted the community's encouragement for innovative projects within the space.


Interconnects (Nathan Lambert) Discord

  • OLMo 2 Performance Boosts Prowess: The OLMo 2 family, comprising 7B and 13B models from Allen AI (AI2), was trained on up to 5T tokens and outperforms Llama-3.1 8B and Qwen 2.5 7B.
    • Key enhancements include an improved architecture with RMSNorm and QK-Norm, along with a comprehensive two-stage curriculum training approach.
  • OLMo 2 Crafts Cutting-Edge Training: OLMo 2 employs the model souping technique for final checkpoints and adopts a post-training methodology inspired by Tülu 3 involving instruction tuning, preference tuning with DPO, and reinforcement learning with verifiable rewards.
  • Instruct OLMo 2 Tops Open-Weight Models: The 13B Instruct variant of OLMo 2 surpasses Qwen 2.5 14B and Tülu 3 8B in instruct tasks, as validated by the OLMES suite.
  • Weight Watcher AI Gains Meme-worthy Attention: Weight Watcher AI was highlighted as a novel addition to the AI landscape and humorously shared in the memes channel, drawing attention for its amusing nature.
    • The OLMo summary link was shared, though no description was found.
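The model souping step mentioned above is, at its simplest, a uniform average of parameter tensors across checkpoints of the same architecture. The sketch below illustrates the idea on hypothetical plain-float state dicts; real souping averages torch tensors key by key:

```python
def soup(checkpoints):
    """Uniform model soup: average each parameter across checkpoints that
    share an architecture (and hence share state-dict keys)."""
    keys = checkpoints[0].keys()
    n = len(checkpoints)
    return {k: sum(ckpt[k] for ckpt in checkpoints) / n for k in keys}

# Hypothetical two-checkpoint soup with scalar "weights".
ckpt_a = {"w1": 0.2, "w2": -1.0}
ckpt_b = {"w1": 0.6, "w2": 1.0}
print(soup([ckpt_a, ckpt_b]))  # each key averaged across the two checkpoints
```

The appeal is that averaging final checkpoints from nearby runs can improve robustness at zero inference cost, since the soup is a single model of the original size.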


LlamaIndex Discord

  • Developer Skills Showcase: A member shared an extensive list of development skills including React, Next.js, Angular, and D3.js, highlighting their experience with UI/UX and testing frameworks like Protractor and TestCafe.
    • This diverse skill set underscores their adaptability across front-end and testing technologies, enhancing their capability to tackle complex engineering challenges.
  • Diverse Technology Stack: The developer mentioned a wide range of technologies such as Node, Nest.js, Solidity, and Rust, including knowledge of front-end frameworks like Bootstrap and styling methodologies like BEM and SMACSS.
    • This comprehensive technology stack enables efficient integration and development across various platforms and frameworks, catering to multifaceted project requirements.
  • API Integration Expertise: They expressed familiarity with integrating multiple APIs including Google Maps, YouTube, and Facebook APIs, allowing them to work on diverse projects that require efficient data interaction.
    • Their ability to manage and implement diverse API integrations facilitates robust and scalable solutions in system architectures.
  • Cloud Deployment Skills: The member highlighted AWS among their cloud service competencies, enabling effective deployment of applications into cloud environments.
    • Proficiency in AWS ensures reliable and scalable cloud deployments, optimizing resource management and infrastructure performance.
  • Call for Collaboration: They concluded with an invitation to connect, promoting potential networking opportunities within the developer community.
    • This outreach fosters professional collaboration and knowledge sharing among engineers with similar technical interests.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.