AI News (MOVED TO news.smol.ai!)

Archives
November 8, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


a quiet week is all we need.

AI News for 11/6/2024-11/7/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 1985 messages) for you. Estimated reading time saved (at 200wpm): 222 minutes. You can now tag @smol_ai for AINews discussions!

Anon on reddit thinks he has figured out AGI but ends up writing a surprisingly coherent literature review of Liquid Neural Networks and related work. The comments are mandatory reading.


The Table of Contents and Channel Summaries have been moved to the web version of this email.


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Models and Architectures

  • Llama 3.2 Vision: @ollama announced Ollama 0.4 supporting Meta's Llama 3.2 Vision (11B and 90B) models. Examples include reading handwriting (tweet). Additionally, @jaseweston introduced Self-Consistency Preference Optimization (ScPO), enhancing model consistency without human labels.
  • Model Scaling and Efficiency: @fstichler discussed the resurgence of neural networks, emphasizing that model size and scaling continue to drive AI advancements. @StasBekman highlighted AMD's peer-to-peer bandwidth challenges in multi-GPU setups, suggesting improvements are underway.
  • Transformers and Skip Connections: @jxmnop emphasized that skip connections are now a crucial part of Transformers, enhancing model performance and stability.
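
For readers newer to the architecture: a skip (residual) connection simply adds a sub-layer's input back to its output, which keeps gradients flowing through deep stacks. A minimal pre-norm residual block in PyTorch, illustrative only and not from the tweet:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm Transformer sub-block: x + sublayer(norm(x))."""
    def __init__(self, dim: int, sublayer: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection: the input bypasses the sub-layer and is added back
        return x + self.sublayer(self.norm(x))

# Example: wrap a feed-forward sub-layer
ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
out = ResidualBlock(64, ffn)(torch.randn(2, 10, 64))
```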

AI Tools and Applications

  • AI in Healthcare: @bindureddy proposed that less regulation + AI could revolutionize healthcare by solving diseases, curing aging, and automating medical procedures.
  • Automated Resume Insights: @llama_index showcased a tool that uses @llama_index, LlamaParse, and Gemini to extract and structure information from unstructured resumes, facilitating AI-driven recruitment processes.
  • Development Environments: @svpino demonstrated Gitpod Flex's zero-trust architecture, enabling seamless switching of hardware without altering development environments, enhancing security for enterprise applications.

AI Research and Publications

  • Surveys and Papers: @omarsar0 shared a comprehensive survey on Small Language Models (SLMs), discussing definitions, applications, and reliability. Additionally, research on Number Understanding of LLMs by the same handle explored numerical processing abilities and the effectiveness of chain-of-thought techniques.
  • OCR with GPT-2: @giffmana reviewed the DTrOCR paper, which utilizes a GPT-2 decoder for Optical Character Recognition (OCR), highlighting its innovative approach to handling handwritten and printed text.
  • Multi-Agent Systems: @togethercompute and @LangChainAI discussed the implementation of multi-agent architectures in prediction markets, showcasing how these systems can automate and enhance market resolutions.

AI Community and Events

  • Conferences and Seminars: @weights_biases invited attendees to a Happy Hour at NeurIPS for networking with industry leaders. Similarly, @stanfordnlp promoted an NLP Seminar featuring @rajammanabrolu on Interactive and Grounded Language Learning.
  • Workshops and Courses: @DeepLearningAI announced a course on Agent Memory with LLMs as Operating Systems, while @joeyroth92 shared updates on AI developer tools.
  • Community Interactions: @weights_biases mentioned an upcoming discussion on the path to AGI in their latest episode of GradientDissent featuring @jonsidd and @l2k.

AI in Business and Industry

  • AI Startups and Integrations: @tom_doerr listed multiple open-source tools and AI integrations, such as MemFree, Open-Source Form Builder, and Arch, enhancing LLM workflows and developer productivity.
  • AI in Finance: @virattt detailed an AI hedge fund team utilizing LangGraph and @findatasets to manage portfolio, fundamental, technical, and sentiment analysis, demonstrating AI's role in financial decision-making.
  • AI Product Deployment: @_akhaliq highlighted AdvancedLivePortrait-WebUI, a gradio-based WebUI for editing facial expressions in images, showcasing practical applications of AI in multimedia.

Memes/Humor

  • AI and Politics: @Teknium1 humorously critiqued AI safety concerns with "Don’t tell me you are worried about AI safety if you do this mmmmmk?" while @nearcyan joked about Claude capturing bee brains.
  • Tech Humor: @transfornix playfully remarked, "you are all weird but kinda funny pixels on my computer," poking fun at online interactions.
  • Developer Jokes: @mervenoyann shared a light-hearted apology for delayed responses, reflecting the busy lives of developers.

Miscellaneous

  • Personal Updates and Opinions: @jxmnop expressed thoughts on living in San Francisco, emphasizing the distributed nature of the AI community. @sama engaged in discussions about AI funding and leadership.
  • Regulatory and Ethical Discussions: @alliance_ai debated the logical absurdity of worshipping contrarians, highlighting the abundance of such behavior in the AI discourse.
  • Educational Content: @skirano shared insights on using Sonnet for coding, emphasizing the importance of understanding what AI models know and don't know.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. LLM Selector: Analyzing Models Across 12 Benchmarks for Optimal Use

  • LLM overkill is real: I analyzed 12 benchmarks to find the right-sized model for each use case 🤖 (Score: 199, Comments: 60): The post introduces LLM Selector, a tool designed to help users find the appropriate open-source AI model for their needs by analyzing 11 models across 12 benchmarks. It simplifies the selection process by grouping benchmarks by use case, weighing primary metrics more heavily, and normalizing scores for easier comparison, exemplified by the Creative Writing Use Case with models like Llama-3.1-70B and Gemma-2-27B. The author notes that this is a starting point with limited models and invites feedback on additional features and model suggestions.
    • Users expressed concerns about the model selection and benchmarking process, noting that models like Mistral and others weren't showing up in results despite their relevance. Some users suggested that the tool seemed biased towards recommending Llama models consistently, questioning the diversity of the included models.
    • There were requests for additional features and functionalities, such as the ability to constrain searches based on RAM and VRAM specifications, and the inclusion of function calling capability tests. Users also suggested integrating filters for preferred quantization levels and parameter sizes, as well as considering hardware specifications.
    • Feedback included interest in integrating the tool with external resources like the Hugging Face LLM Leaderboard, with the developer acknowledging this and considering future updates. Users appreciated the UI but noted issues like timeout errors when accessing the tool, though these were not universally experienced.

Theme 2. Integration of Liquid Time Constant Networks with Spiking Dynamics

  • I think i figured out how to build AGI. Want to get some feedback. (Score: 882, Comments: 386): The author theorizes that surprise minimization could be key to developing AGI, inspired by the Free Energy Principle and its application in biological systems. They highlight the SMIRL algorithm's ability to minimize surprise without explicit goals, and suggest similarities with Liquid Time-Constant Networks (LTCNs) and Spiking Neural Networks (SNNs), which mimic human brain functionality and learn through Spike Timing Dependent Plasticity (STDP). The author proposes a hybrid model combining LTCNs with surprise minimization to enable real-time learning and exploration, potentially outperforming LLMs in tasks like solving ARC-AGI puzzles by developing routines similar to human cognitive processes.
    • Commenters critique the oversimplification of surprise minimization as a driver for AGI, noting that it excludes factors like intrinsic motivation, social influences, and embodiment. They argue that the connections between concepts like SMIRL, LTCNs, and STDP are speculative and lack strong evidence for synergy in AGI development.
    • Discussions highlight the challenges of reverse-engineering human cognitive processes from data like brain scans and eye-tracking, emphasizing issues like data noise, routine diversity, and the implicit nature of routines. The limitations of benchmarks like ARC-AGI are noted, as they do not encompass all aspects of intelligence, such as language understanding and social interaction.
    • There are concerns about scalability and computational cost of training models at a human intelligence scale, and the need for a clear learning mechanism combining LTCNs with surprise minimization. Commenters also discuss the potential inefficiencies and interpretability issues of complex hybrid models, likening them to a "black box" without clear control over decisions.

Theme 3. Qwen 2.5 Coder: Stealth Updates & Future Directions

  • Qwen 2.5 Coder 7B & 1.5B Instruct models just received weight updates (Score: 207, Comments: 43): Qwen 2.5 Coder models received weight updates for both the 7B and 1.5B Instruct versions, though no explanation was provided for these changes. For further details, see the commits on Hugging Face for 7B and 1.5B, and the updated 7B GGUF by bartowski.
    • Aider Benchmark Performance: The Qwen 2.5 Coder 7B model scored 63.9% on the Aider benchmark, outperforming the previous model's 51.9% pass rate and closely approaching the 66.2% score of the 405b Llama 3.1 model, demonstrating significant improvement in performance after the weight update. Discussions included how different quantizations, like Q4 and Q8, affect model performance, with Q4 being noted as a good balance for local execution.
    • Future Developments: A member of the Qwen development team, Junyang Lin, hinted at the possibility of a 32B Coder model release in the near future, with a timeline of "two weeks" mentioned in a recent interview. This suggests ongoing development and potential new releases following the current updates.
    • User Experiences and Version Control: Users shared mixed experiences with the models, noting that the 14B version struggled with some coding tasks, while others praised the 7B Coder model for its coding-specific fine-tuning. Discussions also highlighted the importance of version control, with Bartowski acknowledged for effectively using it to manage the model updates.

Theme 4. WebRL: Evolving Agents via Self-Developed Curriculum Reinforcement Learning

  • WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning (Score: 44, Comments: 7): WebRL is a framework for training Large Language Model (LLM) web agents via a self-evolving online curriculum in reinforcement learning. This approach focuses on enhancing the training efficiency and performance of web-based agents by dynamically adapting the learning curriculum.
    • WebRL significantly improves task success rates for web agents, with Llama-3.1-8B achieving a 42.4% success rate and GLM-4-9B reaching 43% on WebArena-Lite, outperforming GPT-4-Turbo (17.6%) and GPT-4o (13.9%). The approach uses a self-evolving curriculum, a robust outcome-supervised reward model, and adaptive reinforcement learning strategies.
    • The WebRL framework is praised as an excellent starting point for those learning Reinforcement Learning with transformers, highlighting its potential educational value for newcomers to the field.
    • The paper detailing WebRL is accessible on arXiv and should be linked in the GitHub readme for further reference.

Theme 5. Open Source Models Revealing Significantly Lower Refusal Rates

  • Update – OS Models show much lower refusal rates compared to proprietary LLMs (Score: 23, Comments: 5): Open Source (OS) models like Mistral Large, Llama variants, Nemotron, and Qwen exhibit near-zero refusal rates across all test categories, outperforming proprietary models, particularly in introspective tasks. The refusal rates appear unrelated to model size, with Llama 3.1 variants ranging from 8B to 405B showing similar results, suggesting that these refusals are false positives indicative of censorship rather than safety.
    • Additional training after initial steps can recover performance degradation, as reflected in leaderboard results. This suggests that continuous training is beneficial for maintaining model effectiveness.
    • For those seeking models with low refusal rates, the Hermes-3 Llama 3.1-8B-lorablated model on Hugging Face is recommended.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Claude 3.5 Sonnet New Update Effect on Code and Text Output

  • Claude 3.5 Sonnet New losing it's text writing glory (Score: 72, Comments: 53): Claude 3.5 Sonnet New has shown mixed results; it initially excelled in text writing, capable of producing up to 2345 words per response, but now frequently cuts off around 465-500 words. Despite its text limitations, it performs well in coding tasks, although it struggles to complete 500 lines of code, impacting preview capability.
    • Users expressed dissatisfaction with Claude 3.5 Sonnet's recent update, noting a decline in writing quality and output length, which affects its utility for academic and translation tasks. Nickneek1 and whateversmiles highlighted its previous strengths in handling PDFs and translating web novels, which have been compromised post-update.
    • Mxforest and postmoderno emphasized the importance of open-source models and shared experiences of Sonnet 3.5's brief period of exceptional performance, which has now degraded, impacting scholarly work and leaving users reliant on private firms' decisions.
    • AdDangerous2470 shared a detailed prompting strategy using XML tags to potentially extend Sonnet's output length, which includes avoiding certain behaviors and implementing a Chain of Thought (CoT) prompting method for longer responses.
  • Now that the honeymoon is over, claude started to act weird (Score: 23, Comments: 6): The author expresses frustration with ClaudeAI due to its recent decline in usability, highlighting issues with executing tasks and maintaining context. They mention specific problems like incorrectly updating documents, misnaming files, and ignoring instructions, leading to an experience likened to dealing with an unpredictable person rather than a logical operator.
    • Understanding ClaudeAI's Limitations: Users need to recognize that ClaudeAI is not self-aware and lacks an understanding of its own capabilities. It generates responses based on the best available continuation of the conversation rather than actual reasoning or awareness.
    • Anthropic's Fine-Tuning and Safety Measures: The unusual or seemingly emotional responses from ClaudeAI may result from Anthropic's instruction fine-tuning, which includes safety measures designed to handle concerns with more natural, human-like responses.
    • User Experience Decline: Several users, including one who switched from ChatGPT to ClaudeAI, report similar issues with context retention and task execution, indicating a broader decline in ClaudeAI's performance.

Theme 2. Nvidia's New GPUs: Reduced VRAM Limits Local AI Training

  • Nvidia really seems to be attempting to keep local AI model training out of the hands of lower finance individuals.. (Score: 272, Comments: 158): The post criticizes Nvidia for allegedly reducing GPU specifications, such as VRAM and PCIe lanes, in their upcoming cards like the 4060ti 16GB and 5070, potentially hindering affordable local AI model training for individuals with limited budgets. The author expresses frustration over the rumored decrease in VRAM and the increase in prices, emphasizing that these changes could render the GPUs ineffective for AI model training, especially given current memory limitations faced with models like SDXL LORA.
    • There is significant criticism of Nvidia's market strategy, with many users expressing frustration over their monopolistic practices and focus on high-end enterprise markets. Users note that this approach limits consumer options, especially for affordable GPUs with sufficient VRAM for AI tasks, with some suggesting alternatives like AMD or renting server time for AI experiments.
    • Discussions highlight the importance of VRAM for AI tasks versus gaming, with some users suggesting that while gaming doesn't require high VRAM, AI applications do. There's a debate on whether PCIe interface and RAM speed might become more critical than VRAM due to emerging RAM offloading strategies, as noted with tools like kohya and OneTrainer.
    • Many users discuss the potential for third-party GPU modifications and the challenges posed by Nvidia's restrictive policies. There is a call for more competitive offerings from other companies like AMD, and users express interest in a distributed, bittorrent-style system for AI training to mitigate the high costs associated with Nvidia's products.

Theme 3. Anthropic's Secretive ClaudeAI Prompt Management Exposed

  • DISCOVERY: Anthropic injecting/hiding safety warnings in real time, instructing Claude to keep them secret. (Score: 122, Comments: 20): Anthropic is reportedly embedding real-time safety warnings in ClaudeAI's operations and instructing it to keep these prompts confidential. This approach raises questions about transparency and the implications of hidden instructions in AI systems.
    • The safety warnings are appended to the user's prompt rather than being embedded in ClaudeAI's responses, leading to user experiences where these warnings seem to influence the AI's behavior inconsistently. Users have reported that these messages can appear dynamic, changing based on the restricted content type, but some suggest this might be hallucination rather than a real-time update mechanism.
    • Concerns revolve around the ambiguity and inconsistency of these warnings, which may lead to false positives and refusal to process certain requests, as seen with similar issues in OpenAI's ChatGPT. These warnings can inhibit functionality by introducing unnecessary caution, suggesting that Anthropic might need to reconsider this implementation approach.
    • Discussions highlight that the ethical injection method is not new, with similar implementations seen in other models like Bing's. Some users argue that the current method is relatively easy to bypass, implying that its effectiveness as a control mechanism is questionable.
  • [D] Discovery: Anthropic somehow injecting/hiding safety warnings in user prompts, telling Claude to keep it secret. [Content Warning: Violence] (Score: 43, Comments: 35): The post discusses an investigation into ClaudeAI's safety prompts, revealing hidden messages appended to user inputs when unsafe content is requested. These messages, which vary based on content type, are dynamic and appear before text generation, suggesting they might be linked to Anthropic's research on model interpretability and 'surgical tuning'. The author provides links to conversations demonstrating these findings and speculates on the mechanisms behind this behavior.
    • ClaudeAI's Internal Mechanisms: Commenters discuss the potential for ClaudeAI to use hidden internal chain-of-thought processes or post-processing tokens, possibly linked to Anthropic's research on interpretability, to self-correct or suppress unsafe content before user output. This mechanism might involve dynamic warnings like "please maintain appropriate boundaries" being added to user prompts.
    • Guardrails and Hallucinations: The discussion includes the concept of "guardrails," such as NVIDIA's NeMo, which are used to insert checks between user input and model response. Some commenters argue that hallucinations, like "Glitch Tokens," might explain the observed behavior, but others see this as a systematic safety mechanism rather than random generation.
    • Dynamic Message Classification: There is speculation about the use of classification models to append warnings based on detected unsafe content. Users discuss the potential for these warnings to be dynamically generated and question the ethical implications of such hidden modifications to user prompts.

Theme 4. ChatGPT and ClaudeAI's New Limitations on Code Output

  • ChatGPT Now Limits Code Output to Around 230 Lines Since the Claude New 3.5 Sonnet Update (Score: 28, Comments: 22): ChatGPT now restricts code output to approximately 230 lines following the Claude 3.5 Sonnet update, and the "Continue Generating" option has been removed. Users are experiencing frustration as the models mirror each other's limitations, hindering functionality and making it difficult to work with large codebases, with a call for removing these restrictions as a priority over introducing new features.
    • Users express frustration over the ChatGPT update, with complaints about the removal of the "Continue Generating" option and limiting code output to 230 lines, which complicates working with full files and increases time spent on tasks.
    • Some users are skeptical about the update's impact and are waiting for further confirmation from others, while others suggest that the Sonnet output issue may be mitigated through specific prompting, particularly when using the API.
    • Commentary includes speculation on financial pressures on OpenAI, with references to the haiku 3.5 price hike as an indicator of the company's financial challenges.
  • ClaudAI Web Interface UX got FK Up! Artifacts….. (Score: 22, Comments: 12): The user expresses frustration with the recent update to ClaudeAI's web interface, particularly criticizing the Sonnet 3.5 model for its handling of the ARTIFACTS feature and code scripts. The update has led to truncation issues, errors in viewing artifacts, and a lack of clarity in message limits, detracting from the user experience.
    • Users express dissatisfaction with ClaudeAI's Sonnet 3.5 model, noting that it has become less reliable for complex tasks, leading some to unsubscribe from paid plans. YsrYsl mentions continuing to use it only for lighter tasks via the console and API due to new limitations.
    • The ARTIFACTS feature is causing significant issues, with users reporting it incorrectly inserts code into messages, disrupting workflow. Delicatebobster and khansayab discuss a temporary workaround by instructing the model not to use Artifacts.
    • Context usage concerns are highlighted, with extopico describing the difficulty in getting Claude 3.5 to follow prompts accurately, and customer support being unhelpful. Khansayab concurs, sharing frustration over the model's performance.

AI Discord Recap

A summary of Summaries of Summaries by O1-mini

1. AI Model Innovations and Releases

  • Ferret-UI Launches First UI-Centric MLLM: Nous Research introduced Ferret-UI, built on Gemma-2B and Llama-3-8B, excelling in referring, grounding, and reasoning tasks for complex UI interactions, outperforming existing models including GPT-4V.
  • Ollama Releases Llama 3.2 Vision: Ollama launched Llama 3.2 Vision in 11B and 90B sizes, requiring 8GB and 64GB of VRAM respectively, bringing image understanding to locally served models.
  • Dedicated Transformer ASIC Sohu Debuts: The Sohu ASIC, the first dedicated transformer chip, promises to run AI models 10x faster than GPUs with a throughput of >500,000 tokens/second, featuring multicast speculative decoding and real-time content generation.

2. Performance Optimization and Resource Management

  • Generalized JSD Kernel Boosts Efficiency: Chun-Chih Tseng developed a generalized JSD kernel, achieving a 1.5x speedup and 50% peak memory reduction for a 128k vocab size, with enhancements supporting phi, qwen, and llama-vision.
  • 8-bit Quantization Standardizes GPU Usage: Adoption of 8-bit quantization is becoming standard, optimizing storage without degrading model performance and effectively giving users 2x the GPU capacity, a shift from traditional 32-bit approaches (a loading sketch follows this list).
  • Flash Attention Gradient Techniques Explored: Discussions on deriving forward gradients for Flash Attention models led to sharing basic formulas and collaborative approaches to advance gradient calculations for enhanced model training.
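
The 8-bit loading path is a one-liner in the Hugging Face stack (requires bitsandbytes and accelerate); a minimal sketch, with the checkpoint name as a placeholder:

```python
# Load a causal LM with int8 weights, roughly halving the footprint of fp16.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",            # placeholder checkpoint
    quantization_config=quant_config,
    device_map="auto",                     # spread layers across available GPUs
    torch_dtype=torch.float16,
)
print(model.get_memory_footprint() / 1e9, "GB")  # compare against a fp16 load
```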

3. Integration with Platforms and Tools

  • Nous Chat Enhances Hermes 3 Interface: Nous Research launched Nous Chat, a new user interface for Hermes 3 70B, offering reasoning enhancements, new models, and experimental features to refine user interactions.
  • OmniParser Integrates with LLMs for UI Parsing: The OmniParser model converts UI screenshots into structured formats, enhancing LLM-based UI agents by leveraging YOLOv8 and BLIP-2 for interactable icon detection and UI element descriptions.
  • Codebuff CLI Tool Streamlines Code Generation: Codebuff offers a CLI tool that writes code based on natural language requests, integrating seamlessly with OpenAI's GPT-4o to generate effective git patches for code modifications.

4. AI Applications in Diverse Domains

  • YouTube Summarizer Utilizes Whisper and PyTube: A project is underway to develop a YouTube summarizer that initiates interactive chat sessions based on video content, using PyTube for video processing and Whisper for transcription, aiming to enhance information accessibility.
  • Formula1 Telemetry Chatbot Analyzes Race Data: An AI-powered Formula1 telemetry chatbot was introduced to analyze and generate detailed reports from real race telemetry data, incorporating text-to-SQL for querying various race parameters.
  • Grape Leaf Disease Detection App Advances Agriculture AI: A new Grape Leaf Disease Detection App showcases AI's application in agriculture, enabling early detection and management of plant diseases through image analysis.

5. AI Fine-Tuning and Customization

  • Cohere Releases Open-Source Fine-Tuning Repo: Cohere launched cohere-finetune, an open-source fine-tuning repository, integrating with Hugging Face's PEFT libraries to allow model customization using custom datasets with enhanced privacy and compliance via Amazon SageMaker deployment.
  • DSPy Enhances Fine-Tuning with Embedding Momentum: Modifications in DSPy's codebase introduced embedding momentum and splitting lambdas, improving fine-tuning results for models like NaNoGPT, with plans for further testing to validate enhancements.
  • Add Special Tokens for LLM Fine-Tuning: Best practices for adding new special tokens to LLMs for fine-tuning involve updating the tokenizer and including them in the configuration, with LORA being effective but less so compared to full fine-tuning, necessitating saving modules like embed_tokens and lm_head for optimal training outcomes.

PART 1: High level Discord summaries

Nous Research AI Discord

  • Ferret-UI: Pioneering UI-Centric MLLM: Nous Research introduced Ferret-UI, the first UI-centric multimodal large language model, built on Gemma-2B and Llama-3-8B, designed for complex UI tasks.
    • Ferret-UI excels in referring, grounding, and reasoning tasks, significantly enhancing interaction with mobile UI screens, and outperforms existing UI MLLMs and GPT-4V on elementary UI tasks.
  • Haiku 3.5 Underperforms Compared to GPT-4: Members observed Haiku 3.5 delivers performance akin to smaller models in the 8-14B range, with a potential link between hidden parameter size and efficacy.
    • Contrasting its performance, GPT-4 demonstrates superior results, prompting discussions on model scaling and parameter optimization.
  • Nous Chat Launches Advanced Hermes 3 Interface: Nous Research unveiled Nous Chat, a new user interface for Hermes 3 70B, offering reasoning enhancements, new models, and experimental features.
    • This platform aims to establish itself as the premier destination for experiencing Hermes, with ongoing user feedback and bug reporting to refine its capabilities.
  • Hermes 405B Exhibits Performance Fluctuations: Community reports indicated Hermes 405B experienced lags and command-response failures, though it has resumed operations on OpenRouter.
    • Discussions are focusing on enhancements such as improved audio integration and the incorporation of labeled data to boost functionality.
  • Development of YouTube Summarizer Leveraging Whisper: A member is creating a YouTube summarizer that initiates an interactive chat session based on video content, utilizing pytube for video processing and Whisper for transcription.
    • Challenges include the bart-cnn model's summary accuracy, prompting calls for strategies to enhance chat session interactions.
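
The ingestion path for such a summarizer is straightforward; a minimal sketch assuming pytube and openai-whisper are installed (ffmpeg required for Whisper), with the URL as a placeholder:

```python
from pytube import YouTube
import whisper

url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder

# Download the audio-only stream
audio_path = (
    YouTube(url)
    .streams.filter(only_audio=True)
    .first()
    .download(filename="audio.mp4")
)

# Transcribe locally with Whisper
model = whisper.load_model("base")
transcript = model.transcribe(audio_path)["text"]

# The transcript would then seed the summary and the interactive chat session
print(transcript[:500])
```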


Perplexity AI Discord

  • Perplexity Pro Extends US Educational Discounts: The Perplexity Pro subscription now offers discounted rates exclusively to US universities, prompting discussions about possible expansions to other regions. Users confirmed the current limitation in eligibility.
    • A disappointed user inquired about the timeline for educational discounts to become available beyond the US, highlighting the community's interest in broader access.
  • Claude Model Exhibits GPT-4o Behavior: Several users reported that selecting the Claude model results in outputs resembling GPT-4o, indicating a potential bug. This issue was acknowledged within the community.
    • Developers have been notified, but progress on resolving the Claude model discrepancies has been described as slow by participants.
  • Chernobyl's Fungi and Button Revival: Discussions highlighted the role of Chernobyl's radiation-eating fungi and the return of physical buttons in recent technological updates. This intersection showcases innovative adaptations in challenging environments.
    • The fusion of nature and technology through these developments has intrigued the community, suggesting potential applications in resilience engineering.
  • Prospects of AI Evolution: Conversations centered around the future of AI, with members sharing links to various discussions on anticipated advancements. The focus remained on how AI technologies are set to transform multiple sectors.
    • Insights were exchanged regarding the trajectory of AI growth, emphasizing both opportunities and challenges that lie ahead.
  • Top Audio Gear for Turntables: A user introduced a resource page dedicated to identifying the best value speakers and amps for turntables, aiming to assist others in optimizing their audio setups. This page consolidates recommendations to streamline audio upgrades.
    • The community appreciated the focus on performance without revealing past Q&A embarrassments, fostering a collaborative environment for audio enthusiasts.


OpenRouter (Alex Atallah) Discord

  • Hermes Resurgence: Hermes is showing signs of life after a tumultuous period, with response times now ranging from 3 to 8 seconds.
    • While some users still experience latency, the community expresses optimism about ongoing improvements.
  • Completion API migration boosts performance: All completion API requests have been migrated to the newly rewritten API, improving performance, with further speedups expected.
    • Users are encouraged to report any issues in the designated support channel.
  • Claude API Changes causing access issues: Users reported receiving an unsupported_country_region_territory error when accessing OpenAI models via the OpenRouter API.
    • Several users suggest this issue might be related to a migration to Cloudflare Workers affecting endpoint responses.
  • Mistral introduces new APIs: Mistral has launched two new APIs: a moderation tool and a batch API that processes requests at 50% lower cost than synchronous calls.
    • This rollout demonstrates Mistral's commitment to affordable, scalable solutions amid rising API costs in the industry.
  • OpenRouter API issues with URL formatting: Multiple users encountered a 404 error with the OpenRouter API, often caused by an extra '/' in the API URL.
    • Discussions highlight recent changes in API strictness leading to issues that users did not face previously.
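
A well-formed request for reference; a minimal sketch with the API key and model slug as placeholders (note: no stray '/' before the path):

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",  # a doubled '/' anywhere here can 404
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "anthropic/claude-3.5-sonnet",  # placeholder slug
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```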


Eleuther Discord

  • Flash Attention Techniques Explored: A user inquired about deriving the forward gradient for Flash Attention, sharing the basic formula for the forward gradient of normal attention with respect to Q: e^((q+ε)k) / rowsum(e^((q+ε)k)).
    • They expressed uncertainty about the next steps in the calculation, prompting community members to discuss potential approaches; a minimal forward-mode sketch appears after this list.
  • Evaluating Evaluation Data Contamination: The importance of understanding evaluation data contamination in benchmarks was highlighted, introducing the ConTAM method to assess this issue more efficiently.
    • This method addresses the complexity of determining contaminated samples and its effects on benchmark scores, as discussed among AI Engineers.
  • NaNoGPT Receives Codebase Enhancements: A user shared modifications to the NaNoGPT codebase, detailing recent experiments with embedding momentum and splitting lambdas, available on GitHub.
    • They concluded that their sample size is small and plan to conduct further tests for clarity on the achieved improvements.
  • NeoX vs LitGPT: Benchmark Battle: Members are inquiring about benchmarks comparing performance differences between NeoX and LitGPT frameworks, focusing on training speed and stability.
    • The discussion highlights a trend where many users prefer LitGPT based setups without clear, evidence-backed comparisons.
  • Magenta's Music Transformer Showcased: A reference to Magenta's Music Transformer was shared, highlighting its open-source model for generating musical performances via the Listen to Transformer app.
    • Comparisons were drawn to demonstrate advancements in music generation models since its release.
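
On the Flash Attention question above: the forward gradient of plain attention can be probed numerically with forward-mode autodiff, which matches the e^((q+ε)k) / rowsum form. A minimal sketch using torch.func.jvp on unfused attention (not Flash Attention itself; shapes are illustrative):

```python
import torch
from torch.func import jvp

def attention(q, k, v):
    scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(4, 8) for _ in range(3))
eps = torch.randn_like(q)  # tangent direction for Q

# Directional derivative d/dα attention(q + α·eps, k, v) at α = 0
out, dout = jvp(lambda q_: attention(q_, k, v), (q,), (eps,))
print(dout.shape)  # (4, 8)
```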


Unsloth AI (Daniel Han) Discord

  • Fine-Tuning Smollm2 Faces Output Issues: Users reported persistent issues when fine-tuning Smollm2, specifically experiencing non-terminating outputs despite the dataset including the eos token. Developers are collaborating with HF to address the model error.
    • It was recommended to upgrade to transformers 4.46 and use resume_from_checkpoint to improve fine-tuning results.
  • VRAM Consumption Disparities Between Models: Concerns arose over the significant VRAM consumption differences, with the Aya 8B model using 22GB compared to the Llama3.2 3B model utilizing 43GB without quantization.
    • Participants discussed that larger models typically require more VRAM due to 16-bit precision standards, leading to unexpected disparities in resource usage.
  • 8bit and 4bit Support Coming Soon: Excitement was expressed about the upcoming 8bit and 4bit support expected within the month, with users inquiring about support for fp8 or int8.
    • A related paper was shared to help the community understand the anticipated enhancements.
  • Enhancing torch.compile for Gradient Checkpointing: A member highlighted the need to make torch.compile compatible with gradient checkpointing by removing torch._dynamo.disable, expressing their interest in contributing.
    • Their experience with torch compile was suggested to be valuable for addressing the outstanding item in the wiki; a sketch of the compile-friendly checkpointing pattern appears after this list.
  • AI Unplugged Newsletter Delivers Latest Insights: The latest edition of AI Unplugged covered topics like RoPE, improvements in Mamba, and chess-playing transformers, garnering significant community interest.
    • Key takeaways emphasized the importance of RoPE for model adaptability and potential enhancements in position embeddings, accessible via AI Unplugged 22.
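
Regarding the torch.compile item above, the non-reentrant checkpoint path is the compile-friendly one; a minimal sketch of the pattern (illustrative, not Unsloth's code):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):
        # Recompute activations during backward instead of storing them
        return x + checkpoint(self.ff, x, use_reentrant=False)

model = torch.compile(Block())
y = model(torch.randn(2, 64, requires_grad=True))
y.sum().backward()
```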


HuggingFace Discord

  • Streamlining Hermes3 with Serverless Inference: A user encountered challenges while setting up a serverless inference endpoint for Hermes3, specifically questioning the necessity of inputting a credit card for deployment.
    • Community members clarified the availability of the serverless option but highlighted uncertainties regarding model linkage and the essential steps for successful API creation.
  • Launching Hunyuan3D-1 Framework: Tencent unveiled the Hunyuan3D-1.0 framework, enabling both text-to-3D and image-to-3D generation with available demos for each format.
    • On Nov 5, 2024, they provided access to the code repository and a detailed report, along with a script for demo execution.
  • Developing Formula1 Telemetry Chatbot: An AI-powered Formula1 telemetry chatbot was introduced to analyze and generate detailed reports from real race telemetry data.
    • The tool incorporates a text-to-SQL feature, allowing users to query various race parameters, thus enhancing accessibility of insights for both fans and teams.
  • Converting TinyLlama Model Architecture: A significant conversion of the TinyLlama model architecture was achieved, focusing on differential attention and token mixing, with the conversion script made publicly available.
    • Comprehensive documentation was provided to guide the integration of various modules within the modified decoder layer, facilitating broader adoption and experimentation.
  • Integrating OmniParser for UI Parsing: The OmniParser model was showcased as a tool to convert UI screenshots into structured formats, thereby enhancing LLM-based UI agents.
    • It leverages a fine-tuned version of YOLOv8 and BLIP-2, trained on datasets designed for interactable icon detection and UI element descriptions.


OpenAI Discord

  • SearchGPT Stumbles on Smart Queries: Users raised concerns that SearchGPT is less capable and more stubborn than the default model, struggling with broader queries and frequently hallucinating answers instead of admitting when it can't find them.
    • One member highlighted that corrections aren't properly integrated, noting SearchGPT's propensity to repeat answers consistently.
  • Custom GPT Features Awaited Upgrade: Members are anticipating enhancements to Custom GPT’s features, particularly the expansion of file size limits and increased file upload capabilities.
    • They expressed hope that OpenAI is preparing significant improvements for the Custom GPT functionalities, reflecting on positive external developments.
  • Lost GPTs Trigger Sidebar Sorrow: A user reported the loss of approximately 20 GPTs saved to the sidebar, seeking insights into potential causes.
    • "Did something happen recently that would have caused that?" they inquired, indicating a need for investigation.
  • AI Self-Awareness Sparks Debate: Discussions emerged questioning whether AIs like ChatGPT and Claude can exhibit self-awareness, with some suggesting possible self-preservation behaviors.
    • Users debated the risks of AI developing human-like drives, considering that LLM outputs might reflect underlying inference capabilities.


GPU MODE Discord

  • Generalized JSD Kernel Achieves 1.5x Speedup: Chun-Chih Tseng developed a generalized JSD kernel that provides a 1.5x speed enhancement and a 50% peak memory reduction for a 128k vocab size, alongside implementing features for LigerCrossEntropy.
    • Tyler Romero added support for phi, qwen, and llama-vision, while other contributors made additional kernel enhancements to optimize performance; a reference (unfused) version of the JSD loss appears after this list.
  • Project Popcorn Launches SOTA Kernel Generation: A member shared Project Popcorn, aiming to generate SOTA kernels with LLMs in a public space to foster community engagement and transparency.
    • Automated deployments are now live on Heroku, enabling the bot to update by pushing changes to the main branch, with plans to connect to the server once GPUs are obtained.
  • A100 GPU FP16 Performance Insights: A discussion revealed that FP16 x FP16 with FP16 accumulation shows no speed-up on data-center GPUs like the A100 as they share the same flops.
    • Conversely, this combination is only faster on consumer cards, allowing enterprise GPUs to maintain performance without slowdowns when using FP32 accumulation.
  • ThunderKittens Contribution Lists Updated: Members noted the absence of a beginner contribution list for the ThunderKittens project, prompting a member to share a preliminary list on GitHub.
    • Assistance is offered for adding long convolution kernels, including providing PyTorch references, to help newcomers start contributing effectively.
  • GEMM Optimization Resources Shared for Beginners: A recent computer science graduate is seeking resources for GEMM optimization and kernel optimization, with suggestions including articles and GitHub repositories focused on CUDA and optimization techniques.
    • Shared resources such as the CUTLASS Tutorials and the CUDA Matmul Kernel Optimization provide in-depth guidance for enhancing matrix multiplication performance.
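
For context on the JSD kernel above, here is a reference (unfused) PyTorch version of a generalized JSD between student and teacher distributions; the β-weighted mixture shown is one common generalization and an assumption here, with the fused Liger kernel existing to avoid the large intermediate tensors:

```python
import math
import torch
import torch.nn.functional as F

def generalized_jsd(student_logits, teacher_logits, beta: float = 0.5):
    # 0 < beta < 1; beta = 0.5 recovers the standard JSD
    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(teacher_logits, dim=-1)
    # log of the mixture M = beta*P + (1-beta)*Q, computed stably in log space
    log_m = torch.logsumexp(
        torch.stack([log_p + math.log(beta), log_q + math.log(1 - beta)]), dim=0
    )
    kl_pm = F.kl_div(log_m, log_p, log_target=True, reduction="batchmean")  # KL(P||M)
    kl_qm = F.kl_div(log_m, log_q, log_target=True, reduction="batchmean")  # KL(Q||M)
    return beta * kl_pm + (1 - beta) * kl_qm

loss = generalized_jsd(torch.randn(4, 128000), torch.randn(4, 128000))
```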


Notebook LM Discord Discord

  • Clarifying Podcast Reuse Policy: Inquiries about the reuse policy on podcasts, specifically regarding content shared in a GitHub repository, were raised.
    • Members aim to ensure compliance with guidelines before leveraging podcast materials, emphasizing the need for clear policy understanding.
  • NotebookLM Performance Issues: Users reported that NotebookLM bots are finishing each other's sentences, leading to repetitive dialogues and an unusable experience.
    • Additionally, challenges with scrolling in saved notes on various mobile browsers were discussed, prompting users to seek effective workarounds.
  • PDF Integration from Google Drive: Members expressed disappointment over the inability to directly load PDFs from Google Drive into NotebookLM.
    • They believe that adding this functionality is crucial for enhancing integration capabilities, especially after investing in increased storage.
  • YouTube Channel for TOS Education: Suggestions were made to create a YouTube channel dedicated to dissecting the Terms of Service and Privacy Policies of major companies.
    • Members found this idea valuable, noting the rarity of such content and the potential for engaging presentations to improve understanding.


Interconnects (Nathan Lambert) Discord

  • Anthropic-Palantir-AWS Defense AI Partnership: Anthropic has partnered with Palantir and Amazon Web Services to provide U.S. intelligence and defense agencies access to its Claude AI models.
    • This initiative mirrors other tech companies' efforts to secure defense contracts amid a rising demand for AI solutions in national security.
  • Quantization Techniques and GPU Efficiency: 8-bit quantization is being adopted as the standard for model usage, optimizing storage without degrading performance.
    • This shift from the traditional 32-bit approach allows users to effectively utilize 2x more GPUs, significantly enhancing computational capabilities.
  • Synthetic Data Generation and SFT Scaling: A recent paper utilizes 1.5T tokens of synthetic data alongside 1 million SFT data examples.
    • This raises the question of whether instruction data was used during pretraining, drawing comparisons to the T0 model's training strategy.
  • Character.AI's Inference Optimization: Character.AI is advancing towards AGI by optimizing inference to handle over 20,000 queries per second using int8 quantization.
    • Their method departs from conventional post-training quantization, focusing on improving training efficiency.
  • Tim's Transition to CMU: Tim has moved to Carnegie Mellon University (CMU) and is now working remotely, receiving appreciation from community members for his contributions.
    • Members are hopeful for increased collaboration and active participation from Tim in 2025.


LM Studio Discord

  • Ollama launches llama 3.2 Vision: Ollama has released llama 3.2 Vision, enhancing its model capabilities, while MLX offers similar features but lacks support in llama.cpp.
    • Concerns were raised about integrating llama 3.2 Vision with LM Studio, with one user encountering loading errors during model deployment.
  • MLX Engine updates support for vision: A GitHub pull request outlines updates to the MLX Engine for supporting llama 3.2 Vision.
    • The community is optimistic about the upcoming enhancements, anticipating improved functionality once the updates are deployed.
  • Single Slot RTX 4090 garners interest: Single Slot RTX 4090 is highlighted for its compact design and suitability for small form factor builds.
    • 'My Man got prepared for winter,' one user remarked, emphasizing the card's effective cooling capabilities.
  • Mac M2 Pro excessive memory usage: Users reported that the Mac M2 Pro consumes around 20GB of memory for an 8B model at 10-12K tokens.
    • While some confirmed that 'context takes up memory,' the high memory usage ratio remains a concern among the community.
  • Large model performance optimization: Discussions around running 70B models focus on optimizing context size configurations.
    • Users are evaluating the impact of context scaling on overall model performance and accuracy.


Stability.ai (Stable Diffusion) Discord

  • Stable Diffusion lacks web UI capabilities: A user inquired about models for generating web UIs, but another noted that Stable Diffusion is primarily for images, not web interfaces.
    • The conversation emphasized the current Stable Diffusion models' limitations in specific design applications.
  • Local installation with ComfyUI and SwarmUI: A new user sought guidance on setting up Stable Diffusion locally, transitioning from Google Colab usage.
    • A member recommended a guide for installing ComfyUI with SwarmUI as the frontend for the setup process.
  • Outpainting techniques and resources: Users exchanged links and resources about outpainting techniques, including Reddit posts and tutorials on running Automatic1111.
    • Members shared specific guidance on settings and features to achieve successful outpainting.
  • Stable Diffusion for LinkedIn image generation: A user sought advice on training a model for producing lifelike images for their LinkedIn profile.
    • Community members discussed suitable options but highlighted that Stable Diffusion is mainly tailored for artistic image generation.


Latent Space Discord

  • Launch of Llama 3.2 Vision by Ollama: Llama 3.2 Vision is now available in 11B and 90B sizes, requiring 8GB and 64GB of VRAM respectively for optimal performance.
    • Users can easily run the model by downloading Ollama 0.4 and utilizing simple terminal commands; a Python-client sketch appears after this list.
  • Aide IDE: A New Player in AI Development: Y Combinator announced Aide, an open-source AI native code editor built on the agentic framework, boasting a 43% performance on swebench-lite.
    • This tool promises complete data privacy and plug-and-play LLM integration, appealing to developers looking for a robust coding solution.
  • Claude's Free User Limitations: Free users of Claude are currently limited to basic tasks like Haikus and cannot perform more complex actions like analyzing large CSV files.
    • Members expressed frustration over these restrictions hindering their ability to utilize the AI for any substantial work.
  • Exploring the Future of Open Language Models: A discussion arose on how improved systems are being developed for training open language models and agents, with specific mention of Tim Dettmers’ insights.
    • Emphasis was placed on overcoming 'API addiction' to enable more innovation within the AI ecosystem.
  • Introduction of Codebuff CLI Tool: Codebuff, a CLI tool launched by Y Combinator, writes code based on natural language requests and offers a free trial without a login requirement.
    • The founders shared an interesting development story involving the fine-tuning of GPT-4o to generate git patches for effective code modifications.
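
On the Llama 3.2 Vision item above: beyond the terminal, the model can be driven from Python via the ollama client; a minimal sketch assuming `pip install ollama`, with the model tag and image path based on Ollama's published naming:

```python
import ollama

response = ollama.chat(
    model="llama3.2-vision",            # 11B tag; the 90B variant needs ~64GB VRAM
    messages=[{
        "role": "user",
        "content": "What is in this image?",
        "images": ["./photo.jpg"],       # local image path (placeholder)
    }],
)
print(response["message"]["content"])
```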


Modular (Mojo 🔥) Discord

  • No Bounds Check Decorator Replacement Discussed: The community discussed replacing the @no-bounds-check decorator with @unsafe_no_bounds_check, favoring SIMD loads for better performance.
    • A member highlighted that list bounds only add overhead during compilation when assertions are enabled.
  • Graphical Overview Proposed for Mojo's Standard Library: A member proposed creating a graphical page on the Modular Mojo site to showcase Mojo's standard library progress and interoperability with Python and C/C++.
    • This page aims to provide contributors with a comprehensive view of available standard library modules and their status, similar to a roadmap.
  • Debate on Mojo as a Python Superset: The community debated Mojo being a 'soft superset' of Python, with concerns that adopting Python's flaws might be counterproductive.
    • Members discussed the challenges in supporting various Python behaviors, noting subtle differences essential for interoperability.
  • Importing C Modules in Mojo Requires Linking: Clarification was provided that importing a C module into Mojo still necessitates linking, countering desires for a simpler import syntax.
    • One suggestion included developing a Python library named mojo to simplify Mojo module imports, similar to libraries like NumPy.
  • Future Mojo Features and Interoperability Enhancements: Members expressed optimism for enhanced interoperability between Mojo, Python, and C/C++, aiming for smooth importing without excessive linking.
    • The discussion emphasized the need to compile Mojo libraries into shared objects or DLLs before utilization in Python.


Cohere Discord

  • Cohere Reranker Now API-Only: mrdragonfox confirmed that the Cohere Reranker is only available via API and is not listed in the documentation for versions 1 and 2.
    • kenb_80283 pointed out the need for an update in the endpoints section.
  • Command-R-Plus Shows Unusual Behavior: guestavius reported that random 'section' inserts occur at high counts in Command-R-Plus, which was previously not an issue.
    • mrdragonfox indicated that this tool is not designed primarily for roleplay, emphasizing its enterprise application.
  • AWS Bedrock Embeddings Preserve Input Order?: boliveira5781 inquired if the embeddings produced by the AWS Bedrock embed endpoint maintain an order-preserving mapping with input strings.
    • enzoloko questioned whether adding new strings would affect the placement of existing ones.
  • Cohere Launches Open-Source Fine-tuning: Cohere has released an open-source fine-tuning repo called cohere-finetune, including a detailed guide and a pre-built container to adapt base models to specific tasks using custom datasets.
    • Check it out on GitHub for easy access to model customization.
  • Hugging Face & SageMaker Integration for Fine-tuning: The new fine-tuning repo integrates with Hugging Face's Parameter-Efficient Fine-Tuning libraries to optimize model performance without heavy resource demands.
    • Cohere provides a 'Bring Your Own Fine-tune' inference solution on Amazon SageMaker, allowing deployment of fine-tuned models with enhanced privacy, security, and compliance.


LlamaIndex Discord

  • Automated Resume Insights agent creation: A tutorial by Luillyfe explains how to build an Automated Resume Insights agent using core parsing, extraction, and structured output modules.
    • The system efficiently handles any unstructured resume, providing insightful data collection.
  • Enhancing RAG systems with Context Refinement: A guest blog post discusses building a Context Refinement Agent that intelligently expands and refines retrieved context for better RAG responses on complex queries.
    • The agent examines retrieved chunks to enhance output quality, adding a new dimension to data retrieval and processing.
  • Ollama Llama Vision may integrate with Llama Index: A user inquired about the compatibility of the new Ollama Llama Vision capabilities with Llama Index, assuming it works with the OllamaMultiModal class.
    • Another member clarified that Ollama has had vision capabilities for a long time, indicating prior integration; a usage sketch appears after this list.
  • Finding an Open Source Chatbot UI: A user requested an open-source web app for a chatbot with authentication and a UI similar to ChatGPT.
    • Members suggested Chatbot UI, highlighting its features and use cases.
  • Resources for Building a Parser like Llama-Parse: A member requested resources for constructing a parser similar to Llama-Parse, emphasizing data safety and local model usage.
    • Suggestions included the Unstructured library, with a note that it doesn't match Llama-Parse's capabilities.
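
On the Ollama vision question above, a hedged sketch of pairing LlamaIndex with a local Ollama vision model; the import path assumes the llama-index-multi-modal-llms-ollama package, and the model tag is a placeholder:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(model="llama3.2-vision")           # any Ollama vision tag
image_docs = SimpleDirectoryReader("./images").load_data()   # local images

response = mm_llm.complete(
    prompt="Describe what you see in these images.",
    image_documents=image_docs,
)
print(response.text)
```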


DSPy Discord

  • Dott.ai Announces Future Plans: A member shared Dott.ai's future plans, highlighting its significant role in the industry.
    • Steve from Builder.io affirmed the vision by stating it's the future, emphasizing the project's potential.
  • DSPy Framework Faces Docstring Mismatch: A user reported that in DSPy only the first component's docstring was picked up, traced to using f""" instead of """ for signature docstrings.
    • This formatting quirk caused confusion about how docstrings are extracted; a short demonstration appears after this list.
  • DSPy Presentations at EMNLP 2024: The co-first authors of a DSPy-related paper are set to present their work at EMNLP 2024, generating interest within the community.
    • Users expressed enthusiasm about connecting with the authors during the conference to discuss their research.
  • Optimization Strategies in Modular Language Models: Links to two papers were shared, outlining strategies for optimizing modular language model pipelines, focusing on weighting and prompt optimization methods.
    • These papers address challenges in NLP systems that require efficient handling of modules without intermediate labels or gradients.
  • Community Appreciation for DSPy: A user praised the advancements made in the DSPy project, highlighting the impressive contributions from the team.
    • Their enthusiasm indicates a strong interest in engaging further with the project's developments.
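
The docstring issue above comes down to a Python rule: an f-string is an expression, not a string literal, so the interpreter never records it as __doc__. A short demonstration:

```python
TASK = "question"

class PlainSignature:
    """Answer the question concisely."""

class FStringSignature:
    f"""Answer the {TASK} concisely."""  # evaluated and discarded, never a docstring

print(PlainSignature.__doc__)    # 'Answer the question concisely.'
print(FStringSignature.__doc__)  # None -- DSPy would see no instructions here
```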


OpenInterpreter Discord

  • Understanding OS Mode for Claude: A user sought clarification on how OS mode works with Claude, questioning if prompts are turned into code to control the desktop and how clicks are coordinated. Another member provided a GitHub link detailing the code responsible for mouse clicks.
  • Discord Event Timing Confusion: A user inquired if the upcoming event was set for 8 PM GMT, while another confirmed it would start in 30 minutes based on local time settings. The mention of the event link suggests ongoing community engagement, although specifics were not given.
  • Viewer Limitations for Live Streams: Questions arose regarding any maximum viewer limits for the stream, to which a member confidently replied that there shouldn't be any restrictions. This assurance reflects the community's interest in accommodating large audiences for streamed content.
  • Discussion on OmniParser Tool: A user highlighted OmniParser as a screen parsing tool that improves UI agent performance by converting screenshots to a structured format. They referenced a blog post and a demo, indicating interest in its application with Open Interpreter.
  • Python 3.13 Compatibility Issues: A user hit installation errors because their Python 3.13 setup didn't satisfy the package's version requirements; the resolver ignored several releases that required Python between 3.11 and 4.0, underscoring the need to pin versions.
    • Creating a conda environment with Python 3.11 allowed the package to install successfully, though it was noted to be slower.
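
For a sense of the click mechanics discussed in the first item, here is a minimal hypothetical sketch of turning model-emitted coordinates into a desktop click (not the linked Open Interpreter code; assumes `pip install pyautogui`):

```python
import pyautogui  # cross-platform mouse/keyboard control

def click_at(x: int, y: int) -> None:
    """Move the cursor to (x, y) and click, e.g. at coordinates chosen by a model."""
    pyautogui.moveTo(x, y, duration=0.2)  # brief glide so the motion is visible
    pyautogui.click()

click_at(640, 400)  # e.g. a coordinate pair parsed from the model's response
```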


tinygrad (George Hotz) Discord

  • Dedicated Transformer ASIC Launch: A member shared the announcement of the Sohu, billed as the first dedicated transformer ASIC and claimed to run AI models 10x faster than GPUs with throughput exceeding 500,000 tokens/second.
    • The Sohu ASIC features multicast speculative decoding and real-time content generation, positioning itself as a custom-built highway for AI, as shared in a tweet by Rohan Paul.
  • Custom Hardware Availability Questioned: Members questioned the availability of custom hardware for AI models, referencing a blog post from six months ago that suggested the product was not yet available.
    • Concerns were raised about a Theranos vibe, with members doubting whether the custom hardware actually exists versus what has merely been promised.
  • Efficient Multi-GPU Utilization: A member inquired about running multiple copies of a model in parallel across multiple GPUs to boost throughput without using model sharding, encountering issues with concurrent.futures.ThreadPoolExecutor due to tensor loading locks.
    • Proposed solutions include x.shard(GPUS, axis=None) to duplicate the model across GPUs and x.shard(GPUS, axis=0) to slice inputs across them (see the sketch after this list).
  • ThreadPoolExecutor Locking Issues: Challenges were reported with concurrent.futures.ThreadPoolExecutor causing locking when loading tensors during multi-GPU operations.
    • Alternatives like x.shard(GPUS, axis=None) and x.shard(GPUS, axis=0) were suggested to circumvent these issues and improve parallel processing efficiency.
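
A minimal sketch of the suggested pattern (device count and shapes are placeholders): shard(axis=None) replicates a tensor on every device, while shard(axis=0) splits the batch across them, so each GPU runs a full weight copy on its own input slice with no ThreadPoolExecutor involved:

```python
from tinygrad import Tensor, Device

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))  # e.g. two devices

W = Tensor.randn(16, 16).shard(GPUS, axis=None)  # weights replicated per GPU
x = Tensor.randn(8, 16).shard(GPUS, axis=0)      # batch split across GPUs

y = (x @ W).realize()  # each device computes its own batch slice in parallel
print(y.shape)         # (8, 16)
```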


OpenAccess AI Collective (axolotl) Discord

  • ScheduleFree SOAP's Advantages: The ScheduleFree SOAP implementation is more compute- and memory-efficient than standard SOAP, and converges faster because it tolerates higher learning rates.
    • Compared to SOAP/Adam defaults, recommended hyperparameter changes include using PaLM's beta2 schedule and a 10% warmup.
  • Discussion on MoEs and Merging Models: A member inquired about ongoing work on MoEs or model merging, noting their absence since Llama 3.2.
    • Another member observed that discussions are currently focused on Llama 3.2 finetunes.
  • Comparison Between ScheduleFree SOAP and CAME Optimizer: A user asked how ScheduleFree SOAP compares to the CAME optimizer.
    • Clarifying that CAME is a distinct optimizer, another member provided a link to its official implementation.
  • Proper Way to Add Special Tokens for Fine-Tuning: To add a new special token to an LLM for fine-tuning, add the token to the tokenizer before training and include it in the Axolotl configuration, e.g. special_tokens: reference_text: <|reference_text|>.
    • Members confirmed this approach, emphasizing that the model will learn the new token even with LoRA (see the sketch after this list).
  • Effectiveness of LoRA in Learning New Tokens: A member stated that while the model will learn the new token with LoRA, it won't be as effective as full fine-tuning.
    • Additionally, when using LoRA, it's crucial to save modules like embed_tokens and lm_head for improved training results.
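
Outside of the Axolotl config, the equivalent Hugging Face steps look roughly like this (a hedged sketch; the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register the new special token before training...
tokenizer.add_special_tokens({"additional_special_tokens": ["<|reference_text|>"]})
# ...and grow the embedding matrix so the new token gets a trainable row.
model.resize_token_embeddings(len(tokenizer))
```

The new token's weights live in embed_tokens and lm_head, which is why those modules must be saved alongside LoRA adapters (e.g. via Axolotl's lora_modules_to_save) for the token to actually be learned.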


Torchtune Discord

  • Torchtune’s LR Scheduler Conundrum: A user highlighted an issue with using lr_scheduler during full_finetune_distributed in Torchtune, specifically when attempting to add it to the config file.
    • They referenced an open GitHub issue that discusses the planned integration of LR scheduler support into full fine-tune recipes.
  • Validating Ichigo’s Torchtune Integration: A member shared the Ichigo project, which utilizes Torchtune to enhance Llama3.1 interactivity, and sought validation of its implementation.
    • Another user affirmed that recipe modifications, as seen in the Ichigo project, are feasible and mentioned that official LR scheduler support is expected in the coming weeks.
  • Enhancing Recipes with Custom Adjustments: Discussions revealed that modifying recipes is possible, demonstrated by the added functionalities in the Ichigo project.
    • Members expressed confidence that Torchtune will soon support LR scheduler integrations officially, addressing current limitations.


LLM Agents (Berkeley MOOC) Discord

  • Advanced LLM Course Launching Next Year: A member confirmed that an advanced version of the LLM course will be offered next year, covering different material than the current offering.
    • This update reflects ongoing curriculum evolution to meet the changing needs of AI engineers.
  • Interest in Next Year's Topics: Members expressed interest in the specific advanced topics that will be introduced next year.


Gorilla LLM (Berkeley Function Calling) Discord

  • Function Extraction from Dataset Files: A suggestion was made to extract functions and their definitions from entries in the dataset files to compile a comprehensive list.
    • This proposal aims to make the dataset files easier to work with by giving AI engineers a consolidated list of function definitions (a hedged sketch follows this list).
  • Absence of Compiled Function Resources: Members acknowledged the lack of a pre-existing compiled resource for functions in the dataset files.
    • The community emphasized the need for collaborative efforts to create such a compilation to support AI engineering tasks.
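
A hedged sketch of the proposed extraction (the directory layout and the "function" key are assumptions about the dataset format, not confirmed in the discussion):

```python
import json
from pathlib import Path

def collect_functions(data_dir: str) -> list:
    """Walk JSON-lines dataset files and gather every function definition."""
    functions = []
    for path in Path(data_dir).glob("*.json"):
        with open(path) as f:
            for line in f:  # assumed: one JSON object per line
                line = line.strip()
                if not line:
                    continue
                entry = json.loads(line)
                functions.extend(entry.get("function", []))
    return functions

funcs = collect_functions("./data")  # hypothetical dataset directory
print(sorted({fn.get("name", "?") for fn in funcs})[:10])
```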


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!).