AI News (MOVED TO news.smol.ai!)

Archives
Subscribe
August 9, 2024

[AINews] Too Cheap To Meter: AI prices cut 50-70% in last 30 days

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


Gemini Flash is all you need?

AI News for 8/7/2024-8/8/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (249 channels, and 2423 messages) for you. Estimated reading time saved (at 200wpm): 247 minutes. You can now tag @smol_ai for AINews discussions!

A simple list of all the price cuts in the last 30 days in AI (measured in "mtok" aka "per million tokens" - the bulk of the cost is usually input), by LMsys Elo/Rank:

  • Elo 1286 Rank 2: GPT-4o cut ~50% from May to Aug ($2.50/mtok)
  • Elo 1277 Rank 3: GPT-4o mini effectively cut prices between 70-98.5%, depending on whether you compare with GPT-3.5 Turbo or GPT-4 Turbo ($0.15/mtok)
  • Elo 1264 Rank 4: Llama 3.1 405b was initially offered at $5/$15 (input/output) by Together AI - within 48 hours this was cut 46% to $2.7/mtok by DeepInfra, with Lepton not far behind ($2.7/mtok)
  • Elo 1249 Rank 8: Mistral Large 2 cut prices vs Feb's Large v1 by 62% ($3/mtok)
  • Elo 1228 Rank 17: Gemini 1.5 Flash cut ~70% - on top of their existing 1 million tokens per minute free tier ($0.075/mtok)
  • Elo 1213 Rank 17: Deepseek v2 beats Gemini to a GA release of context caching, reducing cache-hit input token price by up to 90% ($0.014/mtok (not a typo)). This follows their original $0.14/mtok pricing, which may have set off the price war of the last month
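The percentage cuts above are simple old-vs-new ratios on per-mtok input prices. A quick sketch using prices from the list - the prior Gemini Flash price is an assumption for illustration, since only the ~70% cut is stated:

```python
# Percentage price cut: (old - new) / old * 100, prices in $ per million input tokens.
cuts = {
    "Llama 3.1 405b (Together -> DeepInfra)": (5.00, 2.70),
    "Gemini 1.5 Flash": (0.25, 0.075),      # prior price assumed for illustration
    "Deepseek v2 (cache hit)": (0.14, 0.014),
}
for name, (old, new) in cuts.items():
    print(f"{name}: {(old - new) / old * 100:.0f}% cut")
```

Running this reproduces the 46%, ~70%, and 90% figures quoted above.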

Given Gemini 1.5's extremely generous free tier, every model below LMsys Rank 17 - currently featuring things like Gemma 2, Nemotron 4, GLM 4, Reka Flash, Llama 3 8b, Qwen 72B and others - is effectively dead on arrival for most individual and team use cases.

The Price-Intelligence frontier advances by another order of magnitude in another quarter.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Developments and Releases

  • New Models and Capabilities: @rohanpaul_ai reported on the release of Llama3.1 405b and Sonnet 3.5, available for free with Google Cloud's $300 credit. @_akhaliq announced EXAONE-3.0, a 7.8B instruction-tuned model from LG AI Research, demonstrating competitive performance against other state-of-the-art open models of similar size. @mervenoyann highlighted MiniCPM V 2.6, a vision-language model combining SigLIP 400M and Qwen2-7B, outperforming proprietary models on various benchmarks.
  • Model Performance and Benchmarks: @sophiamyang noted that Mistral Large is performing well on the ZebraLogic benchmark despite being smaller than other models. @rohanpaul_ai shared that Claude-3.5 remains at the top of the LiveBench benchmarks, even against the new GPT-4o-2024-08-06.
  • AI Tools and Frameworks: @cHHillee introduced FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch code. This development aims to simplify and optimize various attention mechanisms in neural networks.
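The core idea behind FlexAttention is expressing attention variants as a small score-modification function applied to raw query-key scores. Here is a dependency-free sketch of that *pattern* on toy dimensions - this is not the actual PyTorch API, just an illustration of what a `score_mod` does:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(q, k, v, score_mod):
    """Toy single-head attention over lists of vectors, applying score_mod
    to each raw dot-product score (the pattern FlexAttention generalizes
    into fused kernels)."""
    out = []
    for qi, qv in enumerate(q):
        scores = [score_mod(dot(qv, kv), qi, ki) for ki, kv in enumerate(k)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]            # numerically stable softmax
        z = sum(w)
        out.append([sum(wi * vv[d] for wi, vv in zip(w, v)) / z
                    for d in range(len(v[0]))])
    return out

# Soft-capping: squash scores into (-cap, cap) with tanh (as used by Gemma 2).
def softcap(score, q_idx, kv_idx, cap=30.0):
    return cap * math.tanh(score / cap)

# Causal masking: block attention to future positions.
def causal(score, q_idx, kv_idx):
    return score if kv_idx <= q_idx else float("-inf")
```

Swapping `softcap` for `causal` changes the attention variant without touching the kernel - which is the "few lines of PyTorch" promise.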

AI Research and Insights

  • RLHF and Model Training: @karpathy provided an in-depth analysis of Reinforcement Learning from Human Feedback (RLHF), discussing its limitations and comparing it to traditional Reinforcement Learning. He argued that RLHF is "just barely RL" and highlighted the challenges in applying it to large language models.
  • Compute-Optimal Scaling: @rohanpaul_ai summarized a paper from Google DeepMind on compute-optimal scaling for test-time computation in large language models. The research introduces methods to adaptively allocate test-time compute based on prompt difficulty, potentially allowing smaller base models to outperform much larger ones.
  • Model Merging Techniques: @cwolferesearch explained various model merging techniques, including linear merging, task vectors, TIES merging, and DARE merging. These methods allow for combining capabilities of multiple LLMs without additional training data or compute resources.
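The task-vector approach above treats each fine-tune as a delta from the base weights and adds scaled deltas back onto the base. A minimal sketch on toy weight dicts (all names and values hypothetical):

```python
def task_vector(base, finetuned):
    """Delta between a fine-tuned checkpoint and its base, per parameter."""
    return {k: finetuned[k] - base[k] for k in base}

def merge(base, task_vectors, scale=1.0):
    """Add scaled task vectors onto the base (linear merging of deltas)."""
    merged = dict(base)
    for tv in task_vectors:
        for k, d in tv.items():
            merged[k] = merged[k] + scale * d
    return merged

base = {"w": 1.0}
math_model = {"w": 1.4}   # toy "math" fine-tune
code_model = {"w": 0.8}   # toy "code" fine-tune
tvs = [task_vector(base, math_model), task_vector(base, code_model)]
combined = merge(base, tvs, scale=0.5)   # w = 1.0 + 0.5*0.4 + 0.5*(-0.2) = 1.1
```

TIES and DARE extend this same arithmetic with sign-agreement resolution and random delta dropping, respectively - but the base-plus-deltas framing is the common core.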

AI Applications and Tools

  • SAM 2 for Object Segmentation: @AIatMeta announced SAM 2, a unified model for real-time, promptable object segmentation in images and videos. @swyx highlighted that SAM 1 saved an estimated 35 years of time for users in just one year on images alone.
  • AI Avatars: @synthesiaIO launched personal AI avatars, demonstrating their realism in a live event with 4,000+ attendees.
  • LlamaIndex Developments: @llama_index shared a tutorial on building a documentation chatbot using Firecrawl for web scraping and Qdrant for vector storage and retrieval.

AI Ethics and Policy

  • Structured Outputs and Safety: @AlphaSignalAI reported on OpenAI's release of their most performant GPT-4o assistant model, featuring structured outputs with 100% reliability and improved token limits and pricing.
  • AI Safety Concerns: @rohanpaul_ai summarized a paper on jailbreaking safety-tuned LLMs with human-fluent prompts, achieving high attack success rates while maintaining low perplexity.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Free Access to Advanced LLMs: Llama 3.1 405B and Sonnet 3.5

  • Llama3.1 405b + Sonnet 3.5 for free (Score: 304, Comments: 108): Google Cloud is offering free access to Llama 3.1 405B and Sonnet 3.5 models through their Vertex AI Model Garden, providing up to $300 worth of API usage, which translates to approximately 20 million output tokens for Sonnet 3.5 per Google account. A related project, the Open Answer Engine, demonstrates how to create a 405B model with Google search functionality using this API service, as detailed in a Weights & Biases report.
  • Experimenting llama3-s: An early-fusion, audio & text, multimodal model (Score: 92, Comments: 16): Llama3-s, an early-fusion multimodal model integrating audio and text, has been released for experimentation. The model, trained on 1.4 trillion tokens of text and 700 billion tokens of audio, demonstrates capabilities in transcription, translation, and audio understanding tasks, while also maintaining strong performance on text-only benchmarks.

Theme 2. Optimized Inference and Quantization for ARM-based Processors

  • Snapdragon X CPU inference is fast! (Q_4_0_4_8 quantization) (Score: 83, Comments: 39): The Snapdragon X CPU demonstrates impressive inference speeds with Q_4_0_4_8 quantization for Llama 3.1 8B, achieving 15.39 tokens per second on a Surface Pro 11 with a 10-core Snapdragon X Plus chip. The post provides instructions for optimizing performance, including using -win-llvm-arm64.zip releases, setting Windows power mode to Best Performance, and requantizing existing GGUF models to Q4_0_4_8 using the llama-quantize.exe command, noting that these results are comparable to MacBook M2 and M3 performance levels.
  • LG AI releases Exaone-3.0, a 7.8b SOTA model (Score: 144, Comments: 77): LG AI has released Exaone-3.0, a 7.8 billion parameter language model achieving state-of-the-art performance across multiple benchmarks. The model demonstrates superior capabilities in Korean and English languages, outperforming larger models like GPT-3.5 on certain tasks while being significantly smaller in size.

Theme 3. Summarization Techniques and Model Comparison for Large Texts

  • Best summarizing LLMs for average PCs? (Score: 68, Comments: 72): The post discusses summarizing LLMs compatible with consumer-grade hardware, specifically an Nvidia RTX 3060 12GB GPU and 32GB DDR5 RAM. The author recommends Qwen2, InternLM, and sometimes Phi3 mini and medium 128k for summarizing 20-25 thousand word chunks, noting that larger LLMs are incompatible with their setup and that Llama 3.1 underperforms for this task.
    • Llama3.1 and GLM-4-9b are used for summarizing YouTube video transcripts. The process involves creating an outline of chapters, then generating detailed descriptions for each item, which works well for long content using a rolling window approach.
    • The free tier of Gemini 1.5 Flash offers impressive summarization capabilities with a 1 million token context window and 1 million free tokens per minute, as clarified by a user linking to Google AI's pricing page.
    • Obsidian's Copilot plugin allows for easy summarization of selected text using local LLMs, offering a streamlined process for saving summaries directly within the application.
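The rolling-window approach mentioned above can be sketched as: split the transcript into overlapping chunks and fold each chunk into a running summary. `summarize` here is a stand-in for any local LLM call (e.g. Llama 3.1 or GLM-4-9b); the chunk sizes are illustrative:

```python
def rolling_summary(text, summarize, chunk_words=2000, overlap=200):
    """Split text into overlapping word chunks and fold each chunk into a
    running summary. `summarize(prev_summary, chunk)` is a stand-in for
    an LLM call that merges new content into the summary so far."""
    words = text.split()
    summary = ""
    step = chunk_words - overlap
    for i in range(0, max(len(words), 1), step):
        chunk = " ".join(words[i:i + chunk_words])
        summary = summarize(summary, chunk)
    return summary
```

The overlap keeps sentences that straddle a chunk boundary visible to both calls, which is what makes this work for long content.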

Theme 4. Repurposing Mining Hardware for AI Workloads

  • Picked up a mining rig for testing . . . (Score: 143, Comments: 62): A user acquired a mining rig with 7x 3060 GPUs, discovering it's a complete PC with a weak processor and RAM rather than just PSUs and risers. They're seeking advice on loading an AI model onto this rig and distributing the output to a host LLM application, aiming to repurpose the mining hardware for AI inference tasks.
    • llama.cpp can run LLaMA 3.1 70B Q8 on the rig's 84GB VRAM, with Q6 for more context. Users suggest trying smaller models first, starting with 2B and scaling up to test performance.
    • Upgrading the motherboard and CPU is recommended, with suggestions for dual E5 v3/v4 server CPUs and boards supporting multiple PCIe slots. PCIe Bifurcation splitters can allow one 16x slot to handle multiple GPUs.
    • vLLM is recommended for distributed setup, while ExLlamaV2 offers built-in generator/queue functionality. The rig's single PCIe lane per GPU may be a bottleneck, but once models are loaded into VRAM, CPU and system RAM usage is minimal.
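The 84GB figure above is just 7 × 12GB cards, and the "70B at Q8" feasibility claim is a back-of-envelope weight-size estimate (roughly one byte per parameter at 8-bit, ignoring per-layer overhead):

```python
params_b = 70          # Llama 3.1 70B parameter count, in billions
bytes_per_param = 1.0  # Q8 is ~8 bits per weight
weights_gb = params_b * bytes_per_param        # ~70 GB just for weights
total_vram_gb = 7 * 12                         # 7x RTX 3060 (12 GB each)
headroom_gb = total_vram_gb - weights_gb       # left for KV cache / activations
print(weights_gb, total_vram_gb, headroom_gb)  # 70.0 84 14.0
```

The ~14GB of headroom is what the context window's KV cache has to fit into - hence the suggestion to drop to Q6 when more context is needed.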

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Improvements and Techniques

  • Flux with LoRA dramatically improves photorealism: In r/StableDiffusion, a post demonstrates how using Flux with LoRA significantly enhances the realism of generated images, particularly for skin textures and facial details. Users noted the first image looked indistinguishable from a real photo.
  • Midjourney to Runway video generation impresses: A post in r/singularity showcases the impressive capabilities of using Midjourney images as input for Runway's video generation, highlighting the rapid progress in AI-generated video.

OpenAI Developments and Speculation

  • Project Strawberry teased: OpenAI's social media posts hinting at "Project Strawberry" sparked discussion and speculation. Some users suggested it could be related to improving ChatGPT's ability to count letters in words like "strawberry", which has been a known issue.
  • Potential new reasoning technology: A Reuters article was shared, indicating OpenAI is working on new reasoning technology under the codename "Strawberry".

AI Model Behavior and Limitations

  • ChatGPT struggles with letter counting: Multiple users tested ChatGPT's ability to count the number of 'r's in "strawberry", with the model consistently answering incorrectly. This highlighted ongoing limitations in certain types of reasoning tasks.
  • Tokenization impact on model performance: Some knowledgeable users pointed out that the letter counting issue is related to how language models tokenize words, explaining why ChatGPT struggles with this seemingly simple task.
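A model operating on subword tokens never "sees" individual letters. A toy illustration with a hypothetical token split (the real BPE split differs, but the point stands):

```python
# Hypothetical subword split of "strawberry" -- the actual tokenizer output
# differs, but either way the model receives opaque token IDs, not characters.
tokens = ["str", "aw", "berry"]

# What a human counts: characters in the raw string.
letter_count = "strawberry".count("r")

# What the model works with: whole-token units. Counting 'r's requires it to
# recall each token's spelling, which is where models go wrong.
print(letter_count, len(tokens))  # -> 3 3
```

Coincidentally there are as many tokens here as 'r's, which hints at one failure mode: the model has no direct access to the character stream at all.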

Community Reactions and Discussions

  • Skepticism towards OpenAI's marketing: Several users expressed frustration with OpenAI's marketing tactics, viewing the "Strawberry" teasers as overhyped or distracting from other issues.
  • Debate on AI progress: The posts sparked discussions about the current state of AI capabilities, with some users impressed by the rapid progress in image and video generation, while others pointed out persistent limitations in reasoning tasks.

AI Discord Recap

A summary of Summaries of Summaries by GPT4O-Aug (gpt-4o-2024-08-06)

1. Model Performance and Optimization

  • BiRefNet Surpasses RMBG1.4: BiRefNet demonstrates superior performance for background removal compared to RMBG1.4, with enhanced high-resolution image segmentation capabilities as detailed in the arXiv paper.
    • Developed by Nankai University, this model employs bilateral reference techniques that significantly optimize image processing tasks.
  • Torchao v0.4.0 Boosts Optimization: The release of torchao v0.4.0 introduces KV cache quantization and quantization aware training (QAT), enhancing low bit optimizer support.
    • The community discussed a GitHub issue regarding Intx Tensor Subclasses, inviting further input on the tracker to experiment with low bit quantization.
  • RoPE Optimization Simplifies Code: Members analyzed the RoPE implementation, advocating for simplification by shifting to direct trigonometric operations instead of complex numbers.
    • This adjustment was seen as a move towards enhancing code clarity while retaining functional integrity in the training logic.
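The RoPE simplification under discussion rests on a small identity: rotating a feature pair by angle θ via complex multiplication gives exactly the same result as direct sin/cos arithmetic. A quick numerical check:

```python
import cmath
import math

def rope_complex(x, y, theta):
    """RoPE rotation of one feature pair via complex multiplication."""
    z = complex(x, y) * cmath.exp(1j * theta)
    return z.real, z.imag

def rope_trig(x, y, theta):
    """The same rotation written with direct trigonometric operations."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c

# Both formulations agree, so the complex-number version can be swapped for
# the (arguably clearer) trig version without changing the training math.
a = rope_complex(1.0, 2.0, 0.3)
b = rope_trig(1.0, 2.0, 0.3)
```

This is why the change was framed as improving clarity "while retaining functional integrity" - the two forms are mathematically identical.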

2. Open Source AI Developments

  • Harambe Revolutionizes Bug Hunting: The introduction of Harambe, an open-source bug hunting tool, aims to streamline API analysis using LLMs to generate API endpoint suggestions.
    • This shift from traditional fuzzing techniques provides a more efficient method for identifying potential issues in code.
  • EurekAI Platform Launches for Researchers: EurekAI is introduced as a cross-collaboration platform for researchers, aiming to streamline the research process with AI features to enhance productivity.
    • Currently in alpha, it promises functionalities such as project creation and integrated journaling designed to foster research engagement.
  • Midjourney CEO Critiques Open Source: Midjourney's CEO expressed skepticism towards open source, arguing that local models can't compete with their service running on 64 GPUs, and dismissed ControlNet as a lone success.
    • Critics countered that Midjourney's product is akin to inferior versions of what open source can achieve, highlighting overfitting issues in Flux: 'it just has a sort of plastic look to it.'

3. AI Infrastructure and Market Dynamics

  • Hugging Face Expands with XetHub Acquisition: Hugging Face announced the acquisition of XetHub to enhance its collaboration infrastructure for large models, aiming for better dataset management.
    • CEO Clem Delangue highlighted that this move is critical for scaling AI model development and unifying their operational strategies.
  • OpenAI's Price Cuts Ignite Competition: OpenAI is reportedly implementing a 70% price reduction on its GPT-4o model, stirring substantial interest across the industry.
    • This drastic price shift could lead to revised pricing strategies among competitors in the AI model space.
  • Vercel Outage Impacts OpenRouter: Vercel currently faces intermittent outages impacting the OpenRouter service, as detailed in their status update.
    • After several updates, services were stable again by 3:45 PM ET, with ongoing monitoring.

4. Prompt Engineering and Fine-tuning

  • Self-Discover Prompting Gains Attention: A member highlighted the potential of Self-Discover prompting, asserting its power and effectiveness beyond traditional Chain-of-Thought (CoT) approaches.
    • They emphasized its applicability in crafting customized prompts that yield better outputs.
  • RAG Pipeline Needs Enhanced Observability: Concerns surfaced about the RAG pipelines needing better observability to capture query-time traces and the significance of proper document chunking.
    • Improper context chunking could lead to retrieval issues, as emphasized by a tweet.
  • Optimizing Chat History for LLMs: Discussion centered around implementing a custom function to limit chat history for LLM applications, aimed at improving performance.
    • Maintaining user-specific context was identified as a key factor in streamlining chat retention across different user interactions.
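A minimal sketch of the chat-history-limiting idea above: keep the system prompt plus the most recent turns that fit a budget. The whitespace-word count is a crude stand-in for a real tokenizer, and the message shape is the common role/content dict - both assumptions, not a specific framework's API:

```python
def trim_history(messages, max_tokens=1000):
    """Keep the system message plus the most recent turns that fit the budget.
    Token counts are approximated by whitespace words (a stand-in for a
    real tokenizer)."""
    def n_tokens(m):
        return len(m["content"].split())

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(n_tokens(m) for m in system)

    kept = []
    for m in reversed(rest):          # walk newest-first
        if n_tokens(m) > budget:
            break
        kept.append(m)
        budget -= n_tokens(m)
    return system + list(reversed(kept))
```

Keying the stored history by user ID, then trimming per request, is one way to maintain the user-specific context the discussion emphasized.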

5. AI Applications and Tools

  • SAM 2 Pod Launch is Live: The latest episode of the Latent Space podcast features SAM 2, with insights from Nikhila Ravi and Joseph Nelson.
    • Listeners learned that 49 million images were labeled using SAM on RoboFlow, which saved an estimated 35 years of user time.
  • Stable Diffusion Optimizes in Python: Members discussed utilizing the Diffusers library to implement Stable Diffusion in Python, focusing on optimizing performance and VRAM usage.
    • They stressed the importance of setting parameters correctly to attain the desired output quality.
  • MiniCPM-V 2.6 Shines in Performance Tests: MiniCPM-V 2.6 has been reported to outperform its competitors, including Gemini 1.5 Pro, GPT-4V, and Claude 3.5 Sonnet, particularly in multi-image applications.
    • For more details, members shared links to its Hugging Face page and the GitHub repository.

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord

  • 4bit GGUF Models Present Loading Challenges: Discussions arose around the 4bit GGUF models, noting potential precision loss when using load_in_4bit during model loading, as highlighted by the occurrence of OOM errors without this option.
    • While 4bit decreases VRAM consumption, the trade-off in performance needs careful consideration before implementation.
  • Issues Arise with PPO Trainer Implementation: A member reported negative KL divergence errors while attempting to use a customized binary reward function with the PPO Trainer.
    • Exploring DPO as a simpler alternative raised concerns regarding its performance compared to PPO among members.
  • Unsloth Rolls Out Multi-GPU Support: Confirmation of multi-GPU support rollout for trusted Unsloth users could lead to reduced VRAM consumption and increased processing speeds.
    • Debates ensued about whether this feature would be made available in open-source repositories or remain exclusive to paid subscriptions.
  • Successful Quantization of Mistral Models: Insights were shared on quantizing the 123B Mistral-Large-Instruct-2407 model, achieving a size reduction with minimal accuracy drop using the EfficientQAT algorithm.
    • This optimization reinforces the feasibility of improving model efficiency without substantial output degradation.
  • Harambe: The New Bug Hunting Assistant: The introduction of Harambe, an open-source bug hunting tool, aims to streamline API analysis using LLMs to generate API endpoint suggestions.
    • This shift from traditional fuzzing techniques provides a more efficient method for identifying potential issues in code.


HuggingFace Discord

  • BiRefNet surpasses RMBG1.4: BiRefNet demonstrates superior performance for background removal compared to RMBG1.4, with enhanced high-resolution image segmentation capabilities as detailed in the arXiv paper.
    • Developed by Nankai University, this model employs bilateral reference techniques that significantly optimize image processing tasks.
  • Launch of EurekAI Platform: EurekAI is introduced as a cross-collaboration platform for researchers, aiming to streamline the research process with AI features to enhance productivity.
    • Currently in alpha, it promises functionalities such as project creation and integrated journaling designed to foster research engagement.
  • Performance Evaluation of AI Models: Members compared pre-trained translation models like Facebook's M2M100 and SeamlessM4T, which showed promising prospects in multi-language translations.
    • Discussions highlighted differences in transcription capabilities between SeamlessM4T-v2 and Whisper models, with a focus on real-world usability.
  • Exciting Updates in Gradio v4.41: The release of Gradio v4.41 introduces notable features such as full screen images for gr.Image, enhancing output viewing with improved user interaction mechanisms.
    • The update also strengthens security against unauthorized access and XSS attacks, providing a more robust framework for deploying applications.
  • Papers with Code Resource Insights: A member highlighted Papers with Code as an essential resource for summarizing state-of-the-art performance in computer vision, featuring 11,272 benchmarks and 137,097 papers with code.
    • This invaluable platform aids users in exploring various machine learning applications, enhancing literature comprehensibility.


CUDA MODE Discord

  • BPF Insights for CUDA Profiling: A member inquired if anyone was using BPF to profile CUDA, with some stating that eBPF lacks visibility on GPU activities, being limited to the OS kernel.
    • Concerns were raised about its efficacy, with members suggesting alternatives like Nsight Compute and Nsight Systems for comprehensive GPU application monitoring.
  • Attention Gym Links & FlexAttention: Members reported a malfunctioning link for Attention Gym, expressing appreciation for its detailed content on softcapping.
    • Additionally, discussions emerged about integrating FlexAttention into HF models, indicating plans to wait for PyTorch version 2.5 for smoother integration.
  • torchao v0.4.0 is Here: The announcement of torchao v0.4.0 brought enhancements such as KV cache quantization and quantization aware training (QAT), with excitement about its low bit optimizer support.
    • Community engagement involved a GitHub issue regarding Intx Tensor Subclasses for low bit quantization experimentation, inviting further input on the tracker.
  • Memory Usage and KV Cache Optimization: A member's implementation of KV Cache optimized memory usage, enabling full bfloat16 fine-tuning on a single 80GB GPU, albeit at the edge of memory limits.
    • Discussions suggested exploring managed memory to alleviate constraints while preparing pull requests focused on code cleanup and maintainability.
  • RoPE Optimization Discussions: Members analyzed the RoPE implementation, advocating for simplification by shifting to direct trigonometric operations instead of complex numbers.
    • This adjustment was seen as a move towards enhancing code clarity while retaining functional integrity in the training logic.
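The KV cache quantization mentioned in the torchao v0.4.0 notes above stores cached keys/values at low precision to cut memory. A library-free sketch of a symmetric int8 round-trip on a toy cache - illustrating the idea, not torchao's actual implementation:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: store small ints plus one
    floating-point scale instead of full-precision floats."""
    scale = max(abs(v) for v in values) / 127 or 1.0   # guard against all-zero input
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

kv = [0.5, -1.0, 0.25, 0.9]        # toy cached key/value activations
q, scale = quantize_int8(kv)       # 4 small ints + 1 scale vs 4 floats
restored = dequantize(q, scale)    # close to kv, within ~scale/2 per entry
```

For a long-context KV cache this roughly quarters memory versus fp32 (or halves it versus bf16) at the cost of the small rounding error shown here.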


Perplexity AI Discord

  • Perplexity Pro reduces daily limits: Perplexity Pro users reported a reduction in the daily limit from 600 to 450, raising frustration regarding communication of changes.
    • One member expressed distrust, stating they received no prior notifications about this shift.
  • API outages causing access issues: Users are facing major outages with the Perplexity API, leading to concerns about the scope of the problem.
    • Reports indicate that some users are resolving issues via VPNs to different regions, suggesting potential geo-based discrepancies.
  • Google's antitrust ruling shakes market: On August 5, 2024, a U.S. court ruled against Google for maintaining an illegal monopoly, a significant win for the Department of Justice.
    • The ruling confirmed that 'Google is a monopolist' and outlined unlawful practices that maintain its market dominance.
  • Discussions on quantum theories in neuroscience: Research into quantum entanglement in the brain is sparking debate, particularly around theories like Orch-OR, which suggest cognitive influence.
    • Skeptics argue that the brain's warm, wet conditions may not support sustained quantum states.
  • Non-English responses lacking coherence: Users noted that prompts in non-English languages often produce incoherent responses, highlighting limitations in multilingual processing.
    • One instance in French led to repetitive outputs, raising concerns about the model's robustness across diverse languages.


Stability.ai (Stable Diffusion) Discord

  • Optimize Stable Diffusion in Python Projects: Members discussed utilizing the Diffusers library to implement Stable Diffusion in Python, focusing on optimizing performance and VRAM usage.
    • They stressed the importance of setting parameters correctly to attain the desired output quality.
  • Upgrade Your Old PCs for AI Work: A user sought advice on upgrading their outdated PC setup for AI tasks, looking for affordable components that wouldn't require a complete overhaul.
    • Suggestions included using Fiverr for assembly assistance and considering barebones prebuilt PCs as alternatives.
  • Face Swapping on Intel CPUs: A user requested recommendations for face swapping techniques compatible with Intel CPUs, expressing a willingness to pay for expert help.
    • This highlighted the demand for practical solutions targeting users with less powerful hardware configurations.
  • Enhancing Images with SAM Workflow: The community shared insights on utilizing the SAM detector to improve image detailing, enabling enhanced workflows.
    • One member emphasized detailing beyond just people, including backgrounds and structures, broadening the potential use cases.
  • NSFW Generation on Mac - Web Tools Needed: A user asked for the best web-based tools for NSFW content generation that would work efficiently on a MacBook Air M2 with 16GB RAM.
    • The discussion included performance implications tied to model complexity and the benefits of local installation based on hardware capabilities.


LM Studio Discord

  • NVIDIA Cards untouched by current issues: Current performance issues impact CPUs only, garnering relief from members regarding their NVIDIA Cards.
    • Discussions highlighted preferences for CPU vs. GPU setups, showcasing advantages of CPU-driven workloads.
  • CPU usage reports create confusion: A conversation emerged about CPU usage numbers exceeding 100%, explained by applications reporting total usage based on core counts.
    • Members pointed out varied reporting standards among operating systems, leading to prevalent misunderstandings.
  • Dual GPUs not speeding up inference: Members confirmed that LM Studio supports dual GPUs, but inference speed remains akin to a single GPU configuration.
    • Recommendations surfaced for hardware improvements to enhance token throughput for better performance.
  • Performance debate: 4090 vs. 3080: User dissatisfaction was voiced over the 4090 performing similarly to the 3080, with only a 20 ms per epoch training speed advantage.
    • While the 4090 excels in gaming, others highlighted the 3080's efficiency in handling models under 8B.
  • Limited VRAM hampers model choices: 2GB VRAM proves insufficient for most models, resulting in poor performance with low VRAM options.
    • Users noted the necessity of splitting larger models across VRAM and system RAM, which significantly constrains efficiency.


OpenAI Discord

  • OpenAI Releases GPT-4o System Card: OpenAI shared the GPT-4o System Card, detailing assessments aimed at tracking frontier model risks and outlining audio capabilities with preset voices.
    • This card ensures proper guardrails against harmful content, enhancing user trust and understanding.
  • Free Users Access DALL·E 3: ChatGPT Free users can now create up to two images per day using DALL·E 3, making content generation more accessible.
    • This feature enables personalized creative outputs for projects such as presentations and custom cards through seamless requests.
  • Ongoing Website Access Problems: Multiple users reported connectivity issues accessing OpenAI's main site, resulting in persistent errors and intermittent accessibility.
    • The situation reflects growing frustration among members and unexpected difficulties across the community.
  • Confusion over Message Quotas: Members expressed frustration regarding early message quota limits when using the platform, particularly in relation to the GPT-4o.
    • This experience led to discussions on the inconsistency of hitting limits unexpectedly, affecting user interaction.
  • Struggles with OpenAI Python SDK: Users faced challenges replicating results using the OpenAI Python SDK, especially when encountering discrepancies in Python versions.
    • This indicated potential compatibility issues that hinder accurate output across varying coding environments.


Nous Research AI Discord

  • MindSearch AI Enhances Information Retrieval: The paper MindSearch: Mimicking Human Minds Elicits Deep AI Search presents an Agentic AI that improves information retrieval through a dual-system approach with WebPlanner and WebSearcher, outpacing current search models.
    • This innovative structure effectively handles complex queries, demonstrating significant enhancements in intelligent information seeking.
  • Tavus Phoenix Model Takes Video Generation by Storm: Tavus launched the Phoenix model that creates hyper-realistic talking head videos with the ability to synchronize natural face movements using advanced techniques.
    • Developers can access the Phoenix model through Tavus' Replica API, enabling diverse and high-level customizations for video content.
  • Models Crash on Upside Down Text: Various models like Mistral and ChatGPT fail to generate coherent upside down text, while Claude Opus and Sonnet 3.5 handle it effortlessly with accurate outputs.
    • These observations highlight Claude models' superior capabilities, particularly in generating and rewriting upside down texts without errors.
  • Community Discusses AI Discord Resources: A member shared a Reddit post listing several useful AI Discord channels, including Replete-AI and Unsloth.
    • These resources provide varied insights and support for those navigating the AI landscape within Discord.
  • Claude API Faces Server Overload Issues: Users pointed out that the Claude API frequently gives overload messages during peak usage times, which disrupts their workflow.
    • Uncertainty remains on whether these issues stem from server limitations or bans affecting access.


Eleuther Discord

  • LM Harness Dataset Requirements Clarified: A member inquired about the required format for datasets intended for LM Harness, questioning the necessary dictionary keys. They were directed to YAML files for structured guidance on key design.
    • This emphasizes the flexibility in formatting, which is crucial for developers working on dataset integration.
  • Debating CBRN Risks of AI Models: Members discussed whether models can advise on chemistry without CBRN risks, highlighting concerns that filtering might jeopardize scientific capabilities.
    • The discussion pointed out that knowledgeable users might still extract harmful info, challenging the effectiveness of current filtration strategies.
  • Consequences of Filtering Pretraining Data: Participants argued that erasing 'bad' data could diminish the model's overall comprehension and alignment effectiveness.
    • It was mentioned that lacking negative examples might impede the model's capacity to avoid harmful activities, raising concerns over competency regression.
  • Frustrations with AI Journalism: Members shared their dissatisfaction with how journalists represent AI, often emphasizing sensational risks without adequate context.
    • This creates broader concerns about the safety narratives around AI outputs and their potential misrepresentation.
  • Searching for Open Source Reward Models: A query came up regarding effective open source Process Based Reward Models for verifying mathematical tasks.
    • This underlines a pressing need for reliable verification tools within the domain of math problem-solving.
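On the LM Harness dataset-format question above: tasks are described by YAML configs that map dataset fields to prompts and targets. A hedged sketch of the typical keys (the dataset id and field names are placeholders, not a real task):

```yaml
# Hypothetical lm-evaluation-harness task config; field names follow the
# harness's YAML schema, dataset path/fields are placeholders.
task: my_custom_task
dataset_path: username/my-dataset        # Hugging Face dataset id (placeholder)
output_type: generate_until
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{answer}}"
metric_list:
  - metric: exact_match
```

The flexibility noted in the discussion comes from `doc_to_text`/`doc_to_target` being templates over whatever keys the dataset's dictionaries actually contain.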


Interconnects (Nathan Lambert) Discord

  • Hugging Face Expands with XetHub Acquisition: Hugging Face announced the acquisition of XetHub to enhance its collaboration infrastructure for large models, aiming for better dataset management.
    • CEO Clem Delangue highlighted that this move is critical for scaling AI model development and unifying their operational strategies.
  • Qwen2-Math Dominates Math Tasks: The newly launched Qwen2-Math model series by Alibaba outperforms both GPT-4o and Claude 3.5 in specialized math tasks.
    • This marks a significant leap for math-specific language models, indicating potential shifts in domain-specific applications.
  • AI Infrastructure Unicorns on the Rise: A discussion series reveals how AI infrastructure builders like Hugging Face and Databricks are shaping generative AI markets.
    • Hugging Face's recent financing efforts position it to rival GitHub in the open-source domain, reflecting a robust growth strategy.
  • OpenAI's Price Cuts Ignite Competition: OpenAI is reportedly implementing a 70% price reduction on its GPT-4o model, stirring substantial interest across the industry.
    • This drastic price shift could lead to revised pricing strategies among competitors in the AI model space.
  • Clarifications on Token Count for GPT-4: Reports suggest that GPT-4 was trained on roughly 10 trillion tokens, a figure corroborated by multiple sources in the chat.
    • Despite this consensus, members labeled GPT-4 as ancient technology, suggesting the fast-paced evolution of model capabilities.


LangChain AI Discord

  • Fixing LangChain in AWS Lambda: A user faced pydantic errors when trying to import LangChain modules in AWS Lambda with Python 3.12 runtime, highlighting potential version conflicts.
    • Suggestions included double-checking the lambda layer setup to resolve the import issues.
  • Optimizing Chat History for LLMs: Discussion centered around implementing a custom function to limit chat history for LLM applications, aimed at improving performance.
    • Maintaining user-specific context was identified as a key factor in streamlining chat retention across different user interactions.
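The trimming approach discussed above can be sketched as a small helper — a minimal example, not the code from the chat; a production version might trim by token count rather than turn count:

```python
def trim_history(messages, max_turns=10):
    """Keep the system prompt (if any) plus the most recent chat turns.

    `messages` is a list of {"role": ..., "content": ...} dicts, oldest first.
    """
    system = [m for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    # Keep only the tail of the conversation to bound prompt size
    return system + chat[-max_turns:]
```

Keeping one such trimmed history per user (e.g. keyed by session ID) preserves user-specific context without letting prompts grow unbounded.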
  • LangChain vs. Other Frameworks Debate: Users expressed frustration that switching from OpenAI to Anthropic with LangChain required substantial code rewrites due to functional differences.
    • Participants agreed that despite LangChain's abstraction, specific adjustments remain necessary based on the behavior of individual LLMs.
  • LLM Reliability Concerns: Concerns were raised about Claude 3.5 experiencing internal server errors, stressing the reliability of AI systems in production.
    • This led to broader discussions on whether LangChain is the right choice for stable AI system implementations.


Latent Space Discord

  • GPT-4o enhances input and output capabilities: The GPT-4o model can process text, audio, image, and video, significantly boosting versatility and response speed, akin to human interaction.
    • It's also 50% cheaper in API usage and shows improved performance across multiple languages.
  • Gemini 1.5 Flash slashes pricing: GoogleAI cut the pricing for Gemini 1.5 Flash by about 70%, making it far more accessible for developers.
    • The AI Studio is now available for all workspace customers, facilitating better experimentation with new languages.
  • DALL·E 3 opens up for Free users: ChatGPT Free users can now generate two images per day with DALL·E 3, improving content creation accessibility.
    • While this feature is welcomed, some skepticism still exists regarding its broader applications.
  • Mistral Agents broaden functional integration: Mistral Agents can now utilize Python in various workflows, highlighting their greater adaptability.
    • Users are keen on features that facilitate API consumption, enhancing real-world applications.
  • SAM 2 Pod Launch is live: The latest episode of the Latent Space podcast features SAM 2, with insights from Nikhila Ravi and Joseph Nelson.
    • Listeners learned that 49 million images were labeled using SAM on RoboFlow, which saved an estimated 35 years of user time.
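A quick back-of-envelope check of those figures (assuming 365-day years) puts the time saved at roughly 22.5 seconds per labeled image:

```python
images = 49_000_000
years_saved = 35
seconds_saved = years_saved * 365 * 24 * 3600  # ignoring leap years
per_image = seconds_saved / images
print(round(per_image, 1))  # roughly 22.5 seconds saved per image
```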


LAION Discord

  • Midjourney CEO critiques open source: Midjourney CEO expressed skepticism towards open source, arguing that local models can't compete with their service using 64 GPUs, and dismissed ControlNet as a lone success.
    • Critics countered that Midjourney's product is akin to inferior versions of what open source can achieve, highlighting overfitting issues in Flux: 'it just has a sort of plastic look to it.'
  • ASL language model concept emerges: A user proposed developing an app to translate speech to ASL, considering the challenge of training a model with images of signs.
    • Suggestions included fine-tuning existing models, and another user discussed refining voice recognition models to use emojis for representing hand gestures.
  • Synthetic voice dataset idea proposed: A member proposed using so-vits-svc to create synthetic datasets by transforming voices in audio files, aiming to enhance variety while retaining content.
    • This approach could facilitate capturing a wider range of emotions in voice representation and improve model differentiation in demographic classifications.
  • Flux model discussions continue: Users reflected on Flux, with some labeling it 'a fun toy' that hasn’t made significant advances, raising concerns about its overfitting.
    • The ongoing dialogue emphasized the need for more intentional fine-tuning when comparing Flux to Midjourney.
  • Multiple AI applications for accessibility: Various suggestions for AI aimed at enhancing accessibility were shared, including a privacy-respecting IP Relay app for speech recognition.
    • Members focused on local inference techniques to help those with hearing impairments, showcasing a robust interest in impactful AI applications.


OpenAccess AI Collective (axolotl) Discord

  • Multi-backend Refactor Installed Smoothly: One member confirmed they successfully installed the multi-backend-refactor without any issues and is ready to monitor future developments.
    • This smooth installation process boosts confidence in its stability and utility in ongoing projects.
  • Google Gemini Slashes Prices: A member shared a YouTube video titled 'Google Gemini Insane Price Cuts!!!', featuring reductions on Gemini 1.5 Flash.
    • The video outlines substantial markdowns, and viewers can find additional details in the Google Developers blog.
  • Call for H100s in the Metaverse: A humorous remark was made suggesting that Zuck needs to deliver more H100 GPUs in the metaverse, highlighting demand for advanced resources.
    • This statement underscores the ongoing need for high-performance computing in virtual environments.
  • Training with 38k Dataset: One member reported training their model with a 38k item dataset, taking 32 hours on an RTX 4090.
    • They raised concerns that the learning rate in their current setup might be too high.
  • Correct Prompt Formatting Discussion: Members stressed the necessity of the Alpaca format for task-specific prompts during inference to ensure consistency.
    • They emphasized that output during chatting must mirror the format utilized in fine-tuning for optimal results.
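The Alpaca format mentioned above, sketched as a prompt builder (the no-input variant of the template drops the Input section; field contents here are placeholders):

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_prompt(instruction, input_text=""):
    # At inference time the prompt must mirror the fine-tuning format
    # exactly, including the headers and blank lines.
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)
```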


Cohere Discord

  • Almosnow seeks API file upload guidance: A member wanted to replicate the PDF querying functionality from the UI on coral.cohere.com using the API but struggled to find the relevant documentation.
    • An error, could not convert string to float: 'float_', pointed to an underlying issue with input formatting.
  • Mapler provides RAG resources: Mapler responded with resources on using Retrieval-Augmented Generation via the Cohere API, linking to a blog post and additional documentation.
    • They shared a code snippet for producing grounded answers, enhancing understanding of RAG use.
  • Azure AI Search integration woes: Users reported inconsistent results with Cohere embeddings in Azure AI Search, despite vectorized data successfully being indexed.
    • Integrated vectorization with models from Azure AI Studio was highlighted as a potential resource for addressing issues.
  • Cohere-toolkit enhancements for tool activation: A discussion emerged about enabling a tool by default in Cohere-toolkit by adding 'always use the <tool> tool' to the preamble.
    • It was noted that the tool must be listed for it to function correctly during calls.
  • User experiences custom deployment hurdles: A member shared attempts to modify invoke_chat_stream for default tool loading in their custom deployment with limited model selection.
    • Confusion arose due to UI discrepancies showing tools not activated, emphasizing a need for clarification in model feedback.


LlamaIndex Discord

  • LlamaIndex Announcements on the Horizon: An announcement for LlamaIndex was set to happen in 5 minutes, generating buzz among members in the announcements channel.
    • Members are eagerly awaiting highlights or updates that might come from this event.
  • RAG Pipeline Needs Enhanced Observability: Concerns surfaced about the RAG pipelines needing better observability to capture query-time traces and the significance of proper document chunking.
    • Improper context chunking could lead to retrieval issues, as emphasized by a tweet.
  • LongRAG Paper Comparison Ignites Discussion: The shared LongRAG paper indicates that long-context models outperform RAG when adequately resourced, prompting discussions on its methodologies.
    • Members expressed a desire for comparisons involving Claude 3.5 and insights from Lance of LangChain, enhancing community discourse.
  • Self-Routing Technique Revolutionizes Efficiency: The Self-Route method introduced in the LongRAG paper routes queries based on self-reflection, cutting costs while preserving performance.
    • Proposals for parent-document retrieval leveraging metadata surfaced to boost retrieval systems, highlighting reliability challenges in metadata labeling.
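A minimal sketch of the Self-Route idea as summarized above — `cheap_llm` and `long_context_llm` are hypothetical callables standing in for real model clients:

```python
def self_route(query, retrieved_context, cheap_llm, long_context_llm):
    """Try cheap RAG first; if the model judges the retrieved context
    insufficient, fall back to an expensive long-context model."""
    probe = cheap_llm(
        f"Context:\n{retrieved_context}\n\nQuestion: {query}\n"
        "Answer using only the context above, or reply exactly "
        "UNANSWERABLE if the context is insufficient."
    )
    if "UNANSWERABLE" in probe:
        return long_context_llm(query)  # expensive path, full documents
    return probe  # cheap path handled it
```

The cost savings come from most queries taking the cheap path, with the self-reflection step deciding when the long-context fallback is worth paying for.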
  • Workflows Abstraction Stirs Excitement: The team demonstrated the ease of building complex AI applications with Workflows, particularly in rebuilding LlamaIndex's Sub-Question Query Engine showcased in a new video.
    • This positions Workflows effectively for deploying intricate query engines in generative AI applications.


Torchtune Discord

  • Concerns on LLAMA 3 Generation Quality: Using the LLAMA 3 8B instruct model, a member found that prompting with 'anything' led to unexpected outputs, raising concerns about generation quality.
    • They directed others to share experiences or refer to GitHub issue #1285 for further discussion.
  • Evaluating RTX A4000 and A2000 for Fine Tuning: The discussion highlighted performance characteristics of RTX A4000 and RTX A2000, each equipped with 16GB of memory, revealing underwhelming fine-tuning results with 1.5B models.
    • One member suggested tuning the default batch size to better manage memory costs, possibly fitting workloads into 12GB.
  • Memory Optimization Parameters Under Review: There’s ongoing guesswork on memory optimization parameters, with mentions of LoRA not currently being prioritized despite its effectiveness.
    • The potential for optimization is evident, especially for members using GPUs with 8GB VRAM, who could experience improvements of over 2x.
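A rough accounting of why 16GB cards struggle with full fine-tuning of 1.5B models — assuming mixed-precision AdamW and excluding activations, which only add to the total:

```python
params = 1.5e9
bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master copy of weights
    + 8  # AdamW first and second moments (fp32 each)
)
total_gb = params * bytes_per_param / 1e9
print(total_gb)  # 24.0 GB before activations, already past 16GB
```

This is also why LoRA helps: only the small adapter matrices carry gradients and optimizer state, collapsing most of the per-parameter cost.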
  • Discussion on RLHF Cleanup: A member raised questions about necessary cleanups for RLHF prior to public sharing, recalling earlier notes on required adjustments.
    • They expressed willingness to collaborate on creating a tutorial or blog post, acknowledging the effort involved.
  • Plans to Publicize and Document Work: Eager to initiate discussions on publicizing their work and developing documentation or tutorials, a member outlined a loose roadmap.
    • They welcomed community input and assistance to enhance these efforts, indicating a collective approach.


Modular (Mojo 🔥) Discord

  • Freedom to Build Up on AI Infrastructure: Members discussed that it's acceptable to deploy anything on AI Infrastructure as long as there's no intent to commercialize, referencing the pricing page.
    • Internal tools usage appears fine as long as the aim isn't commercialization, but guidelines remain somewhat unclear.
  • VS Code + WSL: A Dynamic Duo for Mojo: One user explored running Mojo in a Windows dev environment using Mojo Max on WSL, recommending VS Code to bridge Windows and Linux seamlessly.
    • 'You pretty much forget you're developing in Linux' when leveraging this setup, though some limitations exist in reproducibility.
  • FancyZones Boosts Workflow Management: A member introduced the FancyZones utility, enhancing window management on Windows by snapping applications into defined zones for better productivity.
    • This tool allows for efficient screen use, helping developers streamline their workflow in a multi-window setup.
  • Active Directory: Not Quite a Distributed Database: A humorous debate unfolded over calling Active Directory a distributed database, with members noting it lacks characteristics like true consistency despite being labeled as such.
    • Further discussion emerged about existing distributed databases on Windows, showcasing an interest in clarifying terminology within the community.


DSPy Discord

  • Inspect Tool Teased for LLM Evaluation: A member queried about Inspect for LLM observability, looking for integration insights with DSPy.
    • While no experiences were shared, the tool seems positioned to enhance large language model evaluations.
  • DSPy Gains Advantage Over Langgraph: A distinction emerged with DSPy optimizing prompt space instructions, while LangGraph acts at a lower level within LangChain architecture.
    • Essentially, DSPy is about performance boosts, whereas LangGraph handles system-level interfacing.
  • Optimize_signature Triumphs Over COPRO: Users reported that optimize_signature outperformed COPRO in Chain of Thought tasks on GSM8K, achieving a score of 20/20.
    • In contrast, COPRO struggled to secure a zero-shot instruction solution, maxing out at 18/20.
  • User Seeks Help with DSPy-Multi-Document-Agent: A member faced challenges locating the requirements.txt for DSPy-Multi-Document-Agent, questioning if they missed crucial files.
    • This inquiry pointed to potential documentation gaps or unclear resource links.
  • Interest in Advanced Retrieval with qdrant_dspy: A link to the qdrant_dspy GitHub repository highlights building RAG pipelines using Gemma-2b, DSPy, and Qdrant.
    • Another resource, dspy/retrieve/qdrant_rm.py, emphasizes DSPy's utility in local VectorDB programming.


tinygrad (George Hotz) Discord

  • ValueError Strikes getenv Function: A user faced a ValueError, specifically 'invalid literal for int() with base 10: WARN', while importing, which pointed to an environment variable issue.
    • A member suggested that checking environment variables would help, confirming that the DEBUG variable set to 'WARN' was the source of the problem.
  • DEBUG Variable Causes Trouble: The DEBUG environment variable being set to 'WARN' led to issues with the getenv function in a notebook environment, despite the user's Python script functioning well.
    • This highlights potential compatibility differences between notebook and standalone script environments in tinygrad.
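The failure mode can be reproduced with a minimal sketch, under the assumption that tinygrad coerces environment variables with int() (the real implementation may differ) — DEBUG expects an integer verbosity level, not a logging-style string:

```python
import os

def getenv(key, default=0):
    # Coerce the env var to the type of the default, tinygrad-style.
    return type(default)(os.environ.get(key, default))

os.environ["DEBUG"] = "2"     # numeric levels work
assert getenv("DEBUG") == 2

os.environ["DEBUG"] = "WARN"  # a logging-style value breaks int()
try:
    getenv("DEBUG")
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: 'WARN'
```

Notebook environments often inherit such variables from the enclosing shell or kernel, which explains why the same script ran cleanly standalone.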
  • Tinygrad Tensor Puzzles Challenge Launched: Members introduced Tinygrad Tensor Puzzles, a collection of 21 engaging puzzles aimed at mastering tensor libraries like tinygrad from first principles, avoiding magic functions.
    • This initiative, building on Sasha's PyTorch Tensor-Puzzles, encourages contribution from both newcomers and experienced developers, fostering a community of problem solvers.
  • Tutorials to Explore Tinygrad Internals: A set of tutorials was shared, designed to enhance understanding of tinygrad's internals and promote contributions, along with a quickstart guide for foundational insights.
    • While not entirely beginner-friendly, these resources provide essential knowledge for developers looking to engage with tinygrad effectively.
  • Optimizing Tinygrad with Computer Algebra Techniques: Recent discussions included computer algebra study notes relevant to optimization processes in tinygrad, enhancing potential performance insights.
    • This integration showcases valuable methodologies that could support developers in refining tinygrad's capabilities.


OpenInterpreter Discord

  • Seeking Open Source Vision Models: Members are actively looking for recommendations on open source models suited for vision tasks, inquiring about both local and API options for implementation.
    • One member showed curiosity by asking for insights on the availability and performance of such models within the community.
  • MiniCPM-V 2.6 Shines in Performance Tests: MiniCPM-V 2.6 has been reported to outperform its competitors, including Gemini 1.5 Pro, GPT-4V, and Claude 3.5 Sonnet, particularly in multi-image applications.
    • For more details, members shared links to its Hugging Face page and the GitHub repository.
  • Inquiry on Shipping Updates: A member raised the question of shipping updates, indicating interest in the timeline and status.
    • Although no specific answers were provided, a link to a relevant Discord channel was shared for potential discussions.


MLOps @Chipro Discord

  • Llama Team engages with queries on arXiv: The Llama team is responding to questions on the arXiv discussion forum, providing an opportunity for direct technical engagement.
    • This initiative allows for deeper insights into the Llama 3 models and their applications.
  • Quora launches Poe Hackathon: Quora is hosting an in-person and virtual hackathon focused on building bots with the new Previews feature for Poe.
    • Participants will develop innovative in-chat generative UI experiences utilizing advanced LLMs like GPT-4o and Llama 3.1 405B.
  • Exploring Non-Generative AI Applications: A member sparked a conversation about the significance of non-generative AI, encouraging others to share their thoughts.
    • The question 'What kinds of AI applications do you have in mind?' stirred interest in exploring various applications.
  • Diverse AI Applications Identified: Suggestions flowed in for computer vision, forecasting, recommendation systems, and NLP as key non-generative AI areas.
    • These examples illustrate the broad spectrum of AI technologies that serve various niches beyond generative models.


OpenRouter (Alex Atallah) Discord

  • Vercel's Outage Affects OpenRouter: Vercel experienced intermittent outages impacting the OpenRouter service, as detailed in their status update; after several updates, services were stable again by 3:45 PM ET.
    • Vercel continues to monitor the issue, with further updates posted on the Vercel status page.
  • Anthropic's High Error Rates Mitigated: Anthropic has been addressing elevated error rates affecting the 3.5 Sonnet and 3 Opus models, implementing mitigation strategies that restored normal success rates as of Aug 8, 17:29 PDT.
    • They have provided updates ensuring access for Claude.ai free users is now restored while closely monitoring the situation.


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email!

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.