AI News (MOVED TO news.smol.ai!)

Archives
June 4, 2024

[AINews] Not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


AI News for 6/3/2024-6/4/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (400 channels, and 4568 messages) for you. Estimated reading time saved (at 200wpm): 455 minutes.

Twelve Labs raised $50m, Livekit raised $22m, Groq is now running 800tok/s, and there's an OpenAI resignation thread from Daniel Kokotajlo.

But no technical developments caught our eye.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI and Large Language Model Developments

  • Gemini model performance: @arohan highlighted the Gemini 1.5 FlashModel as an outlier providing high performance at low cost, making useful models accessible to more users. He also noted Gemini Pro taking the #2 spot in Japanese language performance.
  • Optimizing Mixtral models with TensorRT: @rohanpaul_ai shared how Mixtral models can be made to run up to 8x faster on NVIDIA RTX GPUs using TensorRT-LLM, which compiles the model and optimizes kernels for efficient serving, supporting expert and tensor parallelism.
  • Mamba-2 model architecture: @tri_dao and @_albertgu introduced Mamba-2, which uses state space duality to enable sequence models with 8x larger states, 50% faster training, and connections between SSMs and linear attention, outperforming Mamba-1 and strong Transformer architectures.
  • Phi-3 model benchmarks: @_philschmid reported that Phi-3 Medium (14B) and Small (7B) models are on the @lmsysorg leaderboard, with Medium close to GPT-3.5-Turbo-0613 but behind Llama 3 8B, and Small near Llama-2-70B and Mistral fine-tunes, suggesting optimizing only for academic benchmarks is not enough.
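For reference, the core state space recurrence behind the Mamba-2 bullet above can be written in standard SSM notation (a generic formulation, not necessarily the paper's exact parameterization):

```latex
h_t = A\,h_{t-1} + B\,x_t, \qquad y_t = C\,h_t
```

Unrolling the recurrence gives $y_t = \sum_{s \le t} C A^{\,t-s} B\, x_s$, which has the same shape as a causal linear-attention map with kernel $M_{ts} = C A^{\,t-s} B$; state space duality formalizes this structured-matrix correspondence between SSMs and linear attention.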

Prompt Engineering and Data Curation

  • Power of prompt engineering: @rohanpaul_ai emphasized the power of prompting LLMs correctly to enable capabilities like jailbreaking, adhering to JSON schemas, grounding and more by navigating the latent space.
  • Importance of data quality: @sarahcat21 pointed out that models perform better when trained on good data, making data curation critical. @HamelHusain promoted an upcoming masterclass on organizing and generating high quality data for fine-tuning.
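The "adhering to JSON schemas" capability mentioned above is commonly enforced by embedding the schema in the prompt and validating the reply. A minimal, framework-free sketch (the schema, prompt wording, and simulated reply are all hypothetical, not from any cited thread):

```python
import json

# Hypothetical target schema embedded in the prompt; the model is asked
# to reply with JSON only, and we validate whatever comes back.
SCHEMA = {"name": "string", "sentiment": "positive|negative|neutral"}

def build_prompt(text: str) -> str:
    """Ask the model to answer ONLY with JSON matching SCHEMA."""
    return (
        "Extract fields from the text below. Respond with JSON only, "
        f"matching this schema: {json.dumps(SCHEMA)}\n\nText: {text}"
    )

def validate(reply: str) -> dict:
    """Parse the model reply and check it against the schema keys."""
    data = json.loads(reply)  # raises ValueError on non-JSON output
    missing = set(SCHEMA) - set(data)
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Simulated model reply (no API call made here):
reply = '{"name": "Acme", "sentiment": "positive"}'
print(validate(reply))  # {'name': 'Acme', 'sentiment': 'positive'}
```

Retrying on a `ValueError` (re-prompting with the parse error included) is the usual next step when the model drifts off-schema.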

AI Safety and Alignment Discussions

  • Frontier AI lab employee letter on safety disclosures: @jachiam0 shared thoughts on a circulating letter from former and current frontier AI lab staff advocating for whistleblower protections on safety and risk issues, arguing the proposal could erode trust and make sensitive internal discussions harder.
  • Aligning AI to user intent vs humanity's interests: @willdepue argued alignment should focus on the easier problem of aligning AI to the user's intent rather than the intent of creators or benefit of all humanity. However, @jachiam0 and @Teknium1 countered that AI could become autonomous and not serve user interests, necessitating global alignment.

AI Reddit Recap

Across r/LocalLlama, r/MachineLearning, r/OpenAI, r/StableDiffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity. Comment crawling works now but has lots to improve!

TO BE COMPLETED


AI Discord Recap

A summary of Summaries of Summaries

  1. Finetuning and Optimization for LLMs:

    • Optimizing LLM Accuracy by OpenAI provides advanced techniques like prompt engineering, RAG, and guidelines on acceptable performance levels. Check out the accompanying YouTube talk for deeper learning.
    • Discussing Multimodal Finetuning, users explored Opus 4o and MiniCPM-Llama3-V-2_5 for image text parsing and OCR, and considered retrieval methods for structured datasets (Countryside Stewardship grant finder).
    • Queries about continuous pretraining and memory efficiency highlight Unsloth AI's ability to halve VRAM usage compared to standard methods, detailed in their blog and GitHub page.
  2. Model Performance and Inference Efficiency:

    • Modal impressed with 50x revenue growth, with revenue now exceeding eight figures, while also optimizing infrastructure. Insights were shared in Erik's talk at Data Council and Modal's hiring link.
    • Discussions about bitshift operation across all backends (tinygrad) and performance adjustments (PR #4728) versus traditional operations stirred debates on improvement margins.
    • Users tackled CUDA recompile issues by realigning flags for effective compilation. They exchanged resources like the RISC-V Vector Processing YouTube video for further learning.
  3. Open-Source Developments and Community Projects:

    • LlamaIndex's integration with Google Gemini demonstrated a million-token context window facilitating complex queries, while practical problems were solved via custom solutions detailed in their documentation.
    • Modular’s Deep Dive into Ownership in Mojo showcased detailed work by CEO Chris Lattner exploring developer-friendly innovations. Community feedback on making all functions async sparked diverse opinions on compatibility and ease of transition.
    • Projects like FineWeb from Hugging Face and the Phi-3 models climbing the @lmsysorg leaderboard highlight progress and ongoing research in open-source AI.
  4. System and Hardware Troubleshooting:

    • Members resolved several technical issues, such as infinite loops on Macbook M1 with ollama llama3 setup by troubleshooting system commands, and Async processing in LM Studio facilitated with practical discussions on GPU usage efficiency.
    • They discussed performance discrepancies in GPUs (e.g., 6800XT achieving only 30it/s) and potential improvements with proper setup and driver considerations, showcasing a blend of peer support and technical expertise.
    • Open-source solutions like IC-Light, focusing on improving image relighting, and CV-VAE for video models (ArXiv link) were enthusiastically shared among hardware and software enthusiasts.
  5. Health of AI Communities and Conferences:

    • Several platforms confirmed credit distributions to users, dealing with issues such as double credits, while fostering a supportive environment seen in community exchanges and career stories.
    • Events like Qwak's Infer: Summer '24 invite AI/ML enthusiasts for practical sessions with industry experts, further detailed in conference registration.
    • AI News newsletters faced formatting issues in ProtonMail dark mode, encouraging community-led problem-solving, and requests like Torchtune's bid for newsletter recognition highlighted active engagement and the importance of visibility in community contributions.

PART 1: High level Discord summaries

LLM Finetuning (Hamel + Dan) Discord

  • Optimization Adeptness for LLM Smarts: OpenAI shared an advanced guide on optimizing LLMs for better accuracy, which includes techniques like prompt engineering, RAG, and fine-tuning, along with deciding acceptable performance levels for real-world applications. The guide is accessible at Optimizing LLM Accuracy, and for deeper learning, check out the YouTube talk "A Survey of Techniques for Maximizing LLM Performance" here.
  • Credits Galore Across the LLM Landscape: Across the guild, several platforms including Hugging Face, Replicate, Modal, Predibase, and Braintrust are confirming the distribution of credits to users. Issues like double credits and missing credits are being addressed, with indications that users should contact support or check their billing settings for confirmation. Users are also advised to follow up on pending platforms for credit distribution.
  • All Aboard Fine-Tuning Innovations: Lively discussions revolve around fine-tuning LLMs, exploring multimodal finetuning, leveraging Opus 4o and MiniCPM-Llama3-V-2_5 for parsing text from images, and using PaliGemma for OCR tasks. Tear into model merging tactics with Axolotl, here, and pick apart Medusa and LoRA's ability to enhance LLM inference, here and here. Users suggested retrieval applications could be fruitful for structured government data, such as dataset details found at Countryside Stewardship grant finder.
  • Community Exchange and Shared Journeys: From discussions on CUDA book recommendations to tales of transitioning from academia to freelancing, or novices' learning journeys to AI, there's a buzz of community empowerment and peer guidance. Shared experiences underscore the importance of learning and adapting, as seen in diverse career paths involving freelancing, industry R&D, and game achievements.
  • Inferential Framework Finesse: Talk of efficient LLM inference included Modal's infrastructure optimization, with a dive into filesystem and container runtime custom solutions by Erik at Data Council, available here. Modal also stirred interest with their eight-figure revenue growth and ongoing hiring, detailed further at Modal's hiring link. Etched's specialized chip, running transformers more than 10x faster than GPUs, signaled an engineering leap, with job openings accessible via Etched's hiring link.
  • Deployment Decoded and Credits Decrypted: From error resolution in Modal's hello world example to embedding models, engineers dissect the deployment intricacies and swoop on credit opportunities—Charles from Modal dished extra $500 credits, expounded upon at Modal's blog. Meanwhile, predibase users report fine-tuning oddities and credit discrepancies, exploring if adapters could trim nonsensical continuations generated by the L3 70B base model.

While credits fueled the AI engine rooms and technical tidbits circulated, members swapped both assistance and anecdotes—an emblem of the guild's pulsing core of collective progress and exchange.


CUDA MODE Discord

CUDA Conundrums and Triton Tips: Users discussed the tech used for generating digital humans without reaching a conclusion, and sought efficient LLM training methods on multiple GPUs. Challenges were noted in Triton for indexing a tensor in shared memory, and advice on Triton and Torch was provided for those considering a switch to CUDA/Triton programming.

Torch Troubleshooting and Profiling Proficiency: Users shared on debugging NHWC tensor normalization, opening metal traces using the torch.mps.profiler, and sought to understand torch.compile along with its child function calls.

AO's Arrival and Sparsity Specs: News emerged about an Apple Metal kernel and 2:4 sparsity benchmarks contributing to PyTorch AO, sparking debates on torch.ao.quantization deprecation and discussing the efficiency of structured pruning.

Blog Browsing and Binary Banter: A deep dive into State Space Duality on the goombalab blog got a mention, while discussions flourished around PyTorch's uint2-7 types and custom dtype string-conversion for TrinaryTensor.

ARM's Acceleration Aspirations: Conversation revolved around the capabilities and support of ARM for Hexagon SDK and Adreno SDK, with a member sharing resources on ARM's performance and discussing its potential in GEMM implementations.


Unsloth AI (Daniel Han) Discord

  • VRAM Vanquished by Token Increase: Extending llama-3-8b to 64k tokens caused an OutOfMemoryError on an H100 with 80GB VRAM; discussions aimed to resolve this through gradient checkpointing and tuning configurations.
  • Speedy Sustained LLM Pretraining: Unsloth AI’s new update allows for doubling the speed and halving the VRAM usage compared to Hugging Face + Flash Attention 2 QLoRA during continuous pretraining of LLMs, as discussed in their recent blog.
  • Questions on Multi-GPU and 8-bit Optimization: The community actively engaged in conversations about multi-GPU support and testing Unsloth AI’s performance on different GPU configurations, while addressing the current limitations of fine-tuning with 8-bit quantization on models like phi-3-medium-4k.
  • Unsloth Setup and Optimization Tactics: Instructions and troubleshooting tips for local Unsloth setup were shared, including the use of Jupyter Notebook and Docker, with links to GitHub readme and Jiar/jupyter4unsloth. The community also covered LoRA rank calculation, referencing insights from Lightning AI.
  • Community Cordiality Continues: New members were warmly welcomed into the community, fostering a supportive environment for collaboration and knowledge exchange.

Perplexity AI Discord

Wikipedia Bias in Academic Searches: A user highlighted potential issues with Perplexity's academic search capabilities, pointing out a bias towards Wikipedia over other sources like Britannica, and provided a link to the search results.

AI Services Experience Simultaneous Downtime: Reports emerged of simultaneous outages affecting Perplexity, ChatGPT, and similar AI services, spurring discussions about a larger infrastructure issue possibly connected to common providers like AWS.

The Opus 50 Limit has Users Longing for More: Users expressed dissatisfaction with the new Opus 50 limit, comparing it unfavorably to the previous Opus 600 and criticizing Perplexity's communication about it.

Perplexity vs. ChatGPT: A Duel of AI Titans: Discussions around the pros and cons of Perplexity AI Premium and ChatGPT touched on web search capabilities, model range, subscription limits, and practical use cases for both platforms.

Tech Talk Assistance for School: AI enthusiasts shared resources and advised using AI tools to assist with school presentations on AI, highlighting the need to explain both benefits and risks, along with sharing a YouTube video for technical understanding.


HuggingFace Discord

  • Fine-Tuning Fervor: The community discussed the advantages of fine-tuning models like Mistral 7B Instruct on customized datasets for improved instruction following. They also explored Vision-Language Models (VLMs) and challenges in integrating visual data with language models, emphasizing the need for alignment with tokenizers and datasets suitable for specific tasks (Vision-Language Modeling Primer).
  • Enhancing Image Generation via Corruption: A collaboration with Microsoft and CMU resulted in a study highlighting the impact of slight corruption in pre-training data on the quality of diffusion models. Alternatively, a blog post discussed Diffusion Policy, a visuomotor learning algorithm that begins with Gaussian noise to predict actions, underlining its novel approach to generating actions through an encoder-decoder model.
  • New Tools and Pipelines: Hunyuan DiT pipeline was added to the diffusers library, providing a fresh method for image and audio generation (Diffusers Release Notes). Moreover, the community was invited to improve LevelBot's new activity tracker by integrating additional actions and activities like GitHub contributions into its system (LevelBot Activity Tracker).
  • Optimizing ML Workflows: There's active engagement in improving model inference efficiency, with discussions on utilizing jit.trace for SDXL and other optimization tips found in the Diffusers optimization guide. Furthermore, troubleshooting included the use of explicit function imports to resolve potential version conflicts.
  • Dataset and Algorithm Revelations: Novel datasets are being shared, like the German parliament speech data for ASR/TTS (Bundestag ASR Dataset). Additionally, a focus on preference alignment was highlighted in a paper introducing the ORPO algorithm, which enhances preference alignment without an additional fine-tuning phase.

OpenAI Discord

AGI Amusement and Practical AI Tools: The Discord community pondered the essence of AGI with humorous suggestions such as flawless USB insertion skills, indicating the expectation for AGI to perform complex human-like tasks. Useful AI tools like Elicit were recommended for summarizing scientific research, notably commended for efficient paper summarization and synthesis.

ChatGPT Takes a Sick Day, Voice Mode Hesitation: Speculation around ChatGPT outages included backend provider issues and a potential DDoS attack by Anonymous Sudan. The rollout of new voice mode features in GPT-4o was discussed, with mixed feelings about the promised timeline and reported persistent issues such as 'bad gateway' errors and laggy Android keyboards.

The Prompt Engineering Conundrum: Challenges in prompt engineering were aired, especially the difficulty of adhering to complex guidelines, leading to calls for improved versions. WizardLM 2 was suggested as a high-performing alternative to GPT-4, and breaking down complex prompts into steps was recommended as an approach to optimize results.

API Affordability under Scrutiny: Conversations turned to the cost of using the GPT API versus ChatGPT Plus, with API potentially being the cheaper option depending on usage. Alternatives like OpenRouter and WizardLM 2 were proposed for better value, and an article titled "Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models" was endorsed as a must-read for prompt engineering insights.
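The API-vs-subscription cost question above is just arithmetic once you fix a usage level. A back-of-envelope sketch; the per-token prices and monthly token counts below are illustrative assumptions, not quoted rates:

```python
# Cost comparison: pay-per-token API vs a flat monthly subscription.
# All prices here are assumed for illustration -- check current pricing.
PLUS_MONTHLY_USD = 20.00
INPUT_USD_PER_1M = 5.00    # assumed input-token price per 1M tokens
OUTPUT_USD_PER_1M = 15.00  # assumed output-token price per 1M tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Monthly API spend for a given token volume."""
    return (input_tokens / 1e6) * INPUT_USD_PER_1M \
         + (output_tokens / 1e6) * OUTPUT_USD_PER_1M

# A light user: ~300k input + 100k output tokens per month.
monthly = api_cost(300_000, 100_000)
print(f"API: ${monthly:.2f}/mo vs Plus: ${PLUS_MONTHLY_USD:.2f}/mo")
```

At this assumed volume the API comes out far cheaper than the flat fee; heavy daily use flips the comparison, which is the trade-off the channel was weighing.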

Rollout Delays and Performance Puzzles: Delays in new feature rollouts and performance issues with large prompts were common concerns. To counteract the sluggish response with hefty prompts, lazy loading was mentioned as a potential solution to browser difficulties.


LM Studio Discord

LM Studio GPU Sagas: Engineers discussed the behavioral quirks of LM Studio models with an emphasis on offloading parameters, affirming that running on a dedicated GPU often yields better results than shared resources. They also underlined the fine line between model size and GPU memory, noting that models should be under 6GB to alleviate loading issues.

Model Recommendations for Codewranglers: The CodeQwen 1.5 7B and Codestral 22b models were specifically recommended for code optimization tasks, while Wavecoder Ultra was also suggested despite its obscure launch history. Additionally, the utility of platforms like Extractum.io was highlighted for filtering models based on criteria such as VRAM and quantization.

The Fine Print of AI Performance: Conversation veered into the technical details of AI limitations, noting that performance can often be limited by memory bandwidth, and members suggested targeting an 80% workload in relation to physical core count on processors. The uncertainty surrounding future Chinese language support was also brought up.
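The memory-bandwidth limitation above has a handy rule of thumb: each generated token streams roughly all model weights through memory once, so bandwidth divided by model size bounds tokens per second. A sketch with illustrative (not measured) numbers:

```python
# Rough upper bound on token generation speed for a memory-bandwidth-bound
# LLM: tokens/s <= memory bandwidth / model size in memory.
# The bandwidth and model-size figures below are illustrative assumptions.
def max_tokens_per_sec(model_gb: float, bandwidth_gbps: float) -> float:
    """Ceiling on tokens/s if every token reads all weights once."""
    return bandwidth_gbps / model_gb

# A ~4 GB 4-bit-quantized 7B model on dual-channel DDR4 (~50 GB/s):
print(max_tokens_per_sec(4.0, 50.0))   # ~12.5 tok/s ceiling
# The same model in VRAM with ~500 GB/s bandwidth:
print(max_tokens_per_sec(4.0, 500.0))  # ~125 tok/s ceiling
```

Real throughput lands below these ceilings (KV cache reads, kernel overhead), but the ratio explains why adding CPU cores past the bandwidth limit stops helping.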

Do-It-Yourself Servers Draw Debate: Discussions around building custom homelab GPUs focused on VRAM capacity, driver support, and performance between manufacturers. Concerns were addressed regarding second-hand GPUs' reliability and members weighed pros and cons of AMD ROCm versus NVIDIA's ecosystem for stability and throughput.

Engineering a Beta Buff: In the world of software development and AI tinkering, continue.dev was lauded for local setups, particularly for supporting LM Studio configuration, while a call for testers was raised for a new AVX-only extension pack, showcasing the community's collaborative spirit and ongoing optimization endeavors.


Nous Research AI Discord

  • AI Takes the Stage: Discussions emerged around the Wayseer Manifesto - Official Video, evidently popular for its motivational message, and design talk sparked by Nous Research’s Twitter account, hinting at a creative flair within the AI community.
  • OpenAI Unwrapped: Speculation arose about OpenAI's GPT-4, with members eagerly anticipating potential capabilities and implications for future AI research and application realms.
  • Gearing Up for T5 and Beyond: Technical conversations revealed that the T5 model sets a high barrier for adoption due to its hefty hardware requirements; meanwhile, promising alternatives like an open-source UI from Mobius for chat assistants and potential improvements via ggml are subjects of interest.
  • Graphical Glitches with Pixart: Technical angst surfaced as Pixart struggles when scaling to datasets larger than 10k images, unlike other models that retain stability with up to 400k images; members attributed those models' success to unique training methodologies.
  • WorldSim Wonders and Wisdoms: The recent WorldSim Jam Session is available on YouTube, coupled with a dose of irony in recognizing that agents might be the first job category outpaced by AI; a returning member celebrated their re-engagement by sharing their research progress.

Stability.ai (Stable Diffusion) Discord

  • SD3 Faces Tough Crowd: Initial feedback from users indicates that the early model of Stable Diffusion 3 (SD3) struggles with hand depictions, lagging behind Dall-E in some aspects; however, optimism remains for the potential improvements through custom models upon wider release.
  • Architecture’s AI Angle: Discussions surfaced around applying Stable Diffusion to architectural visualization, suggesting the use of img2img techniques with detailed inputs to enhance output quality, despite the tool's limitations with rendering straight lines and geometrically accurate mechanics.
  • Plugin Pitfalls: Users are encountering quality degradation issues when using the wildcards plugin with Stable Diffusion, reporting grainy results and color distortions despite multiple installation attempts.
  • Community Model Mining: The engineering community recommends exploring community models available on platforms like civitai.com and utilizing ChaiNNer, a node-based image processing GUI tool, to improve and upscale image results from Stable Diffusion.
  • AI's Celebrity Conundrum: The rise of AI-generated influencer profiles such as 'celebrity LoRas' on Civit prompted a tongue-in-cheek debate on the nature of celebrity in the era of AI, highlighting the blurred lines between virtual and reality as these profiles gain followers and media attention.

LAION Discord

SD3 Models Grapple with Grainy Results: Users highlight a spotty noise issue in SD3 2B models despite using advanced features like a 16ch VAE, with noise artifacts particularly evident in areas such as running water. Skepticism has been voiced about the current validation metrics and loss functions for SD3 models, as they are perceived to poorly indicate model performance.

Open-source Breakthrough for Video Models: The community showed enthusiasm about an Apache2 licensed video-capable CV-VAE, expected to be a valuable resource for research on latent diffusion-based video models.

Peering into Future Model Architectures: Newly released research introduces the State Space Duality (SSD) framework and the cutting-edge Mamba-2 architecture, claimed to be 2-8X faster than its predecessor, contesting Transformer models in language processing tasks (arxiv paper).

Training Tactics Under Scrutiny: A preprint suggests that embeddings perturbed by slight corruption of pretraining datasets can improve diffusion models' image quality (arxiv preprint), while others mention using dropout and data augmentation to prevent overfitting in large diffusion models, and a debate on whether adding training data difficulty can enhance model robustness.

Aesthetic Assessments and Realism Rivalries: Comparisons between SD3 images and Google's realistic examples have sparked discussions, with SD3 images being humorously likened to "women suffering a bad botox injection" (Reddit examples), and Google's work earning praise for its textured cloth and consistent hair representations (Google demo).


Eleuther Discord

  • E-Paper or Not E-Paper? That is the Question: Members hotly debated the legitimacy of the Daylight tablet's e-paper claims after a review video suggested it might be a reflective LCD instead. The discussion gravitated around whether the Daylight is misbranded existing Sharp RLCD tech or a genuine innovation, with members suggesting a possible teardown for clarity.
  • Beyond Heads and MLPs: Discovering LLM Knowledge Circuits: A new research paper revealing deeper insights into LLM knowledge circuits has caught the attention of members, with one member valuing its departure from focusing on individual network components. The research community also delved into whether corrupted datasets might actually improve diffusion models and the synergy between RNNs and pre-trained transformers.
  • Public Servant AI?: Users discussed efficiency improvements for AI tasking, with concerns about slow single-query processing and the impact of default n-shot values on results. There's also a practical search for the smallest Huggingface decoder model to study energy use, and a GitHub pull request introducing machine-translated ARC challenges across multiple languages.
  • The Efficiency Hunt: Concerns about model size have been posed, with efforts to condense a 22GB model like TinyLLama.db for better activation and weight entry coordination. Furthermore, the community pondered using a differentiable top-k function for image classification, potentially to hone model focus on the most significant elements.
  • Global Benchmarking Gets Multilingual: An initiative to expand the ARC challenge benchmark to 11 machine-translated languages via a collaborative pull request was put forward, with an eye on future language additions. Review and contributions to this multilingual extension of benchmarks are underway.

OpenRouter (Alex Atallah) Discord

  • Cryptocurrency Credits Conundrum: A user encountered issues with credits not appearing after an ETH payment, prompting advice to wait up to an hour before lodging a complaint; patience might be a virtue in blockchain timing.
  • LLMs' Proficiency with Prefills: The consensus among users is that large language models (LLMs) adeptly handle prefill text, ensuring the generation of subsequent content is consistent with the initial input.
  • Turbo Troubles and API Oversight: GPT-3.5 Turbo's inconsistencies led to a discussion about potential API moderation, with a reminder that OpenAI mandates moderation for all requests via their moderation API.
  • Mistral's Momentary Mute: Reports of receiving empty responses from Mistral: Mixtral 8x22B Instruct prompted administrative guidance to set DeepInfra as a preferred provider and check load balancing documentation for resolving provider-specific issues.
  • Narrative Nuances via Neural Networks: When debating the best models for storytelling, users recommended various roleplay-specific models, directing attention to OpenRouter's rankings for those particularly excelling in this creative endeavor.

Modular (Mojo 🔥) Discord

  • Python Speed Boost with a Line: A tutorial video on how Numba can dramatically increase Python performance using JIT compilation was highlighted. The impact of this one-liner, however, piqued interest regarding the potential for achieving similar performance without additional libraries.
  • Efficiency in Python and Mojo: The efficacy of utilizing for loops within while loops in Python was debated, with a recommendation to explore generators via a Real Python resource. Additionally, the possibility of Mojo's MAX tool expediting Python execution was discussed, comparing it to enhancements brought by Tensor and Torch libraries.
  • Mojo Async Paradigm Sparks Controversy: The suggestion to default all Mojo functions to async sparked a debate. Concerns were voiced about straying from Python standards and complicating the workflow for those accustomed to explicit async/await methodologies.
  • Ownership Deep Dive by Modular: A blog post titled Deep Dive into Ownership in Mojo, featuring insights from CEO Chris Lattner, was introduced. This piece is a sequel to an earlier exploration of ownership as a conceptual framework.
  • Challenging Rust with Project Verona: Project Verona was put under the spotlight as a rival to Rust in providing memory safety with a gentler learning curve. Enthusiasts are directed to watch a YouTube talk, "Concurrent Mutation must go," for an in-depth discussion on the topic.

LangChain AI Discord

  • Bind_Tools Tamed for Ollama Models: Members confirmed that LangChain supports bind_tools for Ollama models through the OllamaFunctions class, and provided a relevant GitHub issue link for additional reference.
  • Building Customer Support with AI: An ongoing discussion on creating AI-driven customer support systems identified LangChain, LLMs like Llama3, and custom tools for actions such as user verification, with shared Python code as an example for chaining models and tools.
  • Preserving Conversation Memory in SQL Failures: SQL agent chat context preservation was a hot topic, with a shared code snippet using ConversationBufferMemory. However, there were concerns regarding unsupported kwargs.
  • Categorization Using LangChain and Embeddings: The guild explored strategies for categorizing 10,000 free-text responses using LangChain and embeddings, highlighting the use of prompt engineering to enhance efficiency.
  • Text Editing Redefined with Automated Chat Analyzer: An automated chat analyzer that can produce editable plain text Q&A from message lists was introduced, aiming to ease manual editing and reduce compute usage.
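The model-and-tools chaining pattern from the customer-support discussion above can be sketched without any framework. Everything here (the `verify_user` tool, the keyword-routing stand-in for an LLM) is a hypothetical illustration, not the code shared in the channel, which used LangChain with a real model:

```python
# Minimal sketch of an AI customer-support pipeline: a "model" decides
# which tool to call, and tools perform actions such as user verification.
USERS = {"alice@example.com": {"verified": True}}

def verify_user(email: str) -> str:
    """Tool: check whether a user exists and is verified."""
    user = USERS.get(email)
    return "verified" if user and user["verified"] else "unknown user"

TOOLS = {"verify_user": verify_user}

def fake_model(message: str) -> dict:
    """Stand-in for an LLM that routes requests to tools."""
    if "@" in message:
        email = next(w for w in message.split() if "@" in w)
        return {"tool": "verify_user", "arg": email}
    return {"tool": None, "answer": "How can I help you today?"}

def support_chain(message: str) -> str:
    """Chain: model decision -> optional tool call -> final answer."""
    decision = fake_model(message)
    if decision["tool"]:
        return TOOLS[decision["tool"]](decision["arg"])
    return decision["answer"]

print(support_chain("please verify alice@example.com"))  # verified
```

In a real setup the routing decision comes from the LLM's tool-call output rather than keyword matching, but the chain shape (decide, call tool, respond) is the same.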

tinygrad (George Hotz) Discord

  • Bitshifting Buzz in Backend Development: Engineers engaged in a lively debate on the merits of implementing a bitshift operation PR #4728 across all backends in tinygrad, with skepticism around its performance boost compared to traditional multiply/divide operations.
  • Testing Puzzles in GPU Land: Curiosity arose as to why device tests from 'gpuctypes' were absent, referencing a specific test_cuda.py missing tests, contributing to the ongoing discussion about thorough testing practices.
  • Diving into tinygrad's Depths with Hotz: George Hotz unveiled plans for a tinygrad presentation focused on clarity and deep dives into the codebase, emphasizing the project's independence from dependencies like CUDA.
  • Lean Toward Clearer Mechanization Bounties: The community grappled with ambiguous problem statements regarding ShapeTrackers in Lean, recommending a review of ShapeTracker in tinygrad's repository for clearer understanding.
  • Traceback Trail for Tensors: A proposal to add 'traceback' attributes to new Tensor instances in tinygrad was revisited, emphasizing the potential for enhanced debugging despite previous incomplete attempts like PR #3302.
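The bitshift debate above rests on a simple identity: shifting by k is an exact multiply or floor-divide by 2**k for non-negative integers, so the question is purely whether a backend's codegen makes one form faster than the other. A quick stdlib sanity check:

```python
# Shifts vs multiply/divide by powers of two: identical results for
# non-negative integers, so a backend may lower either form to the other.
for x in [0, 1, 7, 1024, 123_456]:
    for k in [1, 3, 8]:
        assert x << k == x * (1 << k)   # left shift == multiply by 2**k
        assert x >> k == x // (1 << k)  # right shift == floor-divide by 2**k
print("shift/mul-div identities hold")
```

On most hardware both forms compile to comparably cheap instructions, which is why the PR's improvement margin was contested.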

LlamaIndex Discord

  • LLM-Sourced Python Scripting Wins: An LLM demonstrated its scripting chops by producing a Python script for extracting structured data from Gmail, an approach that could streamline data extraction processes across diverse email datasets.
  • Google Gemini Widens the Window: The structure and processing capabilities of Google Gemini were highlighted in a LlamaIndex agent, touting an impressive 1 million token context window to tackle complex, multifaceted queries from heterogeneous documents.
  • Custom Parsing Conundrums and Solutions: Challenges around implementing Langchain’s HTMLHeaderTextSplitter led to the engineering of a custom solution within LlamaIndex's IngestionPipeline, supported by custom transformations documentation.
  • Rkhettry Cracks Chroma Code: A document retrieval issue from VectorStoreIndex within Chroma's vector store was addressed by a user through direct access methods, demonstrating the practical problem-solving approaches in database manipulation.
  • Metadata Magic for Enhanced Indexing: Incorporating metadata was recommended to improve document retrieval within indexing systems, as outlined in LlamaIndex's metadata extraction guide, emphasizing the importance of rich document descriptors for fine-grained search capabilities.
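The value of metadata for retrieval, as in the last bullet above, can be shown with a framework-free sketch (the documents and fields are hypothetical; a real pipeline would use LlamaIndex's metadata extractors and a vector store):

```python
# Toy index where each document carries metadata; filtering on metadata
# narrows the candidate set before any text matching or similarity search.
DOCS = [
    {"text": "Q2 revenue grew 12%", "meta": {"type": "report", "year": 2024}},
    {"text": "Team offsite agenda",  "meta": {"type": "memo",   "year": 2024}},
    {"text": "Q2 revenue fell 3%",   "meta": {"type": "report", "year": 2023}},
]

def retrieve(query: str, **filters) -> list:
    """Keyword match restricted to docs whose metadata matches filters."""
    hits = []
    for doc in DOCS:
        if all(doc["meta"].get(k) == v for k, v in filters.items()):
            if any(w.lower() in doc["text"].lower() for w in query.split()):
                hits.append(doc["text"])
    return hits

print(retrieve("revenue", type="report", year=2024))
# ['Q2 revenue grew 12%']
```

Without the `year=2024` filter, both revenue reports would match; rich metadata is what lets the index disambiguate them.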

Interconnects (Nathan Lambert) Discord

  • Phi-3 Models Ascend the Leaderboards: The Phi-3 Medium (14B) and Small (7B) models have been highlighted on the @lmsysorg leaderboard, with the Medium model being in close proximity to GPT-3.5-Turbo-0613 performance levels and the Small model compared to Llama-2-70B and various Mistral fine-tunes. The community reaction includes both humor at personal wagers gone awry and serious discussions on the sustainable growth and reputation gains in such model rankings.
  • OpenAI's Inner Turmoil: Current and former OpenAI employees penned an open letter raising concerns about insufficient oversight in AI development, while the firing of researcher Leopold Aschenbrenner for allegedly leaking proprietary information elevated conversations around trade secrets and national security. Additionally, skepticism pervades the channels regarding scaling up compute as a linear path to AGI, with users questioning whether such growth is sustainable without formidable challenges.
  • Scaling Laws, Seriously?: A layer of humor underscores discussions of scaling laws, with users mocking faith in perpetual scaling deep into the 2030s via a meme, alongside a playful request for "just 100k more GPUs" to reach a hypothetical 10 trillion parameter model. Frustration is expressed over the starkly contrasting beliefs in the AGI debates, with criticism aimed at parties who show little acknowledgment of their own epistemic uncertainty.
  • Addressing AI Whistleblower Safeguards: The open letter from OpenAI employees and the discussion that followed illustrate disquiet over how rapidly advancing AI might be mishandled, given powerful financial incentives that work against robust oversight.
  • Stale Memes Gather No Engagement: The extremely low engagement in the #memes channel suggests either a disinterest in meme culture or the need for fresher content to stimulate exchanges among the AI engineering audience.

Latent Space Discord

  • TorchTune's Shoutout for Newsletter Recognition: An individual from the Torchtune community asked for their work on state-of-the-art (SOTA) models and methods to be featured in the AI News newsletter, inviting others to join their server with a provided invite link.
  • Tech Glitch in AI News Delivery: A formatting glitch was reported with AI News emails on ProtonMail: with dark mode enabled, only links and images render clearly.
  • Behind the Podcast Curtain: The secret sauce behind the podcast's automated transcripts, which include speaker identification, was disclosed to be Google's smol-podcaster tool, enhanced with manual edits for speaker names.
  • LiveKit's Lucrative Leap: LiveKit successfully raised $22.5 million in a Series A round, aiming at establishing a fresh infrastructure for AI's transport layer, focusing on real-time voice and video interactions, despite a challenging fundraising experience amplified by the emergence of GPT-4. The details were shared in a tweet.
  • Twelve Labs' Multi-Model Money Magnet: Twelve Labs bagged a $50 million Series A investment for innovating video understanding AI foundation models, introducing their new Marengo 2.6 that fuses multimodal capabilities in a single API; full information can be found in their press release.

OpenAccess AI Collective (axolotl) Discord

  • Conference Joy Without the Paper Trail: Engineers reflected on the rewarding experience of participating in conferences even without contributing papers, signifying the importance of community and knowledge exchange.
  • Help Wanted with ORPO's Stubborn Formatter: A user is struggling with a Python script for a custom ORPO formatter to tokenize datasets and has reached out for support; a related script was shared for reference.
  • AI's Medical Diagnosis Dilemma: A tweet highlighted the poor performance of advanced AI models like GPT-4V and Gemini Pro in medical Visual Question Answering (VQA) tasks, using the ProbMed dataset as a benchmark. The engineering community discussed the challenges faced by visual LLMs in medical fields.
  • Seeking and Succeeding with arXiv Endorsement: An AI collective member sought an arXiv endorsement for the cs.LG category and managed to solve their dilemma by resorting to their organizational email.
  • Troubleshooting LoRA Training Hitches: An engineer encountered a hiccup where QLoRA training output was not initiating as expected. Another member pointed out that LoRA training scripts might automatically download the necessary model from Hugging Face Model Hub if it's not available locally.

Cohere Discord

  • Artificial Ivan Troubleshoots Like a Pro: Cohere has advanced its "Artificial Ivan" to version 4.0, which can now troubleshoot code; an affirmations app tied to its development was also shared.
  • Real Ivan's Easing into Early Retirement?: One user quipped that the human counterpart, the "real Ivan," might retire at 35 thanks to Artificial Ivan's accomplishments, bringing a humorous spin to the project's success.
  • Cross-Project Synergy Unlocked: A user highlighted the integration of Aya 23 with Llama.cpp and LangChain, offering sample code and seeking assistance implementing a stopping condition that ends a conversation turn on "\n".
  • Seeking Bilingual AI Conciseness: The same user detailed code aiming to produce concise, Spanish-language responses, outlining the use of prompts for conversation memory and parameter settings to improve Aya 23's output.
  • Cohere's Community Corner: Contrasting it with LangChain's, a playful comment from a guild member described the Cohere Discord as a "chronically online AI lab," pointing to lively interaction and engagement among its members.


Mozilla AI Discord

  • CUDA Conundrum Conquered: An engineer resolved CUDA error issues by using --recompile --gpu NVIDIA and discovered that the flag -ngl 9999 must come after --recompile for effective resolution.
  • Peer Power Prevails in Problem Solving: Successful troubleshooting of CUDA recompilation was attributed to community collaboration and a resourceful check of the --help options, with a helpful GitHub resource also being shared.
  • Engineers Emphasize Unity and Treats: The sentiment within the community highlighted the importance of learning from one another and a humorous nod to "cookies" as part of community spirit.
  • CPU Operations Entering New Phase with RISC-V: A recent YouTube video, "The Magic of RISC-V Vector Processing", shed light on the ratified 1.0 RISC-V Vector Specification and its expected impact on vector computations.
  • Link Love: Engineers were directed to further details and discussions via two key links: the RISC-V Vector Processing video explaining new specifications and advancements, and a GitHub repository for llamafile to assist with distributed LLMs.
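The flag-ordering fix above can be expressed as a single command (a sketch: the binary name and model path are illustrative, while `--recompile`, `--gpu NVIDIA`, and `-ngl` are the flags the engineer reported):

```shell
# Order matters: -ngl 9999 must come after --recompile for the fix to take effect.
./llamafile --recompile --gpu NVIDIA -ngl 9999 -m model.gguf
```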

OpenInterpreter Discord

  • Windows Wizard Wrangles Gemini Setup: A member outlined a detailed setup guide for Gemini model on Windows systems, citing outdated official documentation. The guide includes command-line steps and workarounds for common setup issues.
  • AR Enters the Workspace with Spacetop: Sightful's Spacetop AR Laptop grabbed attention with its battery-free glasses and unique design intending to replace traditional laptop displays, sparking discussions on its place in the future of mobile computing. Members also discussed Xreal glasses, mentioning their reliance on an external device for power and the need for improvement in resolution and field of view.
  • Macbook M1 Meets Match with Infinite Loop: A user reported a persistent issue when running poetry run 01 --local on a Macbook M1, facing an infinite loop with the ollama llama3 setup, and is seeking a solution.
  • Secure Version Query Goes Unanswered: A member asked when a secure version would be released, but the question received no specific answer or follow-up in the discussions.
  • Availability and Battery Life in AR Glasses Explored: Conversations on Xreal glasses included experiences with using them with a MacBook, highlighting the limitations they currently have in terms of resolution, battery life, and device dependency for power.

YAIG (a16z Infra) Discord

  • Initiating Direct Dialogue: Members in the ai-ml channel have expressed interest in collaborating and have opted to continue their exchange via direct message for more detailed discussions.

MLOps @Chipro Discord

  • AI and ML under the Microscope: The Infer: Summer ‘24 conference, hosted virtually by Qwak on June 26, offers an in-depth look at AI and ML practices with a focus on real-world applications from industry experts.
  • Get Ready for a Deep Knowledge Dive: Those interested in recommender systems and AI in sports will find targeted sessions on advanced ML model construction and AI-driven sports analytics.
  • Safe AI Takes the Spotlight: AI safety and adherence to regulation are set to be main talking points, highlighting strategies like "Schematic Questioning" to mitigate risks such as inaccurate content in AI systems.
  • From Stream to Bank: Attendees can expect insights from heavy-hitters at Disney Streaming, Lightricks, LSports, and Lili Banking, who will convey real-world experience with AI/ML integration.
  • Production-Ready LLMs Explored: Large Language Models (LLMs) like GPT and Llama feature prominently, with discussions planned around effective implementation in production settings across various industries. Conference Registration

DiscoResearch Discord

  • xLSTM's Open Source Debut: Dr. Tristan Behrens has announced the release of xLSTM's source code, a move sure to excite AI engineers and developers. The official announcement and access to the source code can be found on his LinkedIn post.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI Stack Devs (Yoko Li) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Datasette - LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!
