AI News (MOVED TO news.smol.ai!)

Archives
August 8, 2024

[AINews] not much happened today

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


anonymous strawberries are all you need.

AI News for 8/6/2024-8/7/2024. We checked 7 subreddits, 384 Twitters and 28 Discords (249 channels, and 2423 messages) for you. Estimated reading time saved (at 200wpm): 247 minutes. You can now tag @smol_ai for AINews discussions!

No clear major story for the day but lots of interesting small nuggets:

  • Mistral Large's external scores are in, and they do very well - Gemini Pro-tier - on hard Lmsys prompts, as well as independent benchmarks like Aidanbench
  • Code a Vision Language Model from scratch! (thanks Sam Julien for picking this one in the LS Discord)
  • The new PyTorch FlexAttention subsumes the APIs of many attention variants, including FlashAttention 2 (though not FA3), and covers the increasingly popular local-global attention spectrum, including Sliding Window attention
  • Check out the Grokfast optimizer!
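FlexAttention expresses attention variants as small Python functions over (query index, key index) pairs. As a purely illustrative sketch (plain Python, not FlexAttention's actual tensor API; the window size and sequence length are arbitrary example values), the sliding-window case looks like this:

```python
# Illustrative only: a causal sliding-window attention mask written as a
# predicate over index pairs, in the spirit of FlexAttention's mask functions.

WINDOW = 3  # arbitrary example window size

def sliding_window_mask(q_idx: int, kv_idx: int) -> bool:
    # Causal: a query attends only to keys at or before its own position,
    # and only within the last WINDOW positions.
    return 0 <= q_idx - kv_idx <= WINDOW

# Build the full boolean mask for a short sequence and visualize it.
seq_len = 6
mask = [[sliding_window_mask(q, k) for k in range(seq_len)] for q in range(seq_len)]

for row in mask:
    print("".join("#" if m else "." for m in row))
```

The banded lower-triangular pattern this prints is exactly the local-attention structure that FlexAttention lets you declare without writing a custom kernel.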

You could, of course, spend one more epoch on Segment Anything 2, which is now up on the Latent Space Podcast.


The Table of Contents and Channel Summaries have been moved to the web version of this email!


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

OpenAI Structured Outputs and Model Updates

OpenAI has introduced structured outputs in their API, allowing developers to enforce specific JSON schemas for model responses. This feature is supported across various models, including gpt-4-0613, gpt-3.5-turbo-0613, and later versions. @sama announced this highly requested feature, which achieves 100% reliability in matching output schemas in OpenAI's evaluations. The update includes:

  • A new "strict" mode for function calling that ensures outputs match the supplied tool definition
  • A new "response_format" parameter that allows specifying JSON output schemas
  • Introduction of a new model: gpt-4o-2024-08-06
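A request using the new "response_format" parameter can be sketched as below. The schema name and fields here are made up for illustration; the payload shape follows OpenAI's Structured Outputs documentation, and in practice you would pass this dict to the client, e.g. `client.chat.completions.create(**payload)`.

```python
# Sketch of a Structured Outputs request payload (no API call is made here).
# The "event" schema and its fields are hypothetical examples.

payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {"role": "system", "content": "Extract the event details."},
        {"role": "user", "content": "Alice and Bob meet Friday at 3pm."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",        # hypothetical schema name
            "strict": True,         # enforce an exact match to the schema
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "attendees": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "attendees"],
                "additionalProperties": False,  # required in strict mode
            },
        },
    },
}
```

With `"strict": True`, the API guarantees the response parses against this schema rather than merely encouraging valid JSON.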

@rohanpaul_ai highlighted that this update achieves 100% reliability in matching output schemas, which is particularly useful for downstream tasks when the model is not calling a tool but responding to the user in a structured way.

Additionally, @corbtt noted that OpenAI quietly dropped the price of GPT-4o (the real one, not mini) by 50% without a formal announcement, now listing it at $2.50/1M tokens on their pricing page.

AI Model Developments and Benchmarks

Several new AI models and benchmarks have been announced:

  1. Mistral Large 2: @GuillaumeLample announced the release of Mistral Large 2, which performs exceptionally well in the coding, hard prompts, math, and longer query categories, outperforming GPT-4-Turbo and Claude 3 Opus in some areas. It now leads the Arena Hard leaderboard and is an open-weight model.

  2. Idefics3-Llama: @mervenoyann introduced Idefics3-Llama, a multimodal model based on Llama 3.1 that accepts an arbitrary number of interleaved images with text and has a huge context window of 10k tokens.

  3. BigLlama-3.1-1T-Instruct: @maximelabonne presented an experimental self-merge of Meta-Llama-3.1-405B-Instruct, an upscaled successor to Meta-Llama-3-120B-Instruct (itself a self-merge of Llama 3 70B).

  4. New benchmarks: @aidan_mclau introduced a new benchmark called "big_model_smell" that measures creativity, reliability, attention, and instruction following.

AI Hardware and Robotics

@adcock_brett introduced Figure 02, described as the world's most advanced AI hardware. Key features include:

  • 6x Cameras
  • 50%+ more battery capacity
  • Onboard Vision Language Model (VLM)
  • 3x the CPU/GPU compute
  • 4th Gen Hands
  • Integrated wiring
  • Exoskeleton structure
  • Speech-to-speech reasoning capabilities

The robot is designed for autonomous operation and includes a custom 2.25 kWh battery pack, aiming for up to 20 hours of useful work per day.

AI Safety and Regulation

@ylecun shared concerns about California's SB1047 (Safe and Secure Innovation for Frontier Artificial Intelligence Models Act), stating that it won't solve intended issues and may harm AI R&D in academia, small tech companies, and the open-source community. @fchollet echoed these concerns, arguing that holding open model developers responsible for all fine-tuned models downstream makes no sense and could discourage open model sharing.

Miscellaneous AI Developments

  • @omarsar0 discussed the importance of structured outputs in improving LLM application performance and reliability.
  • @jeremyphoward announced FastHTML, a growing gallery of live FastHTML code examples for building interactive components and applications.
  • @LangChainAI introduced support for OpenAI's new structured output functionality in their latest release candidates for both Python and JavaScript.

These developments showcase the rapid progress in AI model capabilities, hardware integration, and the ongoing discussions around AI safety and regulation in the field.


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. LLMs as Productivity Boosters in Research and Development

  • auto-md | tool | One click convert files/zips + GitHub repositories into Markdown documents (.md) (Score: 62, Comments: 10): The tool auto-md has been updated with a Windows .exe version, allowing users to convert files, zips, and GitHub repositories into Markdown documents with a single click. The developer plans to release a Mac app soon and appreciates the support received, including GitHub stars and user feedback from previous posts.
    • Dark_Fire_12 shared an alternative approach to building a similar tool, opting for file extension filtering instead of folder depth search. They included a screenshot demonstrating their implementation.
    • Environmental-Car267 mentioned creating two similar tools for personal use: one that copies codebases to clipboard for pasting into Sonnet/GPT, and another that lets AI select important files autonomously. These tools exclude certain folders and files during the process.
  • How a research scientist at Google Deepmind uses LLM (Score: 318, Comments: 89): Nicholas Carlini, a research scientist at Google DeepMind, shares his approach to using Large Language Models (LLMs) for productivity enhancement in a detailed blog post. The article emphasizes the significant value in augmenting human capabilities with AI, suggesting that this intermediate step is crucial before aiming for fully autonomous AI systems.
    • Users agree that LLMs are both overhyped and underhyped, with many people either exaggerating their capabilities or dismissing them entirely. The technology is particularly useful when operating at the "edge of your knowledge," helping fill gaps in partial understanding.
    • The article demonstrates LLMs' tendency to hallucinate, as it incorrectly stated there's no Python library for Podman, despite the existence of podman-py. Users emphasize the importance of evaluating what LLMs can do rather than focusing on their limitations.
    • Many users report significant productivity boosts from using LLMs, with one estimating a 50% increase in coding speed. LLMs are particularly helpful for learning new technologies, automating mundane tasks, and debugging, though some express concerns about their use in academic writing.
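The one-click "files and repos to a single Markdown document" idea from the auto-md post above can be sketched in a few lines. This is an illustrative rewrite of the concept, not auto-md's actual code; the extension filter is an arbitrary example.

```python
# Minimal sketch: walk a directory, keep a few text-like extensions, and
# concatenate each file into one Markdown document with fenced code blocks.
from pathlib import Path

KEEP = {".py", ".md", ".txt", ".toml"}  # arbitrary example filter

def folder_to_markdown(root: str) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in KEEP:
            rel = path.relative_to(root)
            body = path.read_text(errors="replace")
            parts.append(f"## {rel}\n\n```\n{body}\n```\n")
    return "\n".join(parts)
```

Extension filtering (rather than folder-depth search) is the same design choice Dark_Fire_12 described for their alternative implementation.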

Theme 2. Advancements in AI Model Compression and Quantization

  • Quantize 123B Mistral-Large-Instruct-2407 to 35 GB with only 4% accuracy degeneration. (Score: 77, Comments: 54): The author quantized the 123B Mistral-Large-Instruct-2407 model from 228.5 GB to 35.5 GB using the EfficientQAT algorithm, with only a 4% average accuracy degradation across 5 zero-shot reasoning tasks. The quantized model, using INT2 bits and a group size of 64, was packed in GPTQ v2 format and uploaded to HuggingFace, with the author seeking help converting it to GGUF or EXL2 formats.
    • Users strongly expressed the need for a GGUF-format version of the quantized model, with multiple comments requesting this conversion from the current GPTQ v2 format.
    • There was skepticism about the model's performance: one user pointed out that perplexity increases by 100%, and another corrected the accuracy degradation figure to 5.4% rather than 4%.
    • A user attempted to load the model using Exllamav2 0.1.7 but encountered a RuntimeError, suggesting compatibility issues with the current loader for this quantized version.

Theme 3. Open-Source AI Tools and Multimodal Generation

  • Open source Text2Video generation is here! The creators of ChatGLM just open sourced CogVideo. (Score: 61, Comments: 4): The creators of ChatGLM have open-sourced CogVideo, a text-to-video generation model. CogVideo can generate 5-second videos at 24 FPS and 256x256 resolution based on text prompts, representing a significant advancement in open-source AI video generation capabilities.
    • Commenters gave different CogVideo specifications: 6-second clips at 8 FPS and 720x480 resolution, requiring 18GB of GPU memory for inference with SAT or 36GB with diffusers. Users noted good coherency but slight lagginess, fixable with flowframes.
    • A ComfyUI wrapper for CogVideo has been made available, enhancing its accessibility and integration with existing workflows.
    • The model's license restricts commercial use and prohibits usage that may "undermine China's national security and national unity", raising questions about its open-source status.

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity

AI Model Developments and Releases

  • Salesforce's xLAM-1b model: A 1 billion parameter model that achieves 70% accuracy in function calling, surpassing GPT 3.5. Dubbed a "function calling giant" despite its relatively small size.
  • Phi-3 Mini (June) with function calling: Rubra AI released an updated Phi-3 Mini model with function calling capabilities, competitive with Mistral-7b v3 and outperforming the base Phi-3 Mini.

AI Research and Applications

  • Figure 02: A new humanoid robot introduced by Figure AI, showcasing advancements in robotics and AI integration.
  • AI in image generation: Discussion on r/StableDiffusion becoming a general hub for open-source image models, similar to how r/LocalLLaMA became a central place for LLMs.

AI Ethics and Safety

  • OpenAI safety resignations: A humorous post predicting the next OpenAI head of safety will quit on Aug 30, based on "new scaling laws". This highlights the ongoing challenges in AI safety and ethics.

AI Impact on Education and Careers

  • Nick Bostrom on long-term investments: Bostrom suggests it may not be worth making long-term investments like college degrees due to short AI timelines. This sparked debate about the potential impact of AI on traditional education and career paths.

AI-Generated Content

  • Movie posters from a parallel reality: AI-generated movie posters created using Flux Pro + SUPIR Upscale, demonstrating the creative potential of AI in visual arts.

Memes and Humor

  • Various memes and humorous posts related to AI and technology, including comparisons of AI-generated images and satirical takes on anti-AI sentiments.

AI Discord Recap

A summary of Summaries of Summaries

1. LLM Advancements and Benchmarking

  • DeepSeek-V2 Outshines GPT-4 on MT-Bench: DeepSeek-V2 from DeepSeek AI has rapidly climbed to the top of leaderboards like ChatbotArena and MT-Bench, outperforming models such as GPT-4-Turbo and Claude 3 Opus across over 50,000 matchups.
    • Users compared model performance on benchmarks like AlignBench and MT-Bench, with DeepSeek's announcement generating excitement.
  • New Models Advance State of the Art: New open models like Granite-8B-Code-Instruct from IBM enhance instruction following for code tasks, while DeepSeek-V2 boasts 236B parameters.
    • Example: DeepSeek-V2 announcement.

2. Model Performance Optimization

  • AQLM and QuaRot Quantize Llama-3-70b: Quantization techniques like AQLM and QuaRot aim to run massive language models (LLMs) like Llama-3-70b on individual GPUs while maintaining performance, as seen in the AQLM project running on an RTX3090.
    • Users discussed the potential benefits and tradeoffs of quantization approaches for optimizing large model inference.
  • DMC Boosts Throughput 370% on H100 GPUs: Efforts to boost transformer efficiency through Dynamic Memory Compression (DMC) promise throughput improvements up to 370% on H100 GPUs, according to the DMC paper by @p_nawrot.
    • Members explored techniques like fusing CUDA operations with NVIDIA's Thrust library to maximize GPU utilization during model inference.
  • Thrust Optimizes CUDA Ops Near Bandwidth Limits: Discussions centered on optimizing CUDA operations like fusing element-wise ops, leveraging NVIDIA's Thrust library and its transform functionality for near-bandwidth-saturating performance.
    • The Thrust documentation provides insights into these optimization strategies.

3. Fine-tuning Challenges and Prompt Engineering Strategies

  • Axolotl Wrestles with Prompt Design: The importance of prompt design and correct template usage, including end-of-text tokens, was highlighted for influencing model performance during fine-tuning and evaluation with tools like Axolotl prompters.py.
    • Users shared experiences and insights around prompt engineering challenges to achieve desired results.
  • Logit Bias Tunes Prompts for More Control: Strategies for prompt engineering were discussed, such as splitting complex tasks into multiple prompts and investigating logit bias for granular control, following OpenAI's logit bias guide.
    • Members shared experiences and techniques to enhance prompt effectiveness through careful engineering.
  • RET Token Boosts Information Retrieval: Research explored teaching LLMs to use the <RET> token for information retrieval when uncertain, improving performance on less frequent queries, based on an ArXiv paper.
    • The community discussed novel prompt engineering methods to expand the capabilities of language models.
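The logit-bias technique mentioned above can be sketched as a request payload. Per OpenAI's guide, `logit_bias` maps token IDs to values between -100 (effectively ban a token) and 100 (effectively force it); the token IDs below are placeholders, not real IDs, and would be looked up with a tokenizer (e.g. tiktoken) first.

```python
# Sketch: nudging a model toward a constrained answer set via logit_bias.
# No API call is made; this only builds the request dict.

request = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Answer yes or no: is 7 prime?"}],
    # Placeholder token IDs standing in for the " yes" and " no" tokens.
    # Values range from -100 (ban) to 100 (force); 50 strongly prefers them.
    "logit_bias": {
        "12345": 50,
        "67890": 50,
    },
    "max_tokens": 1,  # constrain the reply to a single token
}
```

Combined with splitting a complex task across multiple prompts, this gives fine-grained control over the output vocabulary at each step.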

4. Multimodal AI and Generative Modeling Innovations

  • Idefics2 and CodeGemma, Multimodal Marvels: New multimodal models like Idefics2 8B Chatty focus on elevated chat interactions, while CodeGemma 1.1 7B refines coding abilities.
    • These releases showcase the rapid progress in multimodal AI capabilities across various domains.
  • Phi3 Brings Powerful AI Chatbots to WebGPU: The Phi 3 model enables powerful AI chatbots to run in browsers via WebGPU, highlighting the potential for more accessible and private AI interactions.
    • Community members discussed the implications of this development for user privacy and control.
  • IC-Light Advances Open Image Relighting: The open-source IC-Light project focuses on improving techniques for image relighting, contributing to the growing open ecosystem for generative AI.
    • Members shared insights and resources related to image manipulation capabilities powered by AI models.

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

  • LoRA Makes Stable Diffusion Lean: LoRA models, small versions of Stable Diffusion, modify standard checkpoints to be 10 to 100 times smaller and can be installed in the stable-diffusion-webui/models/Lora directory.
    • To use these models, simply include the syntax <lora:filename:1.0> in your prompts, enhancing your workflow.
  • Pony Model Delivers Sharp Line Art: The Pony model is engineered for clean line art with no shading and works best when combined with style LoRA for optimal results.
    • Users emphasized that applying the Pony model as the base is crucial to achieving the desired aesthetic when using line art style LoRA.
  • ControlNet Transforms Images Like Magic: ControlNet facilitates converting photos to line art while preserving the original structure, greatly improving image manipulation capabilities.
    • Community members proposed using depth ControlNet or IPAdapter as effective methods for these transformations.
  • Community Drama Erupts in r/stablediffusion: Discussions about recent managerial changes in the r/stablediffusion subreddit revealed tensions regarding community vs. company-led projects.
    • This introspection fueled a lively dialogue about the control issues faced in community dynamics within the AI art space.
  • Skepticism Wins in AI Hardware Debate: A consensus emerged against using AMD GPUs for ML tasks, with suggestions leaning towards NVIDIA or alternatives like Groq being favored.
    • Participants also touched on the volatile nature of hardware stocks, prompting discussions about future choices for optimizing AI performance.


Unsloth AI (Daniel Han) Discord

  • Unsloth Fine-tuning Model Frustrations: Users are facing fine-tuning issues with Unsloth, particularly around models not saving properly and integration challenges with PPO trainers requiring the for_inference() method.
    • Many have noted that older versions integrated more smoothly, contributing to ongoing community frustrations.
  • Inconsistent Inference Timing on Llama3.1: Reports indicate inconsistent response times during inference on fine-tuned Llama3.1, with improvements seen after repeated calls.
    • Users are advised to run tests to confirm whether initial slowdowns are affecting performance as expected.
  • Exploring Multi-GPU Support in Unsloth: Unsloth's multi-GPU support is in beta, looking to enhance speed and reduce VRAM usage, with testers currently under NDA.
    • Participants are anticipating a paid subscription model following further refinements.
  • Introducing BigLlama 3.1-1T-Instruct: A new model, BigLlama-3.1-1T-Instruct, is being trialed as a self-merge of Meta-Llama, but users report it is not yet functional with merged weights.
    • Community feedback notes that the model is not yet useful at this stage due to incomplete training.
  • Cost-effective Configuration for LLaMA3: A request was made for strategies to run LLaMA3 cost-effectively on RunPod, reflecting the community's focus on optimizing deployment costs.
    • Members discussed the challenges of managing resource demands while keeping costs under control.


HuggingFace Discord

  • Google enhances Gemma with Gemma 2 2B: Google introduced Gemma 2 2B, boasting 2.6B parameters designed for on-device usage alongside ShieldGemma and Gemma Scope for advanced functionality.
    • This rollout positions Gemma 2 as a competitive offering in on-device machine learning tools.
  • New Diffusers integration with FLUX: A member praised the new Diffusers integration for FLUX, enhancing text-to-image generation capabilities significantly.
    • They shared a gist on using FLUX efficiently with limited resources.
  • Launch of Argilla 2.0 for better data management: Argilla 2.0 debuted as a powerful AI tool focused on data usability, promising enhanced management features for creators.
    • Community members welcomed the first open synthetic dataset, magpie-ultra-v0.1, generated with Llama 3.1 for improved dataset creation.
  • OpenAI promotes Structured Outputs: OpenAI has published a blog post recommending the use of structured outputs in their API without much attribution to previous work.
    • This shift highlights a trend in adopting effective practices while maintaining a lack of acknowledgment for foundational contributions.
  • Dataset for Named Entity Recognition available: A dataset consisting of 5029 annotated CVs with IT skills marked using NER is available on Kaggle.
    • This dataset includes manually annotated skills from PDFs and is formatted in JSON for use with NLP tools like Spacy.
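Annotated NER data of this kind is typically stored in the spaCy-style tuple format of (text, entity spans with character offsets). The example CV line and spans below are made up for illustration, not drawn from the Kaggle dataset itself:

```python
# Hedged sketch of the common spaCy training-data shape for NER:
# (text, {"entities": [(start_char, end_char, label)]}).
# The sentence and offsets here are illustrative, not from the dataset.

text = "Experienced with Python and Docker in production."
example = (text, {"entities": [(17, 23, "SKILL"), (28, 34, "SKILL")]})

# Sanity-check that each span actually covers the token it labels —
# a useful validation step before feeding annotations to an NLP pipeline.
for start, end, label in example[1]["entities"]:
    print(text[start:end], label)
```

Validating offsets this way catches the off-by-one errors that commonly creep into manually annotated PDF extractions.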


LM Studio Discord

  • Configuring LM Studio with AnythingLLM: Users successfully set up AnythingLLM with LM Studio after troubleshooting file accessibility and hardware limitations affecting performance. One user confirmed success after loading a custom Gemma v2 model.
    • Several users contributed insights on common pitfalls during the setup process, focusing on the importance of ensuring file paths are correct.
  • Optimizing Performance Settings in LM Studio: The 'Keep Model in Memory' feature drew mixed reactions, with some users suggesting it should be disabled by default to avoid unnecessary RAM usage. Experts discussed its limited impact on performance, especially for larger models.
    • Users shared experiences, noting that disabling this feature provided a better balance between system resources and model performance.
  • Interest in Audio Transcription Capabilities: Users expressed a desire to automate audio transcription but noted the lack of direct support for audio inputs in LM Studio. Alternatives such as APIs and open-source TTS/STT solutions were discussed for those prioritizing privacy.
    • Some members reported success using specific APIs, while others preferred local solutions to ensure data confidentiality.
  • Exploring Multi-GPU Configurations: Users sought advice on managing multiple GPUs with ComfyUI, exploring scripts to effectively allocate GPU resources. One user proposed a launcher to streamline setting CUDA devices without modifying configuration files.
    • The discussion included suggestions for existing scripts available on GitHub that could simplify multi-GPU setups.
  • Concerns Over Phi-3 Model Support: Concerns were raised regarding the lack of Phi-3 model support in llama.cpp, which affects interface compatibility such as in Oobabooga WebUI. This sparked a broader conversation on recent updates and community reactions.
    • Members noted that the issue might require coordination among developers to ensure seamless integration with the latest models.


CUDA MODE Discord

  • Gameboy Emulator Simplifies RL: A detailed setup for a Gameboy emulator can be found in the PufferLib GitHub repository, streamlining reinforcement learning in game environments.
    • This approach allows users to explore RL concepts without the need for extensive speed optimizations.
  • PyTorch 2.4 Struggles on CUDA 12.4: Users reported issues with PyTorch 2.4 on CUDA 12.4, noting a drop in performance compared to earlier versions like CUDA 12.1.
    • Concerns were raised over compatibility and potential improvements when reverting to previous CUDA versions.
  • ZLUDA 3 Yanked Post AMD Claims: The author has taken down ZLUDA 3 following AMD's claim that the permission for its release was invalid, detailed in the GitHub page.
    • This situation has stirred discussions about AMD's role in the development landscape and the implications for open-source contributions.
  • Debate on INT8 Quantization Techniques: Discussions around INT8 symmetric quantization revealed concerns about bias in weight updates when using a scale of 127.5 during training.
    • Members debated the efficacy of full vs restricted range quantization, emphasizing potential challenges in model integrity.
  • Introducing SARATHI for LLM Efficiency: A new framework, SARATHI, addresses inefficiencies in LLM inference by employing chunked-prefills and improved batching strategies.
    • This approach aims to enhance GPU utilization while reducing imbalances in pipeline parallelism during model inference.
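The full- versus restricted-range INT8 debate above can be illustrated in a few lines. This is a simplified sketch of symmetric quantization, not any specific library's implementation: full range divides by 127.5 (using all 256 codes), restricted range divides by 127 (symmetric codes, leaving -128 unused).

```python
# Sketch of symmetric INT8 quantization with full vs restricted range.

def quantize(x: float, absmax: float, full_range: bool = True):
    levels = 127.5 if full_range else 127.0
    q = round(x / absmax * levels)   # map [-absmax, absmax] onto integer codes
    q = max(-128, min(127, q))       # clamp to the INT8 range
    return q, q * absmax / levels    # (integer code, dequantized value)

# With full range, +absmax rounds to 128 (clamped to 127) while -absmax
# rounds to -128: the positive and negative extremes dequantize to
# different magnitudes, the asymmetry members flagged as a possible bias
# in weight updates. Restricted range keeps the codes symmetric.
print(quantize(1.0, 1.0, full_range=True))
print(quantize(-1.0, 1.0, full_range=True))
print(quantize(1.0, 1.0, full_range=False))
print(quantize(-1.0, 1.0, full_range=False))
```

Restricted range trades one representable level for exact symmetry, which is why the efficacy of each choice during training was debated.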


Nous Research AI Discord

  • UltraSteer-V0 Dataset Launch: Nvidia introduced the UltraSteer-V0 dataset, featuring 2.3M conversations with 2.8M turns and labeled across 9 fine-grained signals using the Llama2-13B-SteerLM-RM reward model.
    • Despite being a version zero, it has unique thread continuations thanks to extensive deduplication over 22 days and is available for access on Hugging Face.
  • Challenges in Fine-tuning Insurance Models: A user queried about experiences fine-tuning models for the insurance sector, highlighting challenges specific to this industry.
    • This discussion drew input on necessary adaptations and considerations for applying AI effectively in insurance contexts.
  • Buzz Around Flux AI's Abilities: Flux AI showcased skills in text comprehension, prompt comprehension, and image generation, sparking excitement among members.
    • Many users praised its capabilities, with some already leveraging its Pro version for enhanced performance.
  • Open Medical Reasoning Tasks Initiative: Collaboratively led by Open Life-Science AI, the Open Medical Reasoning Tasks project seeks to compile a robust list of tasks for LLMs in healthcare, inviting contributions from various stakeholders.
    • A member celebrated this collaborative effort, emphasizing the collective impact on advancing AI in the medical field; more details are available on GitHub.
  • MiniCPM-Llama3-V Model Updates: Members discussed the latest updates on MiniCPM-Llama3-V, which claims improved capabilities for handling multiple image inputs and OCR tasks.
    • This sparked initial skepticism, but excitement grew with new examples demonstrating its application and effectiveness.


Latent Space Discord

  • Web Devs Transition to AI Engineering: Discussions highlighted the growing transition of web developers into AI engineering due to high demand and limited ML engineers, with participants sharing insights on adapting skill sets.
    • Members emphasized how web devs are often expected to implement AI projects alongside traditional development duties.
  • OpenAI Faces Leadership Shifts: A wave of leadership changes at OpenAI has raised concerns about the company's future trajectory and stability, leading to a vibrant debate in the community.
    • Participants speculated on the potential implications of these departures on the overall direction of OpenAI.
  • Generative AI Revolutionizes Retail: Generative AI applications are thriving in the retail sector, especially in crafting product descriptions across platforms, with examples stemming from L'Oreal.
    • Discussions raised crucial points about evaluating the effectiveness of AI-generated content and the need for better performance metrics.
  • Structured Outputs Feature Debuts in GPT-4o: OpenAI has launched a structured outputs feature in GPT-4o, allowing models to adhere to JSON schemas with improved reliability compared to previous models.
    • Community members recognized this advancement as a significant step toward generating more controlled and structured data outputs in AI.
  • Skepticism in Energy-Based Language Modeling: An anecdote about a meet-up with an Extropic AI researcher highlighted a skepticism towards their knowledge in energy-based language modeling, questioning their credibility.
    • This exchange stirred a broader discussion about the expertise of newer startups in complex AI domains.


OpenAI Discord

  • OpenAI DevDay Goes Global!: OpenAI is taking DevDay on the road this fall with events in San Francisco, London, and Singapore, featuring hands-on sessions and best practices for developers. Participants can engage directly with OpenAI engineers to see innovations in action, details can be found here.
    • The event promises a platform for developers to connect globally, sharing insights and redefining practices in AI development.
  • DALL-E 3 Model Shows Results Variability: Members discussed the DALL-E 3 model and the variability in generated results, highlighting comparisons with Llama models and the influence of safety filters. Notably, output quality discrepancies were attributed to the safety measures implemented by OpenAI.
    • The community is analyzing these variances while exploring the nuances of AI generation quality and safety concerns.
  • Search GPT is Available Now!: Search GPT has officially rolled out, generating interest among users regarding its functionalities and applications. Members are actively discussing how they plan to leverage this new feature in their workflows.
    • This roll-out has prompted questions about user experiences and practical implementations of Search GPT.
  • Excitement for Generative AI in Gaming: Members are thrilled about the potential of generative AI in enhancing gaming experiences, specifically in titles like BG3 and Pathfinder. They envision dynamic NPC interactions stemming from improved AI capabilities.
    • The discussion centered around creating immersive environments where character designs and player choices blend seamlessly.
  • ChatGPT-4o's Updates Spark Questions: Users noted significant changes in the performance of ChatGPT-4o, speculating that it has undergone recent updates. Members are discussing the implications of these changes on output consistency and user experience.
    • Observations about the version gpt-4o-2024-08-06 have spurred further conversation on what these updates mean for developers and users moving forward.


Perplexity AI Discord

  • Perplexity AI Technical Issues Surface: Users reported various technical issues with the Perplexity Pro app, including inability to switch LLMs and missing libraries, triggering significant concern over functionality.
    • Some features returned unexpectedly, indicating potential intermittent issues rather than systemic failures.
  • NVIDIA's Blackwell GPUs Hit Delays: NVIDIA's Blackwell GPUs have been delayed due to critical design flaws and issues with CoWoS-L packaging technology, necessitating a redesign of the processor die.
    • These setbacks are pushing back production timelines, impacting expectations for the next generation of GPUs.
  • Language Model Comparisons Heat Up: Debates erupted over performance comparisons between GPT-4o and Turbo, with users expressing mixed experiences, particularly around responsiveness and effectiveness.
    • Some users noted GPT-4o struggled with new instructions, garnering calls for a reassessment of LLM capabilities.
  • Exploring Content Recommendation Engines: A new university project aimed at developing a content sorting and recommendation engine caught interest, emphasizing the need for user input in creating a relevant sorting algorithm.
    • Members suggested leveraging RAG (retrieval-augmented generation) principles to enhance the project’s effectiveness.
  • API Functionality Under Scrutiny: Concerns were raised about API discrepancies, with users experiencing corrupted data returns leading to doubts about the API's reliability.
    • Additionally, upcoming deprecation of all Perplexity API models by August 12, 2024 brought attention to required adjustments for future usage.


Eleuther Discord

  • Novel methods in Mechanistic Anomaly Detection: The team examined mechanistic methods for anomaly detection in language models using Neel Nanda's attribution patching technique, but traditional baselines based on activations performed better.
    • They found improved performance by evaluating entire batches rather than individual points, varying success across tasks.
  • Debate on SB1047 AI Safety Act heats up: Members held a vigorous discussion about SB1047, with concerns that it may stifle innovation while others argue for necessary accountability in AI research.
    • Debaters expressed that the bill's liability provisions could deter open research efforts, indicating a need to balance regulation with innovation.
  • Meta's advancements in distributed AI training: At ACM SIGCOMM 2024, Meta showcased their paper on RDMA over Ethernet for Distributed AI Training, focusing on support infrastructure for training models like LLAMA 3.1 405B.
    • This presentation underscored the increasing demands in communication spurred by large-scale AI applications.
  • Recap on Sparse Autoencoder (SAE) developments: Members referenced a paper on SAE along with follow-up research on scaling SAEs to stay updated on SAE advancements.
    • They discussed the relevance of SAE notation and shared resources including a Google document that tracks the landscape of these technologies.
  • lm-eval-harness insights and usage: A user inquired about utilizing lm-eval-harness for custom models and received a helpful link to a self-contained example for adapting the Huggingface model class.
    • Discussion highlighted the inclusion of special tokens like BOS and the process for extracting benchmark names from JSON output in evaluation results.
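For the JSON-extraction step, here is a small sketch of pulling benchmark names from an lm-eval-harness results file. The sample JSON below is illustrative; the top-level "results" key mapping task names to metric dicts matches the harness's output format, but verify field names against your own output:

```python
import json

# Illustrative lm-eval-harness results file: "results" maps each
# benchmark/task name to its metrics (sample values are made up).
raw = """
{
  "results": {
    "hellaswag": {"acc": 0.62, "acc_norm": 0.79},
    "arc_easy":  {"acc": 0.81}
  },
  "config": {"model": "hf", "batch_size": 8}
}
"""
data = json.loads(raw)

# The benchmark names are simply the keys of the "results" object.
benchmarks = sorted(data["results"].keys())
print(benchmarks)  # ['arc_easy', 'hellaswag']
```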


LangChain AI Discord

  • Managing GPU Memory Issues: A user reported out-of-memory errors with models like aya and nomic-embed-text, using a machine with 32GB RAM. It was suggested to switch to CPU, but that led to much slower performance.
    • This discussion highlighted the performance trade-offs engineers face when dealing with memory constraints and the challenges of optimizing GPU resources.
  • LangGraph Course Recommendations: Users discussed various LangGraph courses, recommending the DeepLearning AI course as a solid option, along with an advanced one on Udemy. There's a general sentiment that many beginner-friendly resources exist, but advanced materials are lacking.
    • This points to a need for more comprehensive training at higher levels in the LangGraph ecosystem for practitioners looking to deepen their skills.
  • Collaboration on SQL Chat Agent: One user sought assistance with developing a SQL chat agent script, sparking a collaborative effort from another experienced developer. Scripts and feedback were shared, showcasing community support.
    • This interaction exemplifies the collaborative culture among developers, emphasizing knowledge sharing to improve AI functionalities.
  • New Music Discovery App Launch: The mood2music app was introduced, promising AI-driven music recommendations based on user moods. It is currently building a waitlist and features unique music curation capabilities.
    • The application generates excitement as it prepares for launch, identifying potential engagement with music enthusiasts.
  • AgentGenesis Boosts AI Development: A member shared AgentGenesis, a library providing copy-paste code snippets for accelerating Gen AI applications. It aims to offer a developer-friendly code library that enhances productivity dramatically.
    • The project invites community contributions and aims to simplify development processes, showcasing the collaborative spirit in the AI developer community.


Interconnects (Nathan Lambert) Discord

  • John Schulman Leaves OpenAI for Anthropic: John Schulman announced his departure from OpenAI to focus on AI alignment research at Anthropic, stating a desire for hands-on technical work.
    • He emphasized this choice is personal, noting it reflects ongoing support for alignment at OpenAI despite his exit.
  • Greg Brockman's Sabbatical Sparks Speculation: Greg Brockman's (GDB) decision to take a sabbatical until year-end has sparked speculation about the reasoning, with concerns about overwork and health issues.
    • Some speculate this break could be essential after intense years focused on AGI development.
  • Debate Rages on AI Alignment Perspectives: A robust discussion unfolded about differing views on AI alignment, with Schulman favoring a reinforcement learning approach while others argue it transcends traditional methods.
    • This reflects broader concerns on controlling superhuman AI and whether alignment is fundamentally a deep learning problem.
  • Structured Outputs Revolutionize API Handling: The recent introduction of Structured Outputs allows developers consistent schema matches without missing keys.
    • Additionally, developers save 50% on input costs and 33% on output costs by switching to the gpt-4o-2024-08-06 model.
  • DALL·E Faces Growing Competition: Discussion arose whether DALL·E still holds the title for best image generation as new rivals come into play, with challenges in making outright comparisons.
    • Members noted the importance of context over intuition when evaluating competitive capabilities.
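The Structured Outputs feature mentioned above can be sketched as a request body. This is a hedged illustration based on OpenAI's announced response_format shape; the example schema is an assumption, and no network call is made here (sending it would go through the usual OpenAI client):

```python
import json

# A JSON Schema the model's output must match. "additionalProperties": False
# and a full "required" list are part of the strict-mode contract.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
    "additionalProperties": False,
}

# Illustrative request body for the structured-outputs API: the schema is
# attached under response_format with "strict": True.
request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "weather", "strict": True, "schema": schema},
    },
}
print(json.dumps(request_body["response_format"], indent=2))
```

With strict mode on, the API is expected to either return JSON matching the schema or populate the new 'refusal' field rather than emit free-form text.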


OpenRouter (Alex Atallah) Discord

  • GPT-4o-2024-08-06 Now Live: The release of GPT-4o-2024-08-06 marks a notable update, with pricing reduced by 50% for inputs and 33% for outputs, further improving developer accessibility.
    • Notably, the model includes a new 'refusal' field feature, sparking excitement for improved functionality.
  • Gemini Pro 1.5 Suffers Resource Limitation: Users faced an error stating 'Resource has been exhausted' with Gemini Pro 1.5, linked to stringent rate limits enforced by Google.
    • Unfortunately, there is currently no remedy as this is a restriction coming directly from Google.
  • Navigating OpenRouter's API: Inquiries regarding model purchases clarified that models on OpenRouter are billed per token of usage, with new users encouraged to try interfaces like Lobe Chat for easier interactions.
    • This approach is intended to streamline access as well as decrease friction for onboarding users.
  • Structured Outputs Boost API Reliability: OpenAI introduced structured outputs allowing developers to request valid JSON responses directly from the API, enhancing overall reliability and usability.
    • This initiative addresses prior inconsistencies in output formats, aiming for a more standardized interaction across applications.
  • Model Pricing Fluctuations Under Review: Discussions around the token limit discrepancies for gpt-4o-2024-08-06 surfaced, with the OpenRouter interface showing a lower maximum than OpenAI's documentation.
    • Users await updates to align system capabilities accurately with the latest model specifications.


LlamaIndex Discord

  • Join the CodiumAI Webinar on RAG-Enhanced Coding: A reminder was shared about the upcoming webinar with CodiumAI focusing on RAG-augmented coding assistants. Participants must verify token ownership through their wallet to access the event.
    • The webinar will cover how Retrieval-Augmented Generation (RAG) improves contextual awareness in AI-generated code, which is critical for maintaining high quality in software development.
  • Building Multi-Agent Systems Using RabbitMQ: A blog highlights how to create a local multi-agent system with RabbitMQ, utilizing tools like ollama and qdrant_engine through llama-agents. Check out the complete guide here.
    • This set-up facilitates communication between agents and enhances the development experience essential for building robust AI systems.
  • Using HuggingFace Inference API for Embeddings: The HuggingFace Inference API enables embedding generation using the TextEmbeddingsInference class, as detailed in this example. It supports parameters like model name and embedding batch size to optimize performance.
    • Users highlighted the efficiency it brings to processing embeddings, essential for training AI models.
  • RAG Performance Insights Shared: Discussion included insights into how Retrieval-Augmented Generation enhances the quality of generated code based on contextual awareness. A presentation on an advanced approach using the LlamaIndex infrastructure covers practical applications.
    • Attendees can expect to learn about context-aware generation, which is critical for developers looking to improve their coding assistants.
  • Llamaparse's Arabic Parsing Issue: Users reported that Llamaparse struggles with Arabic parsing, producing results in left-to-right order despite Arabic being written right-to-left. This raises important questions about Llamaparse's handling of language intricacies.
    • This feedback signals a potential area for improvement in accommodating diverse languages in parsing applications.


Cohere Discord

  • LLM Hallucination Index Raises Eyebrows: The LLM Hallucination Index evaluates model fidelity to context, spotlighting concerns around hallucination, which was named Word of the Year.
    • Members debated the index's accuracy for Command R Plus, suggesting it misrepresents its open-source status.
  • Open Source Definition Sparks Debate: There are disagreements over the open-source definition in the Hallucination Index, deemed too lenient for just releasing weights.
    • Additional transparency on datasets and training methods was emphasized as crucial for genuine open-source status.
  • Mistral's License Under the Microscope: Members clarified that Mistral models are under the Apache 2.0 license, qualifying them as open source, albeit with dataset access limitations.
    • Discussions revealed that many models are labeled as 'open weights' but lack true open-source characteristics.
  • Command R Plus's Commercial Use Controversy: Command R Plus operates under a Creative Commons Attribution Non Commercial license, rendering it effectively closed-source.
    • The paper's open-source definition drew scrutiny, with members advocating for a clearer standard.
  • Cohere Toolkit Fuels Learning Project: The Cohere Toolkit is employed for a learning initiative in an AI fellowship, focusing on building an LLM with RAG over diverse corpora like recipes and legal case notes.
    • Inquiry arose about transitioning from Cohere models to third-party APIs like OpenAI Chat GPT or Gemini 1.5, hinting at broader functional needs.


Modular (Mojo 🔥) Discord

  • InlineList Defines a New Direction: The InlineList currently lacks __moveinit__ and __copyinit__ functionality, but progress is underway with key features set to merge soon.
    • Members are prioritizing these developments as essential for improving core functionalities.
  • Clarify Mojo Types: List vs. InlinedFixedVector: InlinedFixedVector is crafted for AnyTrivialRegType while List caters to CollectionElement, highlighting their tailored purposes in Mojo.
    • Discussion touched on a small buffer optimization under review that may enhance List performance.
  • Mojo and Custom Hardware: An Accelerating Topic: Members debated the potential for custom accelerators like PCIe cards with Mojo, questioning support before an open-source release.
    • Concerns about performance emphasized the reliance on cxl.mem for effective hardware integration.
  • FPGA and CXL IP Blocks: Hardware Development Insights: Discussions covered the use of Xilinx VU13P FPGAs and the integration of CXL IP blocks for hardware optimization projects.
    • One member shared plans to replace kernel usage with custom solutions to enhance overall efficiency.
  • Excitement Builds for Mojo's Open Source Future: There’s palpable excitement about Mojo's future as an open-source project, especially concerning support for RISC-V vector extensions.
    • Members expressed hopes for Mojo to significantly contribute to their projects despite current compatibility limitations.


LAION Discord

  • John Schulman exits OpenAI for Anthropic: John Schulman, co-founder of OpenAI, is joining Anthropic, an AI startup backed by Amazon, following the dissolution of OpenAI's superalignment team.
    • This shift may reflect ongoing concerns around ensuring control over advanced AI systems in the changing landscape.
  • Open-source AI struggles with high costs: The open-source AI sector faces significant challenges, particularly high training costs for state-of-the-art models and the difficulty in acquiring necessary preference data.
    • These issues contribute to a bottleneck in the development of competitive open models.
  • Meta's JASCO under scrutiny: Speculation around Meta's JASCO has spiked due to reports of it going 'missing' and possible lawsuits from Udio and Suno.
    • This rumor could stall Meta's AI advancements as uncertainty looms in the community.
  • Doxxing incident raises privacy concerns: Nullbulge experienced a doxxing incident, igniting discussions on the risks surrounding online privacy and individual reputation.
    • Community members noted potential weaknesses in operational security that might mitigate future risks.
  • Model hits accuracy wall at 270k parameters: The 270k model is reportedly encountering an accuracy plateau, achieving only 84% validation accuracy, signaling diminishing returns with increased parameters.
    • A participant suggested this trend indicates the need for alternative strategies in model design.


tinygrad (George Hotz) Discord

  • Feasibility of tinygrad on Aurora: Members debated whether it's feasible to run tinygrad on Aurora given its reliance on Intel GPUs, noting that Intel GPUs like the A770 support tensor core instructions.
    • Discussion involved expectations of Aurora's capabilities, which are projected to exceed 2 ExaFLOPS, making it potentially the fastest computer ever.
  • Preallocation Techniques for Tensors: A member suggested that preallocating tensors and assigning slices might resolve tensor manipulation issues, with George confirming that making tensors contiguous resolves the problem.
    • Mapping Buffer instances back to DEFINE_GLOBAL highlighted a need for clearer documentation, as members like Eigenvector42 expressed uncertainty about the tensor data flow.
  • Need for Distributed Computing Features: Members emphasized the necessity of mature distributed computing functionality for tinygrad to fully harness Aurora's capabilities.
    • They highlighted that enhancing these capabilities is crucial for better leveraging Aurora's computational power.
  • Dual Support Needed for FP8 NVIDIA Bounty: A query arose about whether support for E4M3 or E5M2, or both, was desired for the FP8 NVIDIA bounty, with George responding favorably to supporting both.
    • This indicates a crucial area for future development and backing for NVIDIA's requirements.
  • OpenMP Threading Insights: Discussion around CLANG and LLVM threading confirmed usage primarily on a single thread, with enhancement possibilities through OpenMP mentioned.
    • Links to respective tinygrad GitHub pull requests were shared to inspire contributions and improvements.
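For reference, the two candidate FP8 formats behave quite differently. This hedged sketch decodes raw bit patterns per the OCP FP8 layouts (a standalone illustration, not tinygrad code):

```python
# E4M3: 1 sign, 4 exponent bits (bias 7), 3 mantissa bits; no infinities,
# only one NaN pattern, max finite value 448.
def decode_e4m3(b: int) -> float:
    s, e, m = (b >> 7) & 1, (b >> 3) & 0xF, b & 0x7
    if e == 0xF and m == 0x7:
        return float("nan")            # the single reserved NaN encoding
    mag = (m / 8) * 2.0 ** -6 if e == 0 else (1 + m / 8) * 2.0 ** (e - 7)
    return -mag if s else mag

# E5M2: 1 sign, 5 exponent bits (bias 15), 2 mantissa bits; IEEE-style
# inf/NaN, wider range but less precision than E4M3.
def decode_e5m2(b: int) -> float:
    s, e, m = (b >> 7) & 1, (b >> 2) & 0x1F, b & 0x3
    if e == 0x1F:
        mag = float("inf") if m == 0 else float("nan")
    elif e == 0:
        mag = (m / 4) * 2.0 ** -14     # subnormal
    else:
        mag = (1 + m / 4) * 2.0 ** (e - 15)
    return -mag if s else mag

print(decode_e4m3(0x38), decode_e4m3(0x7E))  # 1.0 448.0 (0x7E is E4M3's max)
print(decode_e5m2(0x3C))                     # 1.0
```

The trade-off is the usual one: E4M3 for weights/activations where precision matters, E5M2 for gradients where range matters, which is why supporting both is attractive.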


DSPy Discord

  • Wiseflow revolutionizes information mining: Wiseflow is a new agile information mining tool that extracts and categorizes concise messages from diverse sources, enhancing data organization.
    • This innovative tool is designed for optimal retrieval in information-heavy environments, addressing current user needs.
  • HybridAGI introduces neuro-symbolic enhancements: The latest version of HybridAGI incorporates a neuro-symbolic system centered on graphs, improving RAG (Retrieval-Augmented Generation) functionality.
    • Key features include various notebooks aimed at streamlining usability and enhancing data processing pipelines.
  • LLMs evolve towards AGI with agents: Research is underway on transitioning LLMs to LLM-based agents, addressing limitations in autonomy as highlighted in this study.
    • This underscores the necessity for unified standards to benchmark LLM solutions as agents.
  • Boosting performance with inference compute: A recent study indicates that increasing the number of generated samples during inference can raise performance, with issue resolution rates improving from 15.9% to 56% on SWE-bench Lite.
    • This relationship between sample coverage and performance is particularly beneficial for coding and formal proofs.
  • MIPRO often surpasses BootstrapFewShotWithRandomSearch: In response to queries, it was noted that MIPRO performs better than BootstrapFewShotWithRandomSearch 'often, but not necessarily'.
    • This points to MIPRO's strong performance while acknowledging variability.
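As a sanity check on why more samples help, here is a toy independence model. This is a hedged upper bound, not the study's methodology: real samples are correlated, so the reported jump from 15.9% to 56% required far more samples than this optimistic bound implies:

```python
# If each sample solved an instance independently with probability p,
# coverage (probability at least one of k samples succeeds) would be
# 1 - (1 - p)**k. Correlation between samples makes real coverage grow
# much more slowly than this.
def coverage(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

print(round(coverage(0.159, 1), 3))  # 0.159 (single-sample baseline)
print(round(coverage(0.159, 5), 3))  # 0.579 under the independence assumption
```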


OpenAccess AI Collective (axolotl) Discord

  • Synthetic Data Generation Strategy: A member inquired about strategies for synthetic data generation to enhance 8b models on reasoning tasks, particularly text to SQL using Chain of Thought (CoT) training.
    • They suggested utilizing synthetic instructions before generating SQL queries to potentially improve model performance.
  • QLoRA Configurations for Gemma 2 27b: Discussion centered around QLoRA for Gemma 2 27b, with a recommendation to adjust the learning rate for compatibility with Flash Attention.
    • Members shared intentions to experiment with these modifications which could benefit training.
  • Fine-tuning Context Length Insights: A member questioned the ability to adjust the context length of a fine-tuned model like llama2-13b-hf after setting it to 4k.
    • Another member confirmed it can be increased or decreased, recommending a stepwise approach for large adjustments to maintain performance.
  • RoPE Scaling for Quick Adjustments: In relation to the context length topic, there was a suggestion to use RoPE scaling for efficient adjustments.
    • It was advised to gradually increase context length for optimal results, particularly for significant changes.
  • BitsAndBytes GitHub Pull Request Mention: A member emphasized tracking the right branch on BitsAndBytes GitHub, referring specifically to pull request #1220.
    • This detail could be crucial for anyone involved in recent development or debugging.
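The RoPE-scaling suggestion above can be illustrated with a toy computation. This is a sketch only, not axolotl's implementation; the dim and base values are illustrative:

```python
# Linear RoPE scaling ("position interpolation"): dividing the position index
# by a scale factor keeps rotary angles inside the range the model saw during
# training, which is why context can be extended without retraining from scratch.
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    # One rotary angle per pair of hidden dimensions.
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [(pos / scale) * f for f in inv_freq]

# With scale=2, position 8190 in an 8k context maps to the same angles as
# position 4095 in the original 4k context.
assert rope_angles(8190, scale=2) == rope_angles(4095)
print(rope_angles(4095)[:2])
```

This also motivates the stepwise advice: jumping the scale factor too far at once pushes interpolated positions far from anything the model trained on.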


Torchtune Discord

  • PPO Training Recipe Added to Torchtune: An end-to-end PPO training recipe has been integrated into Torchtune, enabling RLHF capabilities. Check out the detailed implementation here.
    • This addition streamlines integration between reinforcement learning and Torchtune's toolkit, enhancing training options.
  • Qwen2 Models Now Supported: Support for Qwen2 models, including the 7B model, has been integrated into Torchtune's training recipes with upcoming releases of 1.5B and 0.5B models soon. More details can be found here.
    • This expansion opens up more possibilities for model experimentation and tuning within the community.
  • DPO Support Planned for Llama 3: Members discussed the potential for supporting DPO with the Llama 3 8B full finetune, expressing interest in enhancements. Any of the models can be used with the recipes, even without a pre-built configuration.
    • This suggests an ongoing effort to explore deeper model capabilities.
  • Refactored PreferenceDataset Enhances Chat Support: The newly refactored PreferenceDataset now supports chat functionalities, as detailed in Pull Request #1276. This aligns with the unified message_transform pipeline established in previous discussions.
    • This update appears to significantly improve user interaction with datasets.
  • Proposal for Dedicated Model Builders Pages: A member suggested creating a dedicated page for each model's builders to accommodate the growing number of models and multimodal LLMs. This would allow us to better explain repetitive details like downloading and configuring models, consolidating information for users.
    • The proposal emphasizes the community's need for clearer organizational tools in model management.


OpenInterpreter Discord

  • Troubleshooting Open Interpreter Setup: Users report issues with setting up Open Interpreter, particularly when selecting a local Llama model, often encountering an openai.APIConnectionError during execution.
    • One user reported that their model attempted to download again even after selection.
  • Inquiry on Open Interpreter's Security Measures: A member raised concerns about how Open Interpreter handles user data, specifically whether it remains on their local machine.
    • They inquired about end-to-end encryption standards and any third-party involvement during communication.
  • Python Compatibility for Open Interpreter: A member questioned if Open Interpreter functions with Python 3.12, expressing their beginner status in programming.
    • Another member clarified that current compatibility requires Python 3.10 or 3.11.
  • Ollama Model List Command: To explore available models, a member suggested using the command ollama list, noting that each model has specific VRAM requirements.
    • Instructions to run models are detailed in the Ollama documentation, emphasizing resource availability.
  • API Keys for Remotely Hosted Models: It was established that an API key is essential for accessing paid remotely hosted models, while local models operate on a designated port.
    • This highlights the importance of authentication for remote capabilities.


Mozilla AI Discord

  • Llamafile achieves major milestones: The team continues to advance Llamafile, offering offline, accessible LLMs in a single file, much to the excitement of community members.
    • Community members expressed excitement about the project's potential impact on accessibility.
  • Mozilla AI community requests feedback for rewards: The Mozilla AI community seeks input through a survey, incentivizing participation with a chance to win a $25 gift card.
    • Members are encouraged to share how Mozilla AI can better support them via community resources.
  • Celebrate at the sqlite-vec release party: Everyone is invited to the sqlite-vec release party, featuring demos led by the core maintainer.
    • Participants will have the opportunity to try demos and engage directly with the core team, enhancing their hands-on experience.
  • Engaging discussions in Machine Learning Paper Talks: Upcoming Machine Learning Paper Talks will cover Communicative Agents and Extended Mind Transformers, hosted by a prominent community member.
    • These sessions promise to engage attendees with the latest research and invigorating discussions.
  • Insights from Local AI AMA: An AMA is set with Local AI's core maintainer, discussing self-hosting alternatives.
    • This is a prime chance for members to ask questions and explore practical implementations of Local AI.


MLOps @Chipro Discord

  • LinkedIn Engineering revamps their ML platform: LinkedIn is hosting a live event detailing their engineering team's transformation of the ML platform and innovations within it. You can join the discussion here.
    • The event emphasizes insights into the latest advancements in machine learning, encouraging participants to engage and share their thoughts during the discussion.
  • Live Event brings real-time insights: Currently ongoing, the event sheds light on pivotal developments in machine learning at LinkedIn, showcasing strategies and technologies used by their engineering team.
    • Participants can contribute actively, making it a collaborative venue for those interested in state-of-the-art practices in the field.


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: !

If you enjoyed AInews, please share with a friend! Thanks in advance!

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):
Powered by Buttondown, the easiest way to start and grow your newsletter.