[AINews] not much happened today
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
another quiet day in AI.
AI News for 8/19/2024-8/20/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (254 channels, and 2227 messages) for you. Estimated reading time saved (at 200wpm): 258 minutes. You can now tag @smol_ai for AINews discussions!
No main story, just little ones:
- OpenAI GA'ed GPT-4o finetuning with a notable case study on Cosine
- Anthropic GA'ed Claude 3.5 Sonnet 8k token output
- Zed introduced AI features that compete with Cursor and Cursor Composer
- The Microsoft Phi team released Phi-3.5 in 3 variants: Mini (3.8B), MoE (16x3.8B), Vision (4.2B), all remarkably sample efficient. No paper or independent evals yet.
Since it's a quiet day you can support AINews by checking out Box AI who have kindly supported this week's issues!
[Sponsored by Box] You might have an app. It might have users. Those users might even store docs in Box. But Box AI lets your users query their docs right in the Content Preview UI Element!
Swyx commentary: "Chat with PDF" is now one React component and an API key away! Note it's only available to Box Enterprise Plus customers for now.
(previously with Box AI: Week 1, Week 2)
The Table of Contents and Channel Summaries have been moved to the web version of this email!
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
AI Model Developments and Benchmarks
- Llama 3.1 405B Release: Meta released Llama 3.1 405B, which can now be easily deployed on Google Cloud Vertex AI. This offers GPT-4 level capabilities that can be run in-house, giving full control. @_philschmid shared details on deployment using Hugging Face's Text Generation Inference container.
- Qwen2-Math-72B: This model achieves state-of-the-art performance on several math benchmark datasets. A Gradio demo has been released for testing. @huybery highlighted its strength and provided a link to try the demo.
- Model Comparisons: Various tweets discussed comparisons between different models and architectures:
- ViT vs CNN performance comparisons were mentioned by @giffmana
- Mamba architecture performance was discussed by @wightmanr
AI Tools and Applications
- DSPy: @lateinteraction shared updates on DSPy 2.5 and 3.0, including a roadmap for future developments. The focus is on shifting from ad-hoc prompting to systematic programming.
- Flux: @awnihannun mentioned that Flux Schnell in the latest DiffusionKit with MLX is 30% faster and uses less RAM, allowing high-quality image generation in under a minute on an M1 Max laptop.
- LangChain: The LangChain community is organizing events, including a Hacky Hour in Austin. @LangChainAI shared details about the upcoming gathering.
AI Research and Techniques
- Zero-shot DUP prompting: This technique achieves SOTA results on math reasoning tasks across various LLMs. @rohanpaul_ai explained the three-stage process and its benefits in reducing semantic misunderstanding errors.
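The three stages of Zero-shot DUP (Deeply Understand the Problem) prompting can be sketched as a pipeline of chained prompts. This is an illustrative reconstruction, not @rohanpaul_ai's exact prompts; the `llm` function is a stand-in for any chat-completion call, returning canned text so the sketch runs offline.

```python
# Hypothetical sketch of the three-stage Zero-shot DUP prompting flow.
# `llm` is a stand-in for a real chat-completion call; the canned replies
# let the sketch run without an API key.

def llm(prompt: str) -> str:
    if "problem-solving information" in prompt:
        return "Start with 5 apples; 2 are eaten."
    if "core question" in prompt:
        return "How many apples remain?"
    return "The answer is 3."

def dup_answer(problem: str) -> str:
    # Stage 1: have the model restate the core question.
    core = llm(f"{problem}\nPlease extract the core question.")
    # Stage 2: extract the information needed to solve that question.
    info = llm(f"{problem}\nNote: {core}\n"
               "Please extract the problem-solving information.")
    # Stage 3: answer with both in context, which is what reduces
    # semantic-misunderstanding errors.
    return llm(f"{problem}\nHint: {info}\n{core}\nSolve step by step.")

print(dup_answer("Tom has 5 apples and eats 2. How many apples remain?"))
```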
- Fine-tuning Models: @jxnlco shared insights on fine-tuning models, emphasizing the importance of data quality, avoiding vendor lock-in, and focusing on thorough evaluation.
AI Ethics and Regulation
- California AI Safety Bill SB 1047: @rohanpaul_ai summarized key points from the modified version of the bill, including changes to liability and safety practice requirements.
- AI Regulation Debate: @ylecun expressed concerns about regulating AI research and development, particularly regarding obstacles to scientific information exchange and open-source code distribution.
AI Engineering Perspectives
- AI Engineer Role: @swyx discussed the central purpose of AI Engineers as turning existing foundation model capabilities into useful products. He highlighted the divergence from traditional ML Engineering and the increasing complexity of the AI stack.
- Docker Importance: @svpino emphasized the necessity of learning Docker for building and deploying software, describing it as a main differentiator in his work.
- LLM API Businesses: @finbarrtimbers expressed confusion about the economics of LLM API businesses, sparking discussion about the sustainability and profitability of such models.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Large Language Model Releases and Deployment
- Announcing: Magnum 123B (Score: 110, Comments: 21): Magnum-v2-123B, based on MistralAI's Large, has been released as the largest Magnum model to date, trained on the same dataset as other v2 models. The model, which was trained using 8x MI300 GPUs on RunPod, has not undergone formal evaluations but showed promising results during testing, appearing to be an improvement over previous Magnum versions.
Theme 2. Innovative AI Interfaces: Handwriting and Speech Recognition
- Using Whisper+GPT for automatic note taking and tagging (Score: 72, Comments: 12): Whisper and GPT are being utilized for automatic note-taking and tagging in Obsidian, as described by the post author. The combination of these AI models enables efficient conversion of audio to text and subsequent organization of notes, potentially streamlining the process of capturing and categorizing information within the Obsidian note-taking system.
- The author shared links to their GitHub repositories for AlwaysReddy and alwaysreddy_add_to_md_note, which handle transcription and note-taking functionality.
- Obsidian users discussed note-saving options, including daily notes and static notes. One user mentioned integrating Obsidian notes with a pipeline in Open WebUI.
- The system uses an LLM (such as Claude) for automatic tagging, and can work with any LLM, including local model servers.
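The pipeline above (transcribe audio, have an LLM pick tags, save a tagged Obsidian note) can be sketched in a few lines. This is a toy reconstruction, not the AlwaysReddy code: `whisper_transcribe` and `llm_tag` are stubs standing in for Whisper and an LLM call, and the YAML-frontmatter layout is just the common Obsidian convention.

```python
# Hypothetical sketch of the transcribe-then-tag note pipeline.
# The two stubs stand in for Whisper and an LLM tagging call.
import tempfile
from pathlib import Path

def whisper_transcribe(audio_path: str) -> str:
    return "Ideas for the garden: plant tomatoes in May."  # stub

def llm_tag(text: str) -> list[str]:
    return ["garden", "todo"]  # stub: an LLM (e.g. Claude) would pick tags

def save_note(audio_path: str, vault: Path) -> Path:
    text = whisper_transcribe(audio_path)
    tags = llm_tag(text)
    # Obsidian reads tags from YAML frontmatter at the top of the note.
    note = "---\ntags: [" + ", ".join(tags) + "]\n---\n" + text + "\n"
    out = vault / "voice-note.md"
    out.write_text(note)
    return out

vault = Path(tempfile.mkdtemp())
path = save_note("memo.wav", vault)
print(path.read_text())
```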
- handwriting interface on the e-reader. slowly turning it into what I always dreamed a palm pilot would be. ultimately I'd like to have it recognize shapes - but I'm not sure what cheap models can do that (~0.5B size) (Score: 249, Comments: 29): The post discusses developing a handwriting interface for an e-reader, aiming to create a device reminiscent of an advanced Palm Pilot. The author expresses interest in implementing shape recognition functionality but is uncertain about the capabilities of smaller, more affordable language models around 0.5 billion parameters in size for this task.
- The project uses qwen2:0.5b on ollama with bun as server and handwriting.js on frontend, running on a Boox Palma device. Users suggested potentially upgrading to gemma2B or phi-3-mini models, with discussions on token generation speeds on various devices.
- Debate arose over the practicality of a handwriting interface for LLMs, with some arguing it contradicts LLM benefits. Others defended the concept as an innovative integration of open weights with different input types, suggesting potential uses like transforming brief handwritten notes into more fluent text.
- Users drew parallels between the project and fictional magical objects, particularly Tom Riddle's diary from Harry Potter. There was also criticism of Boox as a company, with calls for competitors that respect open-source licenses and produce more durable devices.
All AI Reddit Recap
r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity
AI Image Generation Advancements
- Flux model demonstrates versatile image generation capabilities:
- Flux model's strengths and limitations:
AI Industry Developments
- AMD challenges Nvidia's AI infrastructure lead: AMD signs $4.9 billion deal to compete in the AI hardware market.
AI Ethics and Philosophy Discussions
- Debates on AI consciousness and intelligence:
Memes and Humor
- Meme about AI debate
- "It's not really thinking, it's just sparkling reasoning" meme
- AI rights movement parody video
AI Discord Recap
A summary of Summaries of Summaries by Claude 3.5 Sonnet
1. LLM Advancements and Benchmarking
- Hermes 3 Takes on the Giants: Hermes 3, a 70B parameter model, has been released on OpenRouter with advanced agentic capabilities and improved roleplaying abilities.
- Users are eager to compare Hermes 3 performance against models like Meta-Llama 405b, though it's not yet listed on the LLM Arena leaderboard.
- LLaMA 3.1 Struggles with SQL: A user reported that LLaMA 3.1 70B is unable to query a database using LangChain's SQL agent, while GPT 3.5 succeeds with the same setup.
- Despite attempts with custom parsers, the issue persists, leading to speculation about LLaMA's limitations in certain tasks compared to other models.
2. Model Performance Optimization
- Torch.compile Recompilation Challenges: Users discussed issues with torch.compile recompilations occurring due to input shape changes during generation and when switching between training and inference modes.
- The discussions highlighted limitations in torch.compile's ability to handle dynamic scenarios, such as passing RNG generator objects, which cause graph breaks.
- Custom Masks and KV-Cache Compatibility: Developers explored the compatibility of custom masks with kv-cache in language models, noting that direct use might not be compatible.
- A potential solution involves utilizing a custom mask and removing self.causal_mask, though this requires further investigation and testing.
- AI Chip Design for Local Memory: Discussion centered on how AI chips are designed with substantial local memory to fit models in cache, reducing the penalty of frequent data transfers to RAM.
- The trade-offs between Network on Chip (NoC) designs and cache management were debated, noting that while NoCs provide efficient data transfer across cores, they also introduce latency.
3. Open-Source AI Developments
- Whisperfile Simplifies Audio Transcription: Whisperfile, created by Justine Tunney, offers an easy way to transcribe audio locally using OpenAI's Whisper model, with 100% local operation and translation capabilities.
- The tool can even translate non-English audio to English during transcription, making it a versatile solution for audio processing tasks.
- LlamaIndex Expands Learning Resources: LlamaIndex launched an O'Reilly Media course on retrieval-augmented generation (RAG), covering components, evaluation, ingestion pipeline, observability, agents, and multi-modality.
- Additionally, LlamaIndex is hosting an AI product meetup, "LLMs in Production", focusing on building context-augmented LLMs with RAG & Vector DB and high-performance inference for production-ready LLMs.
- Aider v0.51.0 Enhances Development Workflow: Aider v0.51.0 was released with improved prompt caching for Anthropic models, optimized repo mapping for larger repositories, and enhanced Jupyter Notebook .ipynb file editing.
- The release includes various bug fixes and improvements, with Aider contributing 56% of the code for this version, showcasing the tool's capability in AI-assisted development.
4. Multimodal AI and Vision Models
- LM Studio's Vision Model Limitations: Users inquired about LM Studio's capability to process photos or videos as input for providing visual context in coding tasks.
- It was confirmed that local models in LM Studio cannot handle such tasks, with only cloud-based models like GPT-4o and Claude currently offering this functionality.
- Qdrant 1.10 Boosts Multi-Vector Representations: Qdrant 1.10 introduced support for multi-vector representations, enhancing retrieval quality and enabling late interaction models like ColBERT.
- The update allows for adapting regular dense embedding models for late interaction by removing the pooling step and using token-level embeddings for retrieval and reranking.
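The late-interaction scoring that makes this work can be sketched in pure Python: instead of pooling each text into one vector, every token keeps its embedding, each query token takes its best-matching document token (MaxSim), and the per-token maxima are summed. The 2-d vectors below are toy data, not real embeddings.

```python
# ColBERT-style late-interaction (MaxSim) scoring over token-level
# embeddings, with toy 2-d vectors standing in for real embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    # No pooling step: each query token matches its best document token.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]   # two query-token embeddings
doc_a = [[0.9, 0.1], [0.2, 0.8]]   # document whose tokens align with query
doc_b = [[0.1, 0.1], [0.0, 0.2]]   # poorly aligned document

assert maxsim_score(query, doc_a) > maxsim_score(query, doc_b)
```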
5. LLM Training and Fine-tuning Techniques
- MiniPile: A Compact Alternative for Model Training: The MiniPile dataset, a 6GB subset of the Pile corpus, was recommended as a viable alternative for training smaller-scale models due to the large size and cost of the full Pile dataset.
- MiniPile was curated by filtering out low-quality clusters, ensuring a diverse pre-training dataset that is more manageable for academic budgets and smaller-scale experiments.
- Model Merging and Extension Strategies: Discussions arose around novel model merging tactics, such as applying the difference between UltraChat and base Mistral to Mistral-Yarn, sparking debates on the potential of "cursed model merging" techniques.
- Users also explored options for extending models like Mistral beyond their initial token limits, suggesting further work on mergekit and frankenMoE finetuning as potential solutions.
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- Unsloth's Limitations on Fine-tuning Llama-3.1-405B: A user inquired about fine-tuning Llama-3.1-405B on a Hugging Face Space GPU with an H100, but was informed that Unsloth currently does not support this due to the model's high memory requirements.
- The user was told that they would need at least 360 GB of GPU memory and eight H100 GPUs, which Unsloth does not offer at this time.
- Lambda's Free Model Access and Fine-tuning Limitations: A user asked if Lambda offers free fine-tuning for Llama-3.1-405B.
- They were informed that Lambda only offers free model execution and does not offer free fine-tuning, but similar features are available on platforms like Hugging Face, Meta, and Groq.
- Training Loss Issues and Troubleshooting on Google Colab: A user faced challenges in keeping their training loss below 1.000 while fine-tuning a model on a Google Colab A100 runtime.
- They experimented with adjusting the learning rate and batch size, but ultimately concluded that a Colab A100 runtime might not be a feasible long-term solution given the model's high GPU memory requirements.
- Unsloth Premium and Partnerships: A user inquired about the pricing of Unsloth Premium and potential partnerships with Unsloth.
- They were informed that Unsloth Premium is not available for direct purchase, and its faster versions are restricted to Fortune 500 companies. Users were advised to contact Mike or Daniel for further information.
- PPL as a Metric for Model Evaluation: PPL (perplexity) is a useful metric for comparing the effects of quantization but can be misleading if the difference between the base and quantized model is significant.
- PPL is also valuable for comparing models at the token-level to identify observed topics, but the absolute value is meaningless, and the delta between models is the key focus.
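The delta-focused comparison reads naturally in code: perplexity is the exponential of the mean per-token negative log-likelihood, so only the gap between base and quantized models carries signal. The per-token NLLs below are made-up numbers standing in for real model outputs.

```python
import math

# Perplexity from mean token negative log-likelihood: ppl = exp(mean NLL).
# The absolute value is meaningless on its own; compare the delta.

def perplexity(nlls):
    return math.exp(sum(nlls) / len(nlls))

base_nlls      = [2.1, 1.9, 2.0, 2.2]   # toy NLLs from the base model
quantized_nlls = [2.2, 2.0, 2.1, 2.3]   # slightly worse after quantization

base_ppl = perplexity(base_nlls)
quant_ppl = perplexity(quantized_nlls)
print(f"delta: {quant_ppl - base_ppl:.3f}")  # small delta: quantization OK
```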
CUDA MODE Discord
- Llama2 Model Loading Issue: A user reported that running Llama2 eval crashes during the model loading phase, simply printing 'killed' and exiting.
- The user also encountered an out-of-memory (OOM) error while running Llama2 evaluation, even though their system should have enough RAM and GPU memory.
- GPT-Fast & HF_eval Script Showdown: The discussion centered around the use of different evaluation scripts, particularly comparing the GPT-Fast evaluation script with HF_eval.
- The user reported that they encountered an issue while running the HF_eval script for evaluating Llama2, resulting in an error message indicating an unsupported default value for the zero_point_domain parameter.
- Triton Kernel Optimization for Beginners: A user encountered a ValueError while attempting to use tl.arange with a non-constexpr value seqlen in a triton.jit kernel.
- The issue arose because seqlen was not declared as a tl.constexpr type, which is required for the tl.arange function in Triton, highlighting a key difference between Triton and regular Python code.
- FP16 & FP8 for Comfy: A member was under the impression that Comfy supports FP16 accumulator by default, but it requires a custom Torch C++ extension.
- Comfy's FP8 implementation doesn't actually use FP8 matmul for computation; it only uses it as an intermediate data type, with Stable-fast being an alternative that doesn't support Flux but has interesting optimization ideas.
- Diffusion Model Quantization Techniques: A member discussed how diffusion models can be effectively quantized by keeping self-attention and accumulation in FP16.
- Oneflow/Onediff is a wrapper for diffusion models that uses Oneflow for inference and graph creation, but it's not compatible with Flux because Flux is too large.
Nous Research AI Discord
- Hermes 3 Compared to Meta-Llama: A member inquired about a comparison between Hermes 3/405 and other models, particularly Meta-Llama 405b, as they were unable to find Hermes on the LLM Arena leaderboard.
- Another member confirmed that Hermes 3 is benchmarked against Llama 3.1-instruct-405 in a technical report, using a suite of 15 benchmarks, but they were also looking for comparisons against Meta-Llama 405b.
- Hermes 3: Text-to-Text Model: It was confirmed that Hermes 3 is a text-to-text model, meaning that it cannot generate images.
- While you can interact with H3-405B in Discord, the bots cannot trigger image generation through commands; they can only interact by @-mentioning each other.
- Llama 3.1 Minitron 4B: Pruned Text-to-Text Model: Llama-3.1-Minitron-4B-Width-Base is a text-to-text model that can be used for various natural language generation tasks.
- It is obtained by pruning Llama-3.1-8B's embedding size, attention heads, and MLP intermediate dimension, followed by continued training with distillation using 94 billion tokens from the continuous pre-training data corpus used in Nemotron-4 15B.
- Hermes 3 Amnesia Mode: Only Available for 8B: Amnesia mode is a feature of Hermes 3 8b that can be triggered by prompting it with "Hi" with no system prompts.
- However, this mode is not available on Discord because the bot remembers all chats.
- PyDantic-XML: Serialization and Deserialization: The pydantic-xml extension allows for serializing and deserializing data between Pydantic models and XML.
- You can find the documentation for this extension at https://pydantic-xml.readthedocs.io/en/latest/.
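The XML round-trip that pydantic-xml automates can be sketched with only the standard library, so the idea runs without the extension installed. The `Item` model and element names below are illustrative, not taken from the pydantic-xml docs, and a dataclass stands in for the Pydantic model.

```python
# Stdlib sketch of the model <-> XML round-trip that pydantic-xml
# automates; `Item` and its fields are hypothetical.

import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    qty: int

def to_xml(item: Item) -> str:
    root = ET.Element("item")
    ET.SubElement(root, "name").text = item.name
    ET.SubElement(root, "qty").text = str(item.qty)
    return ET.tostring(root, encoding="unicode")

def from_xml(xml: str) -> Item:
    root = ET.fromstring(xml)
    return Item(name=root.findtext("name"), qty=int(root.findtext("qty")))

xml = to_xml(Item("widget", 3))
assert from_xml(xml) == Item("widget", 3)  # lossless round-trip
```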
Cohere Discord
- DeepMind OPRO Paper Question: A member inquired about the source of the information regarding an OPRO-based prompt tuner.
- The member is seeking clarification on how to implement this technique, potentially referencing the OPRO paper.
- C4AI Discord Server Invite: A member requested an invite to the C4AI Discord server.
- The member was advised to join the Cohere Discord and contact a specific user, but is unsure about the appropriate communication channel (DM or public channel).
- Cohere API response_format Issue: A member encountered an error while using the response_format parameter in the Cohere API.
- They are seeking guidance on how to properly utilize the response_format parameter in their API requests.
- Cohere Classify Endpoint Sunset: A member inquired about potential alternatives to the Cohere Classify endpoint.
- The member is seeking recommendations for similar classification services with a focus on functionality and usability.
- Reranker API Efficiency for Large Datasets: A member asked if chunking large datasets and running the Reranker API independently on each chunk would produce accurate overall relevancy scores.
- This member is exploring the potential limitations and benefits of applying the Reranker API to large datasets in a chunked manner.
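The chunked approach can be sketched as: score each chunk independently, then merge by score. Note the built-in assumption the member is probing: merging only gives correct overall rankings if scores are comparable across API calls. The `rerank` stub below is a hypothetical stand-in for the Reranker API, scoring by crude term overlap.

```python
# Sketch of chunked reranking with a stubbed `rerank` call; a real
# reranker returns query-conditional relevance scores per document.

def rerank(query, docs):
    # stub: score by term overlap instead of a real reranker model
    words = query.lower().split()
    return [(d, sum(w in d.lower() for w in words)) for d in docs]

def rerank_chunked(query, docs, chunk_size=2, top_k=3):
    scored = []
    for i in range(0, len(docs), chunk_size):
        # Each chunk fits within the API's document limit.
        scored.extend(rerank(query, docs[i:i + chunk_size]))
    # Merging assumes scores are comparable across independent calls.
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

docs = ["GPU memory tips", "Baking bread", "GPU kernels in Triton", "Tea guide"]
print(rerank_chunked("gpu triton", docs, top_k=2))
```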
OpenRouter (Alex Atallah) Discord
- Hermes 3 Released: Hermes 3, a 70B parameter model, has been released on OpenRouter with advanced agentic capabilities and much better roleplaying.
- The release announcement also included a copyright notice for OpenRouter, LLC, stating © 2023 - 2024 OpenRouter, LLC.
- GPT Function Calls Still Supported?: A user asked if GPT functions are still supported on OpenRouter, as they are receiving 'function_call=None' even though the stop reason is 'functioncall'.
- The OpenRouter team confirmed that better tool call routing is coming soon, but currently, results may vary unless using OpenAI, Anthropic, or Google models.
- Mistral Large Instruct 2407 for German Pretraining: A user inquired about a model with good German pretraining, and was suggested to try Mistral-Large-Instruct-2407, which is multi-lingual by design and supports German.
- The user tested the model but found it to be 'okay' but not great, and further suggested checking Hugging Face for other models.
- OpenRouter Errors With Non-Free Models: Users reported encountering an error when trying to access non-free models on OpenRouter, specifically getting a 'client-side exception' and needing to hard refresh the browser.
- The OpenRouter team investigated and determined that the issue was related to access token expiration and potentially CORS errors, and ultimately resolved the issue.
- Uncensored Models on OpenRouter?: A user inquired about uncensored models on OpenRouter, and was suggested that 'open source' and 'roleplay' tags are good indicators for models that may produce NSFW content.
- Popular options for uncensored models include Dolphin, Stheno, Euryale, and MythoMax.
LM Studio Discord
- Uncensored Models: Explore the Landscape: A user sought suggestions for uncensored LLM models for non-coding tasks, and was provided with a link to llm.extractum.io which highlights its focus on uncensored LLMs for a variety of uses like legal analysis, medical research, and creative writing.
- LM Studio Server Struggles with Llama 3.1: A user reported encountering issues with LM Studio's local inference server, specifically with Llama 3.1, where the stop pattern was ignored.
- The user noted that the issue was absent in chat mode and suggested a discussion in the relevant channel to troubleshoot further.
- Speech-to-Text and Text-to-Speech in LM Studio: A user inquired about voice interaction with Llama 2/3 models in LM Studio, specifically whether speech-to-text and text-to-speech functionalities were integrated.
- It was clarified that LM Studio currently lacks this support, prompting the user to explore external solutions like Parler-TTS for text-to-speech and Whisper.cpp for speech-to-text.
- Vision Models in LM Studio: A Cloud-Based Affair: A user inquired about models capable of processing photos or videos as input in LM Studio to provide visual context for coding tasks.
- It was confirmed that local models in LM Studio cannot handle this; only cloud-based models like GPT-4o and Claude offer this functionality.
- M2 Ultra: High Hopes for AI Performance: A user expressed excitement for the upcoming M2 Ultra, noting its performance is highly anticipated for AI tasks.
Eleuther Discord
- GPT-4 Neuron Explanations Debunked?: A member questioned the usefulness of GPT-4's neuron explanations, citing a paper that claimed they were not better than baselines.
- Another member provided a link to a paper titled "Language Models can explain neurons in language models" but couldn't find a paper with a similar title claiming GPT-4 explanations were not useful, despite the content being similar.
- Training Models on Limited Data - Beware the Nonsense!: Training a model on a single, small file can result in nonsensical outputs due to the influence of random initialization.
- A member compared it to text compression benchmarks, where models are trained to memorize a specific block of text, and emphasized the importance of diverse pre-training data.
- MiniPile Dataset for Efficient Training: MiniPile, a 6GB subset of the Pile corpus, was recommended as a viable alternative for training smaller-scale models due to the large size and cost of the full Pile dataset.
- MiniPile was curated by filtering out low-quality clusters, ensuring a diverse pre-training dataset that is more manageable for academic budgets.
- Frankenmerging - Composing Layers from Different Models: A member inquired about the feasibility of composing layers from two different models, a technique known as 'frankenmerging.'
- They expressed confusion about the potential risks of this approach, questioning whether it wouldn't lead to a garbled internal representation of the model, and sought clarification on potential benefits and challenges.
- Model Merging with Optimizers: A member suggested using an optimizer to find the best permutation of channels between layers of two different models before stacking them together.
- They acknowledged the potential challenges, noting that such methods haven't been demonstrated for large GPT models.
Perplexity AI Discord
- Perplexity Pro Discord Access is Confusing: Users are unable to join the Perplexity Pro Discord server, even after leaving and rejoining using the link in their Perplexity settings.
- The issue seems to be a lack of clear instructions regarding accessing the Pro section within the main Discord server.
- Perplexity's Search Function Needs Fixing: Users are reporting issues with Perplexity's search function, including the inability to access online sources and the use of outdated information.
- Some users believe this is a backend issue, but the team at Perplexity has yet to acknowledge or address the problem.
- Perplexity Pro Models Face Limitations: Users are discussing the limitations of Perplexity Pro models for tasks like coding and blog post creation.
- Some users are finding that Perplexity Pro is not as effective as other models for certain tasks, particularly when it comes to generating complex code or avoiding hallucinations in blog posts.
- Perplexity's Prioritization of Front-End vs Backend: There is a debate about whether Perplexity is prioritizing front-end development over backend development, with some users reporting issues with backend features like search and model selection.
- Some users believe that these issues indicate a lack of focus on core backend functionalities, which are critical for the overall performance of the platform.
- Perplexity Pro Feature Upgrade Discussion: A discussion occurred about upgrading to Perplexity Pro which offers features like image upload, smarter AI, and more Pro Search.
- Other users also discussed the potential benefits of using LMSYS Arena and the upcoming G1 Humanoid Robot which is reportedly ready for mass production.
LlamaIndex Discord
- LlamaIndex: Building Natural Language Querying Systems: Learn how to build a natural language querying system for graph databases using LlamaIndex and Amazon Neptune!
- A comprehensive guide by @bechbd shows you how to translate natural language questions into openCypher queries and execute queries on Amazon Neptune graph.
- O'Reilly Media Course on RAG: LlamaIndex has launched an O'Reilly Media course on retrieval-augmented generation, authored by @ravithejads.
- The 2-hour course covers components of LlamaIndex, evaluation of RAG systems, the ingestion pipeline, observability, agents, multi-modality, and more.
- LlamaIndex: LLMs in Production Meetup: Join LlamaIndex for "LLMs in Production", an AI product meetup hosted by @vesslai and @pinecone in San Francisco.
- Learn from industry leaders about building context-augmented LLMs with RAG & Vector DB, custom LLMs for smarter, faster, and cheaper solutions, and high-performance inference for production-ready LLMs.
- Hierarchical Node Parser: No Chunking?: A user asked if the LlamaIndex hierarchical node parser can create hierarchies without chunking, instead using predefined nodes.
- The user wanted to keep metadata like page IDs associated with the nodes, but this was not possible with the current implementation.
- Complex Questions with LlamaIndex Retrieval: A user discussed the need for retrieval capabilities for both simple and complex questions within LlamaIndex.
- They envisioned a hierarchical approach that could recursively summarize nodes and create higher-level representations of the data for nuanced, contextual responses.
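The recursive summarize-and-stack idea can be sketched directly: group leaf nodes, summarize each group, then summarize the summaries until a single root remains. The `summarize` function is a stub standing in for an LLM call; the fanout of 2 is arbitrary.

```python
# Sketch of hierarchical node building via recursive summarization.
# `summarize` stubs an LLM call so the sketch runs offline.

def summarize(texts):
    return "summary(" + "; ".join(texts) + ")"  # stub LLM

def build_hierarchy(nodes, fanout=2):
    levels = [list(nodes)]
    while len(levels[-1]) > 1:
        layer = levels[-1]
        # Summarize each group of `fanout` nodes into one parent node.
        parents = [summarize(layer[i:i + fanout])
                   for i in range(0, len(layer), fanout)]
        levels.append(parents)
    return levels  # levels[0] = leaves, levels[-1] = [root summary]

levels = build_hierarchy(["page 1 text", "page 2 text", "page 3 text"])
# A complex question can then be answered from higher levels, a simple
# one from the leaves.
print(levels[-1][0])
```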
Latent Space Discord
- Jeremy Howard Dishes on Latent Space: The latest Latent Space podcast features Jeremy Howard, with discussions on Encoder-Decoder models, FastHTML, saving/updating state, fine-tuning vs RAG vs KV caching, and a new project he's working on.
- The podcast is described as 'a 5-course meal,' after co-host Swyx's playful phrase 'give us a nibble.'
- Encoder-Decoder Models Rise: The discussion emphasizes the advantages of Encoder-Decoder models, particularly for complex contexts and intricate relationships, over Encoder-only models.
- The interviewee, likely influenced by AI Paper Club calls, already had knowledge of this approach, suggesting increasing awareness within the AI community.
- Whisperfile Makes Transcription a Breeze: Whisperfile is a new tool that allows users to easily transcribe audio locally, utilizing OpenAI's Whisper model.
- Created by Justine Tunney, Whisperfile offers 100% local operation and even translates non-English audio into English during transcription.
- Claude 3.5 Sonnet Gets a Token Boost: Anthropic AI has doubled the maximum output token limit for Claude 3.5 Sonnet, expanding it from 4096 to 8192.
- This update is now available in both the Anthropic API and Vertex AI, making Claude 3.5 Sonnet easier for developers to work with.
- GPT-4o Fine-Tuning Challenges Composer: OpenAI has released GPT-4o fine-tuning, a new feature that lets users customize GPT-4o's behavior and performance.
- This update could potentially compete with Cursor's Composer feature, as both offer similar approaches to customizing and using large language models.
Modular (Mojo 🔥) Discord
- Mojo & MAX Update Cadence Synchronized: Previously, Mojo and MAX had independent update cycles, but now they are synchronized.
- This means you can install MAX+mojo main or MAX+mojo nightlies, but not MAX main and mojo nightlies separately.
- Siamese Networks with Labels?: A user inquired about switching a Siamese network's output from a sigmoid to a label (e.g., "dog" or "cat").
- Another user suggested that if you want to switch to labeling, using a standard model for that task might be more efficient than trying to adapt a Siamese network.
- Using the Slice Custom Op: A user requested a code example demonstrating the use of the slice custom op (https://docs.modular.com/max/api/mojo/graph/ops/slicing/slice).
- They expressed difficulty understanding the op's arguments.
- Mojo's List assignment uses ref: A user was surprised to find no __setitem__ method for assignment in Mojo's List implementation, but was informed that __getitem__ returns a ref[lifetime] T which behaves like __setitem__.
- This is how you assign items to a Mojo List.
- Mojo's ref and __lifetime_of Functions: The ref keyword in function return types was introduced recently (in Mojo v244) as part of the new language features.
- Mojo's __lifetime_of function allows you to determine the lifespan of a reference, which is useful for memory management.
OpenAI Discord
- ChatGPT struggles with simple tasks: A user pointed out that ChatGPT struggles with simple tasks like counting the number of 'R's in the word 'strawberry,' implying that AI is not as advanced as some might believe.
- This sparked a discussion about the current limitations of AI and whether it is truly intelligent or simply a tool that can perform specific tasks.
- Grok2 takes a different approach: A user mentioned that Grok2 has an interesting approach to dealing with problems.
- Another user pointed out that Grok2's method involves breaking down every question and solving it step by step, which is similar to the way humans solve problems.
- AI Enthusiasm - Is it overblown?: One user expressed that the term 'AI enthusiast' has lost its meaning due to AI's current limitations.
- This sentiment arose from a discussion about ChatGPT's struggles with a simple task and Grok2's method of solving problems.
- Building a Smart Cookbook: A user sought advice on creating a 'smart cookbook' that could be trained on their favorite cookbooks and provide personalized advice.
- This user believes that such a model could be applied to any 'how-to' book and requested information about existing solutions or projects.
- Strawberry Release Speculation: A user asked about the release date of 'Strawberry,' possibly a new AI model or a feature.
- Another user responded by jokingly stating that 'Strawberry' is still in the 'unreliably sourced leak' phase and expressed skepticism about its release.
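The "smart cookbook" idea above is essentially retrieval-augmented generation over a private corpus. A minimal sketch of the retrieval half, using only the standard library (the passages and the bag-of-words scoring scheme are illustrative assumptions, not taken from any existing project):

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for passages from the user's
# favorite cookbooks (titles and text are made up for illustration).
passages = {
    "roast chicken": "rub the chicken with butter and roast at 220C for an hour",
    "tomato sauce": "simmer crushed tomatoes with garlic olive oil and basil",
    "pancakes": "whisk flour milk eggs and butter then fry ladlefuls in a pan",
}

def score(query: str, text: str) -> float:
    """Crude bag-of-words overlap score between a query and a passage."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values()) / math.sqrt(len(t) or 1)

def retrieve(query: str) -> str:
    """Return the key of the passage most relevant to the query."""
    return max(passages, key=lambda k: score(query, passages[k]))

print(retrieve("how long do I roast a chicken"))  # -> roast chicken
```

In a full system, the retrieved passage would be placed into the LLM prompt alongside the user's question; real projects replace the toy scorer with embeddings and a vector store, but the retrieve-then-answer shape is the same.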
Torchtune Discord
- Torch.compile struggles with recompilations: Torch.compile recompilations occur when input shape changes, like during generation, or when switching between training and inference modes.
- This is due to changes in the `grad_mode`, and could be improved by implementing a `torch.compile` optimization.
- Torch.compile cache size limit: The `torch._dynamo hit config.cache_size_limit (8)` message indicates that the cache size limit has been reached.
- This suggests potential issues with torch.compile friendliness; the size of the cache may need to be increased.
- RNG objects incompatible with Torch.compile: Passing an RNG generator object into the model causes graph breaks, suggesting torch.compile currently doesn't support such objects.
- This could be a challenge, but could be addressed by potentially updating `torch.compile` to handle these objects.
- Custom masks vs kv-cache: Custom masks may not be directly compatible with kv-cache, but using your own mask and removing `self.causal_mask` might help address the issue.
- This issue is worth further investigation.
- Torchtune release date: The community is eager to know the release date for Torchtune, which is reportedly 99% ready.
- The discussion suggests that the release date is not yet confirmed.
LangChain AI Discord
- LLaMA 3.1 70B Struggles with SQL: LLaMA 3.1 70B has trouble querying a database using LangChain's SQL agent, while GPT-3.5 succeeds with the same setup.
- Despite trying custom parsers, the issue persists, indicating potential limitations in LLaMA's capabilities.
- Mistral Faces Challenges Expanding Beyond 8k: A user noted that Mistral cannot expand beyond 8k without further pretraining.
- They suggested exploring mergekit and frankenMoE finetuning to address this limitation.
- Model Merging Tactics Sparked Discussion: A user proposed merging UltraChat and base Mistral into Mistral-Yarn as a potential model merging tactic.
- While some expressed skepticism, the user remained optimistic, citing past successes in what they termed "cursed model merging".
- Open Empathic Project Seeks Assistance: A user requested support in expanding the categories within the Open Empathic project, particularly at the lower end.
- They shared a YouTube video showcasing the project's launch and a tutorial, encouraging users to contribute preferred movie scenes from YouTube videos, and provided a link to the OpenEmpathic project.
- LangChain Introduces Experimental SQLDatabaseChain: A user introduced LangChain's SQLDatabaseChain as an experimental feature designed to generate SQL queries based on user prompts.
- They provided a code example for a function utilizing this feature, outlining a prompt template for SQL query generation and handling responses from the chain.
OpenInterpreter Discord
- Ollama Integration with OpenInterpreter: A user sought guidance on integrating Ollama with OpenInterpreter on a remote machine, specifically configuring a profile YAML and starting the interpreter with the profile.
- They asked about using the correct IP address and port for their Ollama instance in OpenInterpreter's configuration to establish a connection; however, OpenInterpreter still refused to connect.
- Deepseek API: An Alternative to OpenAI and Local LLMs: A user inquired about a guide for using the Deepseek API as an alternative to OpenAI or local LLMs.
- The user expressed interest in using Deepseek as a potential solution for accessing and utilizing large language models.
- Troubleshoot Poetry and Pytorch Installation on Mac: A user reported encountering issues while installing Poetry and Pytorch 2.3.0 on a Mac, mentioning an open issue that had not been resolved.
- They sought guidance on finding a solution to this installation problem, potentially involving alternative installation methods or troubleshooting specific configuration settings.
- OpenInterpreter Update Rollout: The latest OpenInterpreter update was announced in the #O1 channel.
- No additional details were provided regarding the nature or scope of the update.
- Accessibility Roundtable Reminder: A reminder for the Accessibility Roundtable was posted in the #general channel.
- The reminder included a link to the event, suggesting that it was a virtual or online meeting.
DSPy Discord
- dspy-ai Installation Stumbles: A user noted that the `requirements.txt` file lists `dspy==2.0.5` but questioned if it should actually be `dspy-ai` instead.
- They also pointed out a potential compatibility issue with `pickle5==0.0.12`, which is only compatible with Python versions below 3.8, while `dspy-ai` requires Python 3.9 or higher.
- Can ADAS Invent New Building Blocks?: A user asked if ADAS could invent new building blocks like function calling to an integrated system.
- They also inquired if anyone has already experimented with something similar.
- Multi-Lora Setting for DSPy Finetuning: A user suggested using a multi-lora setting for DSPy finetuning, believing it could be a valuable approach.
- No further details were provided about how this might be implemented.
- DSPy vs. Langchain/LLamaindex: Choose Your Weapon: A user asked about comparing DSPy to Langchain and LLamaindex.
- They were directed to the DSPy documentation for guidance on choosing the right tool.
- Aider v0.51.0: Prompt Caching and Repo Mapping Improvements: Aider released version 0.51.0, featuring improved prompt caching for Anthropic models, optimized repo mapping for larger repositories, and enhanced Jupyter Notebook .ipynb file editing.
- The release includes a variety of bug fixes and improvements, and Aider contributed 56% of the code for this version, as noted in the Release history.
LAION Discord
- LTXStudio Launches Five New Features: LTXStudio has released five new features for users to take their projects to the next level.
- These features are accessible and testable now, with a tweet from LTXStudio announcing the release and encouraging users to try them out: Tweet from LTX Studio (@LTXStudio).
- JPEG Encoding: An Uncertain Image Tokenization Method: A research paper proposes JPEG encoding as a viable image tokenization method, but current AR-based approaches struggle with significant information loss, resulting in low image quality.
- The paper uses a JPEG quality setting of 25, which theoretically caps the attainable image quality, yet still compresses a 256×256 image to roughly 5,000 tokens, making training and inference slower than with a traditional VQ-VAE.
- Questions About Image Compression Limits: The author questions the maximum compression possible for images, given the paper's use of a JPEG quality setting of 25 for tokenization.
- This raises concerns about the potential limitations of this method in achieving optimal image compression.
- Training Models on H.265 or AV1 Frames: The author suggests exploring the possibility of training models on H.265 frames, or even AV1 frames, as a potential alternative to JPEG encoding for image tokenization.
- This approach could potentially address the limitations of the current JPEG encoding method and lead to better performance.
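For scale, the JPEG token count quoted above can be compared against a conventional VQ-VAE tokenizer; the 8× downsampling factor below is an illustrative assumption (common setups use 8× or 16×), not a figure from the paper:

```python
# Figure quoted in the summary above: JPEG quality 25 turns a 256x256
# image into roughly 5,000 tokens.
jpeg_tokens = 5_000

# A VQ-VAE-style tokenizer with 8x spatial downsampling (illustrative
# assumption) maps a 256x256 image to a 32x32 grid of discrete codes.
side = 256 // 8
vqvae_tokens = side * side  # 1,024 tokens

print(f"JPEG: {jpeg_tokens} tokens | VQ-VAE: {vqvae_tokens} tokens")
print(f"JPEG sequences are ~{jpeg_tokens / vqvae_tokens:.1f}x longer")
```

Since self-attention cost grows quadratically with sequence length, sequences roughly 5× longer go a long way toward explaining why training and inference are slower than with a traditional VQ-VAE.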
DiscoResearch Discord
- Leo Models Go Public: A member made quantized versions of their Leo models publicly available on Hugging Face.
- They are happy to take feedback and relay messages to the users if needed, adding them to the model card if desired.
- Feedback & Updates via Model Card: The member offers to add messages to the model card for feedback or relaying information to users.
- This way, anyone can see the latest information, feedback, or updates.
Interconnects (Nathan Lambert) Discord
- Xeophon's Tweet: Xeophon posted a link to a tweet from Bilawal Sidhu about the power of interconnects in deep learning.
- The tweet highlights how interconnects are crucial for large-scale distributed training of models and that the field is continuously evolving.
The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: !
If you enjoyed AInews, please share with a friend! Thanks in advance!