[AINews] GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
As we did for 4o-mini, there are 2 issues of the newsletter today run with the exact same prompts - you are reading the one with all channel summaries generated by `gpt-4o-mini`, the previous 4o-mini model, and NOT the `gpt-4o-2024-08-06` released today. See that version for the full writeup and side-by-side comparison.
PART 1: High level Discord summaries
Stability.ai (Stable Diffusion) Discord
- Harnessing LoRA for Line Art Excellence: Users applied the LINE ART STYLE LoRA to produce clean line art images from photos, emphasizing specific triggers and optimal settings for best results.
- To kick things off, they suggested using the Pony base model along with ControlNet for precise image transformations.
- Mastering ControlNet for Artistic Styles: ControlNet emerged as a key tool for transforming images, providing guidance from photos to varied artistic styles like line art.
- Participants recommended specific ControlNet models to preserve crucial image characteristics during these transformations.
- AMD GPU Woes in Machine Learning: The discontinuation of ZLUDA raised alarms among users regarding the efficacy of AMD GPUs for machine learning tasks.
- Discussions highlighted the performance limitations of AMD hardware, prompting reflections on their setups and preferences.
- Drama Unfolds in r/stablediffusion Community: Conversations revived the controversy surrounding the r/stablediffusion subreddit takeover, pointing fingers at clashes involving moderators and stability.ai staff.
- This backstory tied into broader community dynamics and their impact on platform governance and user engagement.
- Stable Diffusion Model Integration Tips: Participants shared valuable insights on effectively installing and configuring LoRA and Stable Diffusion models for optimal use.
- One user provided a detailed process for incorporating LoRA models into Stable Diffusion, simplifying the approach to generation prompts.
Unsloth AI (Daniel Han) Discord
- Unsloth Fine-Tuning Challenges: Users reported serious issues fine-tuning LLaMA3 models with Unsloth, noting integration problems with PPO trainers due to the recent update.
- Specific errors include the requirement of the for_inference() method, which breaks compatibility with existing setups.
- Insights on LLaMA3 Model Training: Discussions focused on the necessity of prompt formatting for successful training with LLaMA3, especially when using the Alpaca format.
- New users found that aligning prompts with previous training configurations was crucial for optimal outputs.
- Launch of BigLlama-3.1-1T-Instruct: The experimental self-merge, BigLlama-3.1-1T-Instruct, has been released, intended to enhance performance from the earlier Meta-Llama-3-120B-Instruct.
- However, concerns were raised that it remains 'useless' without training on its merged weights.
- Exploring Multi-GPU Support in Unsloth: Users eagerly asked about the beta release of multi-GPU support within Unsloth, which promises significant performance enhancements.
- The community anticipates optimizations that will yield reduced VRAM usage and faster processing speeds.
- Optimizing for Cost-effective Cloud Computing: Members sought guidance on configuring the LLaMA3 model affordably on RunPod, looking for the best balance between cost and performance.
- Performance metrics were shared to assist in tuning RunPod settings for maximum efficiency on available GPU resources.
HuggingFace Discord
- Google introduces Gemma 2 2B: Google has released Gemma 2 2B, a lightweight model expanding the Gemma series with 2.6B parameters, ideal for on-device use. Additional offerings include ShieldGemma for safety filtering and Gemma Scope for sparse autoencoders.
- Notably, Gemma 2.6B performs efficiently on browser environments powered by WebLLM & WebGPU.
- Diffusers integration for FLUX announced: The newly announced Diffusers integration for FLUX allows efficient text-to-image generation with limited resources. This integration promotes innovative usage of the new model's capabilities.
- Community reactions have highlighted its potential to improve user accessibility in creating images.
- Magpie Ultra dataset debuts: The magpie-ultra-v0.1 dataset has launched as the first open synthetic dataset built with Llama 3.1 405B, crafted with distilabel for advanced pipeline capabilities. Users have praised its quality for complex computational tasks.
- The release is a significant step forward in providing resources for training models.
- Hugging Face Datasets issues discussed: Users discussed challenges with Hugging Face Datasets, focusing on loading datasets from multiple JSON Lines files. Suggestions included hard-coding features, along with calls for better error messages and potentially new flags for `load_dataset` to enhance the user experience (see the sketch after this list).
- There is a widespread demand for improved documentation to assist with these issues.
- NER Annotated CVs dataset available: A dataset consisting of 5029 annotated CVs with IT skills marked using Named Entity Recognition (NER) is available on Kaggle, offering formatted JSON for NLP tools like Spacy. It allows efficient training for skill recognition.
- Members discussed methods for keyword and semantic search for identifying relevant files from a large data collection.
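For readers unfamiliar with the hard-coded-features workaround, here is a minimal sketch of pinning the schema up front so `load_dataset` does not infer the data structure from the first file alone; the field names and file paths are hypothetical:

```python
# Hypothetical schema and paths - adapt to your own JSON Lines files.
from datasets import Features, Value, load_dataset

features = Features({
    "text": Value("string"),
    "label": Value("int64"),
})

dataset = load_dataset(
    "json",
    data_files=["part-000.jsonl", "part-001.jsonl"],
    features=features,  # hard-coded features skip early structure inference
    split="train",
)
```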
LM Studio Discord
- AnythingLLM Setup with Gemma V2: A user successfully set up AnythingLLM after resolving file access issues by loading a custom Gemma v2 model. However, performance problems are attributed to hardware limitations, especially with larger models.
- This raises concerns for those working with larger datasets or models that demand more resources, emphasizing the need for adequate hardware.
- Flux Outshines SDXL in Performance: The Flux model, boasting 12 billion parameters, significantly outperforms the 2.6-billion-parameter SDXL, leading to heightened interest in testing Flux. Former Stability AI team members moved to Black Forest Labs, contributing to Flux's advancements.
- Users are eager to benchmark Flux against other models, anticipating substantial performance improvements in their projects.
- Navigating TTS and STT Integrations: Users explored integrating TTS and STT within LM Studio, emphasizing the need to navigate tutorials and cloud privacy issues. Some shared that merging LM Studio with APIs can enable local speech-to-text functionalities.
- The community expressed growing interest in seamless TTS/STT implementations, citing potential improvements in user experience and functionality.
- Speculation on Phi-3 Model Support: Participants questioned why Phi-3 models aren't supported in llama.cpp and noted failures to load them in Oobabooga webui post-update. These changes raise concerns about impacts on ongoing AI projects and model availability.
- Members are anxious for updates on compatibility, stressing the importance of access to diverse models for their AI experiments.
- 8700G/780m IGP Performs Decently: Testing on the 8700G/780m IGP yielded around 25% CPU acceleration with Ollama and 15% with LM Studio. However, LM Studio restricts GPU RAM to 20GB, causing loading failures for larger models.
- This limitation underscores the need for more robust hardware solutions in developing and testing AI applications.
CUDA MODE Discord
- PufferLib Gameboy Emulator Setup Explained: An example of setting up a Gameboy emulator in PufferLib was shared to simplify reinforcement learning.
- The aim is to streamline complex game environments for better model training efficiency.
- PyTorch 2.4 Shows Poor Performance with CUDA 12.4: Users reported that PyTorch 2.4 struggles with CUDA 12.4, but functions well with CUDA 12.1, raising compatibility concerns.
- One user noted they’re running CUDA 12.6 on their system via Conda, hinting at version-related issues.
- Hudson River Trading Internship Announcement: Internships at Hudson River Trading are opening, focusing on GPU research projects, with applications expected soon.
- Members expressed interest in GPU job roles, emphasizing excitement around performance compute workloads.
- ZLUDA Version 3 Removed Amid AMD Dispute: The author of ZLUDA has taken down version 3 following claims from AMD about invalid permissions surrounding its release, stirring discussions on GitHub.
- Members humorously referenced legal concerns with phrases like 'email not legally binding' in the context of this controversy.
- Ragged Attention Masks Essential for Training: Concerns were raised about ragged attention masks needing proper handling to avoid sampling errors during training.
- There was agreement on the critical importance of mask shapes for effective training, especially on complex sequences.
Nous Research AI Discord
- UltraSteer-V0 Dataset Breakdown: UltraSteer-V0 is a massive dataset comprising 2.3M conversations and 2.8M turns, featuring 9 fine-grained signals developed using Nvidia's Llama2-13B-SteerLM-RM reward model.
- This initial version's de-duplication process ensures unique assistant messages across dialogues, though it needs further enhancements for the UltraSteer dataset.
- Open Medical Reasoning Tasks Initiative: The Open Medical Reasoning Tasks initiative aims to compile medical reasoning tasks for LLMs, encouraging contributions from professionals via GitHub.
- Members commended the collaborative nature of the project ('This is AMAZING!'), highlighting its potential in advancing AI applications in healthcare.
- Model Training Issues and Solutions: Members identified challenges in model training, including catastrophic forgetting and overfitting, especially with various datasets and learning rates.
- One participant expressed frustration with extremely small learning rates, noting their detrimental effect on performance across diverse datasets.
- Insurance Sector Fine-tuning Queries: A member sought feedback on fine-tuning models specifically for the insurance sector, indicating a rising interest in specialized model applications.
- This highlights a need for sharing techniques and experiences relevant to niche markets within the AI community.
- New Model Releases and Their Capabilities: Among the latest is MiniCPM-Llama3-V-2.5 available on Hugging Face, recognized for its handling of multimodal tasks, including interactions with multiple images.
- The community discussed capabilities like GPU utilization in models available on Hugging Face, emphasizing ongoing developments in their features.
Latent Space Discord
- Web Developers Transition to AI Engineering: Members noted the growing transition of web developers into AI engineering roles, driven by high demand for AI expertise and a lack of qualified ML engineers.
- One member highlighted that skills like API integrations offer web developers a solid foundation for these new opportunities.
- OpenAI Faces Leadership Shakeup: Concerns arose over several key departures at OpenAI, leading to speculations about the company’s future stability and team morale.
- The mood turned skeptical regarding OpenAI's direction, with light-hearted commentary on the leadership changes within the organization.
- Generative AI Powers Retail Innovations: A member discussed how L'Oreal employs generative AI to enhance product descriptions and marketing strategies, showcasing practical applications in retail.
- This leads to a critical conversation on measuring the success of AI-generated content in retail sectors.
- Structured Outputs Transform GPT-4o: OpenAI rolled out a new feature in GPT-4o for structured outputs, promising to enhance adherence to developer-supplied JSON schemas from 86% to 100% (a minimal request sketch follows this list).
- As noted in a tweet by Michelle Pokrass, this update marks a significant improvement in handling complex data.
- Energy-Based Language Modeling Under Scrutiny: Members shared a humorous story about an Extropic AI engineer who lacked familiarity with critical concepts in energy-based language modeling.
- This anecdote sparked broader discussions regarding the awareness of AI concepts within various organizational teams.
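To make the feature concrete, here is a minimal request sketch assuming the openai Python SDK and the gpt-4o-2024-08-06 model; the schema itself is an illustrative example, not one from the discussion:

```python
# Illustrative schema; strict mode is what drives schema adherence to 100%.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Alice is 30 and lives in Paris."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"},
                },
                "required": ["name", "age", "city"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON guaranteed to match the schema
```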
OpenAI Discord
- OpenAI DevDay Hits Multiple Cities: OpenAI is taking DevDay on the road this fall, with events in San Francisco, London, and Singapore featuring hands-on sessions and demos.
- Engineers will showcase how developers worldwide leverage OpenAI technology to foster community engagement.
- Anticipation Builds for ChatGPT Desktop App: Members are eager for the release date of the desktop ChatGPT app for Windows and the public rollout of search GPT.
- Lingering uncertainty exists regarding the remaining founders at OpenAI since many have left the company.
- DALL-E 3 Performance Vs Competitors: Users discussed the performance of the DALL-E 3 model utilized by Bing AI Image Creator, noting distinct differences in generated results compared to other models.
- A comparison highlighted DALL-E 3's effectiveness in certain scenarios over models like Llama.
- Curiosity around Llama Model API: Questions emerged regarding the Llama model's performance and whether a free API exists, as contributors showed interest in running models locally.
- While Llama is open-source, members confirmed the absence of an official free unlimited API, revealing limitations in access.
- Generative AI Set to Enhance Gaming: Members discussed the potential for generative AI to enhance gaming experiences in titles like BG3 and Pathfinder, envisioning unique character designs.
- Excitement arose over the prospect of immersive interactions with NPCs, revolutionizing player engagement.
Perplexity AI Discord
- Perplexity AI Model Comparisons: Users shared experiences comparing GPT-4o and Turbo, noting Turbo consistently outperforms in follow-up interactions, while GPT-4o struggles with new instructions, leading some to revert to Sonnet.
- Frustrations arose as it became clear that GPT-4o is misinterpreting newly provided guidance, impairing the user experience.
- NVIDIA Blackwell GPUs facing delays: NVIDIA's next-gen Blackwell GPUs have hit roadblocks with design flaws identified late in production requiring redesign, coupled with packaging issues from TSMC complicating timelines.
- Developers anxiously await updates as these delays could impact market availability and future projects reliant on these GPUs.
- Concerns over Perplexity API output: Users reported bizarre, garbled output from the Perplexity API when prompted for article writing, indicating potential issues with the API's response handling.
- Moreover, concerns around a 502 error while querying prompted hints to check the status page for updates.
- Inquiry into Llama 3.1 405B Performance: Members discussed the anticipated performance of Llama 3.1 405B, sparking interest in how it compares against existing models.
- The conversation swirled around benchmarking metrics and whether it can eclipse contenders in the same weight class.
- Uploading and Token Limit Issues: A user faced a 'Failed to count tokens' error while uploading larger PDFs, leading to discussions on model token limits and potential workarounds like converting to TXT format (a conversion sketch follows this list).
- This sparked a collective discussion on effective handling of file uploads and mitigating API limitations during interactions.
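As a rough illustration of the TXT workaround, here is a hedged sketch using the pypdf library; the filenames are placeholders:

```python
# Placeholder filenames - convert a large PDF to plain text before uploading.
from pypdf import PdfReader

reader = PdfReader("report.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

with open("report.txt", "w", encoding="utf-8") as f:
    f.write(text)  # the plain-text file is typically far smaller than the PDF
```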
Eleuther Discord
- Mechanistic Anomaly Detection Underperformance: Recent analyses indicate that mechanistic methods for detecting anomalies in language models often fail to outperform non-mechanistic baselines focused on activations, though they show promise when evaluating batches of test data.
- Despite some strong performance in specific tasks, variability remains a concern, emphasizing the complexity of effective anomaly detection.
- Support Grows Against SB1047: A collective of academics has rallied to sign an open letter opposing California's SB1047, fearing that it may impede research on large ML models and AI safety.
- Participants in the discussion acknowledged Anthropic's response to the bill as sensible, reflecting the contentious nature of the debate regarding accountability versus innovation in AI.
- Meta's Infrastructure for Distributed AI Training: At ACM SIGCOMM 2024, Meta highlighted the critical role of AI networks in facilitating distributed training workloads, particularly illustrated in their work with LLAMA 3.1 405B.
- Their research on RDMA over Ethernet demonstrates the growing demands AI models place on existing network infrastructures.
- Training Instability Concerns: Members speculate that noise is a primary factor behind training instability, rather than double descent, suggesting improvements in training techniques could help.
- It was proposed to conduct multiple experimental runs to ensure data reliability and consider lowering the learning rate for enhanced stability in training.
- Expanding Understanding of Sparse Autoencoders: Several foundational works discussing SAEs were referenced, including studies that explore scaling from toy models to larger parameters, encouraging deeper study into SAE methodologies.
- A comprehensive SAE landscape overview and the new SAELens library were presented as tools for enhanced analysis, aimed at improving interpretability within language models.
LangChain AI Discord
- Ollama Memory Issues Revealed: Users encountered out-of-memory errors on models like aya and nomic-embed-text when using an 8GB GPU despite having 32GB of system RAM. The suggested fix was to set `num_gpu = 0`, enabling CPU-only operation (see the sketch after this list).
- This workaround was critical for users facing similar hardware limitations.
- LangGraph Course Suggestions Flow: Members shared insights on courses for mastering LangGraph, pointing to a notable offering from DeepLearning.ai. A discussion highlighted the appropriateness of beginner-friendly materials over advanced ones for new learners.
- Another choice was an advanced course on Udemy, fostering a resource-sharing mindset.
- Mood2Music Connects Moods to Tunes: Mood2Music, an app designed to recommend songs based on mood, connects with platforms like Spotify and has launched a waitlist for user enrollment. This AI-driven tool aims to personalize music discovery.
- This initiative signals an innovative approach to music interaction, capturing user sentiments effectively.
- Agentgenesis Sparks Developer Interest: The launch of Agentgenesis, a library offering AI component snippets, promises to enhance development efficiency, claiming a potential 10x improvement for Gen AI apps. The project is fully open-sourced under the MIT license.
- Active collaboration is encouraged within the community to enrich the library's offerings.
- SQL Chat Agent Seeks Collaborators: Discussion around the SQL chat agent project drew attention, with a user seeking assistance on their scripting challenges. Members quickly engaged to share insights based on their own experiences.
- This interaction exemplifies the community's spirit of collaboration, as direct messaging for script reviews was initiated.
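For context, a minimal sketch of the `num_gpu = 0` workaround, assuming the langchain_community Ollama wrappers (which expose Ollama's num_gpu option); the model names mirror those from the discussion:

```python
# num_gpu=0 forces CPU-only inference, avoiding the 8GB-GPU out-of-memory errors.
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

llm = ChatOllama(model="aya", num_gpu=0)
embedder = OllamaEmbeddings(model="nomic-embed-text", num_gpu=0)

print(llm.invoke("Hello!").content)
print(len(embedder.embed_query("Hello!")))
```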
OpenRouter (Alex Atallah) Discord
- GPT-4o-2024-08-06 is Now Live!: The new model GPT-4o-2024-08-06 has been officially released and is available for use at OpenRouter. This version promises enhanced performance in structured outputs and introduces the ability to supply a JSON schema in the response format.
- However, there are ongoing issues with structured outputs in strict mode that are currently not fully supported, prompting users to report problems in specific threads.
- Gemini Pro 1.5 Encountering Resource Exhaustion: Users reported 'Resource has been exhausted' errors with Gemini Pro 1.5, attributed to Google's rate limiting rather than misconfiguration. This has led to frustrations as users navigate around these constraints.
- One user confirmed that these problems stem from Google's strict rate limits on this model, making performance a concern for developers relying on continuous access.
- Significant Price Drops for Google Gemini: On the 12th, the price for Google Gemini 1.5 flash will halve, making it cheaper than both yi-vision and firellava. This price adjustment sparked excitement among users, who foresee facilitating more extensive user-generated content (UGC) applications.
- Many in the community view this as a pivotal moment for accessibility in generative models, especially with vast content captioning now within reach.
- OpenRouter API Usability Explained: To use the OpenRouter API, users must secure an API key from their profile to operate in compatible interfaces like Lobe Chat. This makes it easier for users to engage with the models via more user-friendly platforms (a minimal client sketch follows this list).
- This approach encourages new users to interact seamlessly with various AI models without delving into overly complex setup procedures.
- Confusion Over Model Capabilities: There was confusion surrounding the GPT-4o-2024-08-06 model's token output limits since OpenRouter displayed only 4,096 tokens compared to the 16,384 tokens stated in the official documentation. This discrepancy raised concerns among users regarding the model's actual capabilities.
- Alex Atallah affirmed that updates are pending to rectify this discrepancy and align OpenRouter's listing with OpenAI's official documentation.
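Since OpenRouter exposes an OpenAI-compatible API, a minimal client sketch looks like the following; the API key shown is a placeholder you would generate from your profile:

```python
# Placeholder API key - create one from your OpenRouter profile.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)

response = client.chat.completions.create(
    model="openai/gpt-4o-2024-08-06",  # OpenRouter's slug for the new model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```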
LlamaIndex Discord
- CodiumAI Webinar Explores RAG: Join the upcoming webinar with CodiumAI focusing on RAG-augmented coding assistants, essential for creating context-aware AI-generated code. Attendees must verify token ownership to participate.
- The webinar highlights best practices to uphold code quality and integrity within enterprise-level AI applications.
- Local Multi-Agent System with RabbitMQ: A blog post outlines building a local multi-agent system using RabbitMQ, streamlining communication between agents with tools like Ollama and Qdrant. This setup is simplified by using llama-agents.
- Participants gain a comprehensive setup guide to enhance their agent development workflow.
- Get Ready for the RAG-a-thon!: LlamaIndex is gearing up for their second RAG-a-thon at the 500 Global VC offices in Palo Alto from October 11-13, in collaboration with Pinecone and Arize AI. Registrants will engage in a weekend of hackathon activities.
- This is a unique opportunity for developers to innovate and test ideas in a collaborative environment.
- HuggingFace API for Embeddings Discussion: A user sought info on the HuggingFace Inference API for generating embeddings via a private endpoint, prompting reference to specific examples.
- Included was a code snippet illustrating how to configure the TextEmbeddingsInference model; a comparable sketch appears after this list.
- Concerns on SimpleDirectoryReader PDF Loading: Questions arose about SimpleDirectoryReader's behavior of loading PDFs as individual pages, with members inquiring if they can consolidate them into a single document. Solutions were suggested, focusing on modifying the PDFReader.
- This enhancement could streamline handling multi-page documents for users.
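For reference, a comparable configuration sketch for a private endpoint, assuming llama-index's text-embeddings-inference integration; the endpoint URL, model name, and token are placeholders:

```python
# Placeholder endpoint, model, and token - adjust to your private deployment.
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",
    base_url="https://my-private-endpoint.example.com",
    auth_token="hf_...",  # required when the endpoint is private
    embed_batch_size=10,
)

print(len(embed_model.get_text_embedding("Hello, world!")))
```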
Cohere Discord
- Hallucination Index Ignites Skepticism: The new Hallucination Index ranks 22 leading LLM models, revealing hallucination challenges as model sizes increase.
- Members expressed doubt over its accuracy, raising questions about the definition of open-source.
- Licensing Debate Surrounds Command R Plus: Discussion focused on whether Command R Plus qualifies as open source under the Creative Commons Attribution Non Commercial 4.0 license.
- Controversy arose as some argued the model's weights are not free for commercial use, classifying it as closed source.
- The Open Weights vs Open Source Conundrum: A debate unfolded surrounding the terminology distinction between open weights and fully open-source models.
- Some noted that open weights often carry restrictions preventing commercial usage, necessitating clearer definitions.
- Mistral Models Hold Open Source Credentials: It was pointed out that Mistral is licensed under Apache 2.0, affirming its open-source status contrary to widespread assumptions.
- Participants discussed Mistral's commitment to open weights while questioning the openness of training data used.
- Cohere Toolkit Powers AI Fellowship Project: The Cohere Toolkit is being used for an AI fellowship project to create an LLM application with RAG, utilizing a Confluence knowledge base loaded with various data types.
- This includes practical knowledge such as recipes, cooking notes, and legal case notes.
Modular (Mojo 🔥) Discord
- InlineList misses key features: Members pointed out that InlineList currently lacks `__moveinit__` and `__copyinit__`, emphasizing ongoing development efforts to enhance its functionality.
- Significant updates are being merged, showing progress in addressing these limitations.
- List gets a small buffer upgrade: Members celebrated the recent addition of optional small buffer optimization for Lists, as outlined in this pull request.
- This enhancement allows for effective stack allocation of slots, further optimizing List operations.
- Mojo's custom accelerators face hurdles: Users discussed the compatibility of custom accelerators like PCIe cards with Mojo, noting that integration remains limited until it becomes open source.
- Concerns were raised about integrating systolic arrays before the open-source transition, hinting at potential challenges ahead.
- CXL Integration sparks FPGA design talk: A lively discussion emerged around the integration of cxl.mem on FPGA devices, especially regarding compatibility with Intel's CXL IP blocks.
- Users confirmed that they are utilizing a Xilinx VU13P FPGA, indicating a keen interest in exploring hardware capabilities with CXL.
- RISC-V support looks promising for Mojo: Members expressed optimism about introducing RISC-V support to Mojo upon its open-source release, relying on lower-level PyTorch IR transformations in the meantime.
- While the community sees potential benefits for future applications, current readiness remains a concern.
LAION Discord
- John Schulman's leap to Anthropic: OpenAI co-founder John Schulman announced in a Monday X post that he is moving to Anthropic, an AI startup backed by Amazon. This follows OpenAI's recent disbandment of their superalignment team, which was focused on controllability of advanced AI.
- Schulman's departure raises questions about OpenAI's internal stability after such critical team changes.
- Open-source AI training faces financial strain: A member pointed out that the exorbitant costs of training modern AI models stifle growth in the open-source community reliant on unlicensed data. They argued that more affordable training could lead to a surge of open models dismissive of ethical data sourcing.
- The conversation hinted at a pressing need for financial models to support open-source innovation.
- Meta's JASCO MIA amidst legal turmoil: Meta's JASCO appears to be missing, with speculation around the influence of Udio and Suno lawsuits on this situation. Community members expressed concern regarding how such legal challenges could derail substantial AI developments.
- This underscores the impact of legal landscapes on the progress of high-stakes AI projects.
- Nullbulge doxing sparks safety alarms: Rumors surfaced about Nullbulge being doxxed, creating fears among members about the implications for his safety following revelations of his poor operational security. The community advised caution against Internet searches related to him.
- Discussions highlighted the sensitive nature of content and the potential fallout from online leaks.
- School BUD-E voice assistant introduced: A shared YouTube video showcased a project called School BUD-E, a web-browser voice assistant. The video, however, lacked a comprehensive description, raising curiosity about its functionalities.
- Members expressed interest in understanding how this project could fit into educational tech advancements.
tinygrad (George Hotz) Discord
- Tinygrad's Feasibility on Aurora Supercomputer: Discussions centered on whether tinygrad can run on the Aurora supercomputer, which relies on Intel GPUs, pointing to potential challenges such as limited performance optimization on that hardware despite the system's target of over 2 ExaFLOPS.
- The conversation highlighted the technical hurdles related to the specific hardware limitations associated with Aurora's architecture.
- Speculation on XMX Support for Tinygrad: Members discussed ongoing efforts related to XMX support in tinygrad, indicating that OpenCL might be a viable, albeit slow, solution.
- Participants noted that the Max Data Center GPUs in use do support tensor core instructions, which adds potential for optimization.
- Implementing Distributed Computing with Tinygrad: The need for enhanced distributed computing functionality was emphasized, aimed at fully utilizing the capabilities of tinygrad on Aurora.
- The discussion underscored compatibility considerations essential for performance improvements.
- Clarification on FP8 NVIDIA Bounty Formats: For the FP8 NVIDIA support bounty, it was clarified that both E4M3 and E5M2 formats are needed to meet the bounty requirements effectively.
- This agreement set a clear direction for future work on support implementation.
- Resolution of Contiguous Buffer AssertionError: An AssertionError related to buffer contiguity in tinygrad was resolved, with George Hotz suggesting that ensuring the buffer is contiguous fixes assignment issues.
- One user confirmed success through practical testing, validating the approach (a minimal sketch follows this list).
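As a rough illustration of the fix, here is a minimal sketch using tinygrad's Tensor.contiguous(); the shapes and values are arbitrary:

```python
# Arbitrary shapes - a permute creates a non-contiguous view of the buffer.
from tinygrad import Tensor

view = Tensor.ones(4, 4).permute(1, 0)   # non-contiguous view
target = view.contiguous().realize()      # materialize a contiguous buffer

# assign() onto the contiguous buffer avoids the AssertionError discussed.
target.assign(Tensor.zeros(4, 4)).realize()
print(target.numpy())
```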
DSPy Discord
- Wiseflow Revamps Information Mining: Wiseflow is introduced as an agile tool for information mining that extracts concise messages from various online channels, facilitating data organization.
- The tool allows for automatic categorization and upload of data, enhancing efficiency in managing information.
- HybridAGI Releases New Version: The DSPy community has launched an updated version of HybridAGI, a neuro-symbolic system focused on graph-program synthesis.
- This version includes multiple notebooks which optimize usability and data processing, promoting easier integration with DSPy and Knowledge Graphs.
- LLMs Tackle Software Engineering Challenges: New research explores the role of large language models (LLMs) in software engineering tasks such as code generation and detecting vulnerabilities, emphasizing the need for unified benchmarking.
- The divide between LLMs and LLM-based agents is still murky, with researchers calling for clearer classification standards.
- MIPRO Surfaces as a Strong Performer: MIPRO is reported to often outperform BootstrapFewShotWithRandomSearch, though performance remains context-dependent.
- This highlights the importance of tailoring approaches based on implementation nuances and dataset specifics.
- FastEmbed by Qdrant Gains Attention: A member recommended considering FastEmbed by Qdrant for its capabilities in embedding tasks.
- This aligns with ongoing discussions on optimizing embeddings within the DSPy community.
OpenAccess AI Collective (axolotl) Discord
- Exploring Synthetic Data Generation Strategies: A member inquired about effective synthetic data generation strategies to enhance 8-billion-parameter models on reasoning tasks like text-to-SQL. Utilizing a Chain of Thought (CoT) in synthetic instructions may improve performance.
- 'Thanks!' was expressed, indicating readiness to experiment on this topic.
- Tweaking QLoRA for Gemma 2 27B: Discussions emerged regarding adjustments to the QLoRA for Gemma 2 27B, particularly around the learning rate for optimal performance with the latest Flash Attention.
- Another member indicated willingness to test out the setup, highlighting collaborative engagement in the experimentation.
- Training Models on L40S GPUs: Inquiries about the performance of training on L40S GPUs yielded positive feedback, confirming that training results are pretty decent.
- This conversation indicates a growing interest in leveraging L40S for model training among members.
- RoPE Scaling: A Quick Fix for Context Issues: To adjust the context length of fine-tuned models like llama2-13b-hf, it was noted that RoPE scaling serves as a viable solution.
- The importance of careful incremental changes was emphasized to achieve solid performance when making these adjustments (see the sketch after this list).
- Tracking Bitsandbytes Multi Backend Refactor: A link to a GitHub pull request regarding the multi backend refactor of bitsandbytes was shared, aiming to clarify changes introduced during the process.
- This transparency fosters understanding of the ongoing adjustments and their implications across various implementations.
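For readers unfamiliar with the technique, a hedged sketch of RoPE scaling via the transformers library follows; the factor of 2.0 (roughly 4k to 8k tokens for llama2-13b-hf) is an illustrative choice, not a recommendation from the discussion:

```python
# Illustrative scaling factor - linear position interpolation stretches context.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)
```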
Torchtune Discord
- PPO Training Recipe Now Available!: A new end-to-end PPO training recipe has been added to Torchtune, enabling effective Reinforcement Learning from Human Feedback (RLHF). Check the implementation here.
- This addition allows users to leverage the PPO paradigm for enhanced model training.
- Qwen2 Models Supported in Recipes: Support for Qwen2 models has been integrated into training recipes, starting with a 7B version available at this link. Upcoming releases will include 1.5B and 0.5B models soon.
- This expansion allows developers to experiment with Qwen2 in their projects, enhancing model capabilities.
- Proposing a Model Index Page: A member suggested creating a dedicated page for each model's builders, particularly with the impending introduction of multimodal LLMs.
- This centralized index would explain repetitive information like downloading and configuring models.
- Download Confusions with Llama 3: One user reported issues where results seemed to use a BASE model instead of the INSTRUCT model despite having the correct version downloaded.
- Another member suggested ensuring prompts are formatted with the correct Llama 3 instruct template to avoid these issues.
- Refactored PreferenceDataset Supporting Chat: A member shared a link to a GitHub pull request which refactors the PreferenceDataset to support chat functionality.
- The refactor aligns with RFC #1186, and feedback on this update is being requested.
OpenInterpreter Discord
- Open Interpreter Setup Woes: Users faced challenges while setting up Open Interpreter with local LLMs, encountering repeated download loops and an openai.APIConnectionError that prevented interaction.
- One participant expressed frustration after failing to type 'Hello.' despite several attempts.
- Questioning Open Interpreter's Security: A user raised concerns about Open Interpreter's privacy protocols, specifically how data is managed locally, if any third-party entities are involved, and what encryption measures are in place.
- This inquiry aims to clarify the safety of deploying the interpreter in sensitive environments.
- Contemplating Python Compatibility: A member asked whether Open Interpreter is compatible with Python 3.12, considering installing Python via the Microsoft App Store.
- The inquiry reflects ongoing adjustments in development environments as new versions emerge.
- Collaborative Error Resolution Efforts: Users exchanged experiences and discussed potential fixes for setup errors, with offers to troubleshoot together via direct messaging.
- This collective effort underscores the community's willingness to assist newcomers in overcoming technical barriers.
- Navigating Ollama Model Features: A member recommended using `ollama list` to check available model names, since models vary in VRAM requirements, emphasizing the need for proper setup as outlined in the Ollama documentation.
- This guidance serves to optimize resource allocation when working with different models.
Mozilla AI Discord
- Llamafile Continues to Impress: The core maintainer of Llamafile is making epic progress, focusing on offline, accessible LLMs in a single file.
- This project is noted for its potential impact on ease of access to powerful models.
- Community Feedback Opportunity: Members are invited to share how the Mozilla AI community can assist them through a survey, with a chance to win a $25 gift card.
- This initiative encourages input on resources available within the community.
- Join the sqlite-vec Release Party: An invitation to the sqlite-vec release party has been shared, allowing discussions about features and demos with the core maintainer.
- Attendees can engage and explore what sqlite-vec offers to enhance their projects.
- Machine Learning Paper Talks Scheduled: Upcoming Machine Learning Paper Talks will discuss Communicative Agents and Extended Mind Transformers.
- These talks provide insights into recent advancements in machine learning with expert hosts.
- Local AI AMA on Self-Hosting Solutions: An AMA featuring the core maintainer of Local AI will offer insights into self-hosting an open source alternative to OpenAI.
- This session promises to clarify many aspects of using and setting up Local AI for various applications.
MLOps @Chipro Discord
- LinkedIn Engineering Transforms ML Platform: During a recent live session, LinkedIn Engineering showcased their ML platform transformation with a focus on enhanced workflows and efficiency.
- For in-depth insights, check out the event here.
- Community Engages in ML Transformation Discussion: The event attracted significant participation, reflecting the community's interest in advancements in ML.
- Engagement in discussions and questions highlighted the interactive nature of this session.
The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Stability.ai (Stable Diffusion) ▷ #general-chat (459 messages🔥🔥🔥):
Stable Diffusion and LoRA
ControlNet usage
Issues with AMD GPUs
r/stablediffusion drama
Image generation techniques
- Utilizing LoRA for Line Art Generation: Users discussed the application of the LINE ART STYLE LoRA to create clean line art images from photos, noting its specific triggers and recommended settings.
- To achieve this, they recommended starting with the Pony base model and using ControlNet for more controlled image transformations.
- Getting Started with ControlNet: ControlNet was highlighted as a versatile tool for guiding image generation, particularly when converting photos into different styles, such as line art.
- The conversation included recommendations for using specific ControlNet models to maintain desired image features while applying artistic styles.
- Challenges with AMD GPUs in Machine Learning: Users expressed concern over the discontinuation of ZLUDA, which affects the use of AMD GPUs for machine learning applications.
- This sparked discussions about the limitations of AMD in this field and users reflected on their choice of hardware in relation to performance.
- Drama Surrounding r/stablediffusion Subreddit: The conversation touched on past events involving the takeover of the r/stablediffusion subreddit, citing conflicts between community moderators and stability.ai staff.
- This history was linked to a broader discussion on how community dynamics influence platform usage and moderation.
- Integrating and Using Stable Diffusion Models: Users shared insights on installing and utilizing LoRA and Stable Diffusion models, emphasizing the importance of configuration and model compatibility.
- One user detailed the process of adding LoRA models to Stable Diffusion's architecture, clarifying how to approach image generation prompts effectively.
Links mentioned:
- KREA: no description found
- Tweet from Karma (@0xkarmatic): Wow, Greg is also taking a leave of absence.
- Dependency: no description found
- black-forest-labs (Black Forest Labs): no description found
- Flux Examples: Examples of ComfyUI workflows
- FLUX: Installation with Workflow is Here: no description found
- Tweet from Somnium Space (@SomniumSpace): We are delighted to publish this incredible full Keynote Speech by Robert Scoble (@Scobleizer) which he gave at #SomniumConnect2024✨ What will #AI bring to humanity in the next 10 years? How will thi...
- THUDM/CogVideoX-2b · Hugging Face: no description found
- Line Art Style [SDXL Pony] - V1 | Stable Diffusion LoRA | Civitai: LINE ART STYLE This is a style LoRA meant to mimic line art, specifically art with little to no shading/shadows in order to get clean black lines o...
- ComfyUI: Imposing Consistent Light (IC-Light Workflow Tutorial): The video focuses on implementing IC-Light in Comfy UI, specifically for product photography. IC-Light is based on SD1.5, and we use a reference background a...
- GitHub - vosen/ZLUDA: CUDA on ??? GPUs: CUDA on ??? GPUs. Contribute to vosen/ZLUDA development by creating an account on GitHub.
- CFG: how it works in non-Flux models vs Flux (code examples): The 'guidance' value for flux is a simple numeric input that gets fed into the model. BFL introduced this at distilation time by generating an...
- Good Vibrations (Official Music Video): REMASTERED IN HD!Official Music Video for Good Vibrations performed by Marky Mark and The Funky Bunch.#MarkyMark #GoodVibrations #Remastered
- Pony Diffusion V6 XL - V6 (start with this one) | Stable Diffusion Checkpoint | Civitai: Pony Diffusion V6 is a versatile SDXL finetune capable of producing stunning SFW and NSFW visuals of various anthro, feral, or humanoids species an...
- What are LoRA models and how to use them in AUTOMATIC1111 - Stable Diffusion Art: LoRA models are small Stable Diffusion models that apply tiny changes to standard checkpoint models. They are usually 10 to 100 times smaller than checkpoint
Unsloth AI (Daniel Han) ▷ #general (105 messages🔥🔥):
Unsloth Fine-tuning Issues
LLaMA Model Training
Multi-GPU Support
Inference Optimization
Resource Guides
- Challenges with Unsloth fine-tuning: Users expressed concerns about fine-tuning LLaMA3 models with Unsloth and integration into various trainers, including PPO, where recent updates resulted in functionality issues.
- Specific problems included the need for the for_inference() method that broke compatibility with existing implementations.
- Training LLaMA3 Models: Discussions highlighted using specific prompt formats with LLaMA3 trained in Alpaca format, emphasizing the necessity of formatting for prompts to achieve expected outputs.
- New users confirmed that their prompts should align with the training configuration used previously to ensure successful model outputs.
- Multi-GPU Support in Development: Multiple users inquired about multi-GPU capabilities in Unsloth, learning that it is currently in beta and expected to offer enhanced features and efficiency upon release.
- Users showed interest in the upcoming multi-GPU support, which promises extra VRAM reduction and increased speed.
- Inference Optimization Challenges: Some members faced issues when running inference on Colab after implementing Unsloth, with reports of scripts failing to execute or produce outputs as expected.
- Users were encouraged to check setups and tokens utilized in the process to troubleshoot the lack of activity in their executions.
- Learning Resources for LLM Inference: Members shared a guide focused on generative AI, finding it useful for high-level overviews but noted a lack of detailed inference information.
- Suggestions included exploring other resources for comprehensive coverage on topics like kv caching and flash attention mechanisms.
Links mentioned:
- Google Colab: no description found
- Google Colab: no description found
- Tweet from OpenAI Developers (@OpenAIDevs): Introducing Structured Outputs in the API—model outputs now adhere to developer-supplied JSON Schemas. https://openai.com/index/introducing-structured-outputs-in-the-api/
- kalomaze/Mistral-7b-MoEified-8x · Hugging Face: no description found
- Google Colab: no description found
- Nextra: the next docs builder: Nextra: the next docs builder
- Load 4bit models 4x faster - a unsloth Collection: no description found
- 4bit Instruct Models - a unsloth Collection: no description found
- unsloth (Unsloth AI): no description found
Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):
BigLlama-3.1-1T-Instruct
ChatGPT Pokémon Prompts
Gaming Discussions
- BigLlama-3.1-1T-Instruct Launch: The newly released BigLlama-3.1-1T-Instruct is an experimental self-merge of Meta-Llama-3.1-405B-Instruct using mergekit. This model is the successor of Meta-Llama-3-120B-Instruct and aims for enhanced performance through adjusted layer duplication.
- However, a member noted that it remains 'useless since it hasn't been trained on its merged weights', indicating further work is needed to optimize the model.
- Interest in ChatGPT Pokémon Prompt: A user shared a ChatGPT Pokémon prompt, sparking curiosity and positive responses from others in the chat. Users expressed enthusiasm, with one noting, 'wait this is actually really good' after trying it.
- This highlights the community's engagement with interactive gaming prompts.
- Casual Gaming Conversation: Questions arose about gaming preferences when a member asked if others play games, specifically referencing Minecraft. The light-hearted back-and-forth included joking references to LLM leaderboards, showing a blend of interests in gaming and AI.
- This casual chat reflects the relaxed community environment where members engage in various topics, mixing gameplay with AI developments.
Links mentioned:
- mlabonne/BigLlama-3.1-1T-Instruct · Hugging Face: no description found
- no title found: no description found
Unsloth AI (Daniel Han) ▷ #help (162 messages🔥🔥):
Llama-3 Model Training
Colab Usage and Limitations
Model Deployment with Ollama
GGUF Model Conversion
Error Handling in Fine-Tuning
- Issues with Llama-3 Fine-Tuning: Users faced various issues when attempting to fine-tune the Llama-3 model, including errors related to model loading and size mismatches.
- Specific errors mentioned include 'Blockwise quantization only supports 16/32-bit floats' and 'Cache only has 0 layers'.
- Colab Pro Requirement for Terminal Access: It was noted that access to terminal features in Google Colab, necessary for running Ollama commands, requires a Colab Pro subscription.
- This limitation sparked discussions on alternatives for users looking to teach model training without incurring costs.
- Running Models Locally with Ollama: Members discussed how to run the trained models using Ollama on local machines, including necessary commands and setup.
- Instructions included serving the model via the terminal and using API calls to interact with it (see the sketch after this list).
- GGUF Model Conversion Process: Users inquired about the process of converting models to the GGUF format for compatibility with various platforms, including GPT4All.
- It was emphasized that the conversion steps are detailed at the end of relevant Colab notebooks.
- Community Help and Resource Sharing: Community members helped each other troubleshoot issues, share resources, and clarify steps in the training and deployment processes.
- Several valuable links to Colab notebooks and GitHub resources were shared to assist in navigating model usage.
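As a rough sketch of that serve-then-call flow, the snippet below hits a locally running Ollama server's HTTP API (start it with `ollama serve` in a terminal first); the model name is a hypothetical placeholder for your exported fine-tune:

```python
# Hypothetical model name - replace with your exported fine-tune.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "my-finetune", "prompt": "Hello!", "stream": False},
    timeout=120,
)
print(response.json()["response"])  # non-streaming replies carry a "response" field
```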
Links mentioned:
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Load: no description found
- Serverless GPU Endpoints for AI Inference: Run machine learning inference at scale with RunPod Serverless GPU endpoints.
Unsloth AI (Daniel Han) ▷ #community-collaboration (1 messages):
LLaMA3 model configuration
Cost-effective cloud computing
- Seek Cost-effective LLaMA3 Configuration: A member requested suggestions for the ideal configuration to run the LLaMA3 model on RunPod cost-effectively.
- Any insights on reducing costs while maximizing performance are welcomed.
- Optimal Settings for RunPod: Another member contributed by mentioning performance metrics that could help in tuning RunPod settings for LLaMA3.
- They emphasized the importance of balancing GPU types with memory allocation to ensure efficiency.
Unsloth AI (Daniel Han) ▷ #research (1 messages):
vvelo: https://fxtwitter.com/reach_vb/status/1820493688377643178
HuggingFace ▷ #announcements (1 messages):
Gemma 2 2B Release
Diffusers Integration for FLUX
Magpie Ultra Dataset
Whisper Generations with Medusa Heads
llm-sagemaker Terraform Module
- Google introduces Gemma 2 2B: Google has released Gemma 2 2B, a lightweight model expanding the Gemma series with 2.6B parameters, ideal for on-device use.
- Additional offerings include ShieldGemma for safety filtering and Gemma Scope for sparse autoencoders.
- Exciting Diffusers integration for FLUX: Diffusers integration for FLUX has been announced, enabling efficient text-to-image generation.
- The integration supports users to run FLUX with limited resources, highlighting the innovative capabilities of the new model.
- Magpie Ultra dataset drops: The first open synthetic dataset, magpie-ultra-v0.1, built with Llama 3.1 405B, has just been released.
- Created with distilabel, it's praised for its advanced and compute-intensive pipeline capabilities.
- 150% faster Whisper generations: Recent updates reveal that Whisper generations are now 150% faster thanks to Medusa heads integration.
- This method, built on Transformers, reportedly shows minimal drops in accuracy, sparking excitement in ASR research.
- llm-sagemaker Terraform module unveiled: A new Terraform module, llm-sagemaker, simplifies the deployment of open LLMs to AWS SageMaker real-time endpoints.
- It supports popular models like Llama 3 and Mistral, complete with customizable configurations and integration tests for robust implementation.
Links mentioned:
- Google releases Gemma 2 2B, ShieldGemma and Gemma Scope: no description found
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): Gemma 2 2B running in a browser, powered by WebLLM & WebGPU! 🔥 100% local & on-device In less than 24 hours, we've already got the model to the edge! ⚡ Try it out on an HF space below:
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): Gemma 2 2B running in a free Google Colab! 🤗 Powered by transformers! ⚡
- Tweet from Georgi Gerganov (@ggerganov): Simple instructions to get started with the latest Gemma 2 models + llama.cpp https://huggingface.co/blog/gemma-july-update#use-with-llamacpp
- Tweet from Sayak Paul (@RisingSayak): You should have already gone bonkers by now with @bfl_ml's FLUX release. What a model, eh! I am getting back to Twitter after some sprinting with my mates @DhruvNair, @YiYiMarz, and @multimoda...
- Tweet from Gabriel Martín Blázquez (@gabrielmbmb_): Dropping magpie-ultra-v0.1, the first open synthetic dataset built with Llama 3.1 405B. Created with distilabel, it's our most advanced and compute-intensive pipeline to date. https://huggingfac...
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): 150% faster Whisper generations w/ medusa heads! 🔥 Built on top of Transformers with minimal drop in accuracy. Quite exciting area of research, Medusa heads are proven to be incredibly fast for LLM...
- Tweet from merve (@mervenoyann): Shipped: new task guide on Vision Language Models and freshly updated Depth Estimation task guide on @huggingface transformers docs ⛴️📦 👉🏻 Read about VLMs, how to stream, quantization and more ...
- Tweet from Philipp Schmid (@_philschmid): Excited to announce “llm-sagemaker” a new Terraform module to easily deploy open LLMs from @huggingface to @awscloud SageMaker real-time endpoints! 👀 Infrastructure as Code (IaC) tools are crucial f...
- Tweet from merve (@mervenoyann): SAMv2 is just mindblowingly good 😍 Learn what makes this model so good at video segmentation, keep reading 🦆⇓
- Tweet from Databricks Mosaic Research (@DbrxMosaicAI): For our StreamingDataset users: We're thrilled to announce support for storing MDS datasets in @huggingface. S/O to @orionweller for the contribution! Check out the docs here: https://docs.mosaic...
HuggingFace ▷ #general (239 messages🔥🔥):
Hugging Face Resources
Datasets Issues
Summer Plans
AI and School Experiences
Model Development and Filtering Techniques
- Hugging Face Datasets Issues and Workarounds: Users discussed issues faced while using Hugging Face Datasets, specifically about loading datasets from multiple JSON lines files. Suggestions included hard coding features and using a schema to resolve data structure interpretation errors.
- The conversation highlighted the need for better error messages and potential new flags for the `load_dataset` function to enhance user experience.
- Student Experiences and Summer Plans: Members shared experiences related to school, including challenges faced and feelings about the start of the school year. There was a lighthearted exchange regarding remote classes in previous years and summer plans.
- The sentiment reflected nostalgia for remote learning, with some members expressing disbelief at returning to in-person classes.
- AI Model Development and Filtering Techniques: A user shared their experience with filtering datasets for an AI project, detailing the challenges with JSON lines files and considering merging files for better feature inference. Discussions included the benefits of chunking files for easy loading and visual inspection.
- Another member expressed interest in the user's thesis and the potential for shared knowledge in similar research topics.
- Usage of AI Tools for Image Generation: There was a humorous incident where a user discovered their sister using Meta AI to generate cat images, prompting a conversation about the use of AI tools. Reactions varied, with some members expressing their views on AI-generated content.
- The sentiment around the matter contrasted with other opinions in the channel, leading to a light-hearted discussion about the implications of using such technologies.
Links mentioned:
- Welcome to the 🤗 Machine Learning for 3D Course - Hugging Face ML for 3D Course: no description found
- Audio To Spectrogram - a Hugging Face Space by fffiloni: no description found
- Hugging Face - Learn: no description found
- Riffusion • Spectrogram To Music - a Hugging Face Space by fffiloni: no description found
- THUDM/CogVideoX-2b · Hugging Face: no description found
- Repository limitations and recommendations: no description found
- Create a dataset loading script: no description found
- load_dataset with multiple jsonlines files interprets datastructure too early · Issue #7092 · huggingface/datasets: Describe the bug likely related to #6460 using datasets.load_dataset("json", data_dir= ... ) with multiple .jsonl files will error if one of the files (maybe the first file?) contains a full...
- Cherry Blossoms Explode Across the Dying Horizon: Provided to YouTube by DistroKidCherry Blossoms Explode Across the Dying Horizon · SakuraburstDeconstructing Nature℗ 643180 Records DKReleased on: 2016-12-18...
- Models - Hugging Face: no description found
- Spaces Overview: no description found
- Spaces - Hugging Face: no description found
- Spaces Launch – Hugging Face: no description found
- GitHub - SonyCSLParis/NeuralDrumMachine: Contribute to SonyCSLParis/NeuralDrumMachine development by creating an account on GitHub.
- GitHub - buaacyw/MeshAnythingV2: From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization": From anything to mesh like human artists. Official impl. of "MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization" - buaacyw/MeshAnythingV2
- Issues · huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. - Issues · huggingface/transformers
HuggingFace ▷ #today-im-learning (3 messages):
Linear Algebra for 3D Video Analysis
Blog Article Recommendations
- Learning Linear Algebra for 3D Video: One user expressed interest in learning about linear algebra specifically for 3D video analysis.
- “Today I am learning about linear algebra for 3D video analysis.”
- Seeking Recommended Blogs or Articles: The same user requested recommendations for blogs or articles related to linear algebra for their studies.
- “Can you suggest some really good blogs or articles for the same?”
HuggingFace ▷ #cool-finds (4 messages):
Image Synthesis with Transformers
Integrating Graphs into LLMs
- High Resolution Image Synthesis Using Transformers: Discussion highlighted the latent representation of images and the use of a context-rich vocabulary codebook for conditioned image synthesis.
- This is particularly relevant for those working on transformers applied to image synthesis.
- New Method for Integrating Graphs into LLMs: A member shared a paper proposing a method similar to one introduced at ICML for integrating graphs into large language models (LLMs).
- This represents an innovative approach for leveraging graph structures in language models.
- Another Graph Integration Technique for LLMs: A link to another relevant paper was shared here, exploring the integration of graphs into LLMs.
- This reinforces the growing interest in utilizing graph data alongside language processing, enhancing capabilities further.
HuggingFace ▷ #i-made-this (5 messages):
SAC agent training
Embodied agent platform
Talking head synthesis
BiRefNet segmentation
3D voxel environments
-
SAC Agent Training Achieves Multi-Threading Gains: In Part 2 of the SAC agent training with Unity 6 ML-Agents, the developer added multi-threaded support for CUDA and introduced a boredom cooldown pot to enhance training performance. View their YouTube video here.
- This installment features a quick SAC agent trainer in a 3D voxel world, promising more engaging behaviors for the agent.
- Development of an Embodied Agent Platform: A team is developing an embodied agent platform where agents can chat with players, understand instructions, and perform tasks in a 3D environment. The project page invites community contributions and collaboration.
- An online demo is also available to showcase the platform's capabilities.
- Talking Head Synthesis with AniTalker: A project focused on talking head synthesis has been launched, featuring a port of AniTalker. This innovative solution strives to animate vivid and diverse talking faces.
- Explore more about the project via their Hugging Face space.
- BiRefNet: SOTA for Background Removal: The team shared the open-sourced BiRefNet, a state-of-the-art method for high-resolution dichotomous image segmentation. This model demonstrates superior performance compared to RMBG1.4.
- Access additional resources, including the arXiv paper and various demo links.
Links mentioned:
- ZhengPeng7/BiRefNet · Hugging Face: no description found
- Unity ML-Agents | Live Agent training from Scratch | Part 2: a quick sac agent trainer in a 3d voxel world
- Anitalker - a Hugging Face Space by Delik: no description found
- GitHub - X-LANCE/AniTalker: [ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding": [ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding" - X-LANCE/AniTalker
- GitHub - thunlp/LEGENT: Open Platform for Embodied Agents: Open Platform for Embodied Agents. Contribute to thunlp/LEGENT development by creating an account on GitHub.
- LEGENT - a Hugging Face Space by LEGENT: no description found
HuggingFace ▷ #reading-group (5 messages):
Structured Outputs in OpenAI API
LLMs and Reasoning Limitations
Scratchpad Theory for LLMs
Attention Mechanisms in LLMs
-
OpenAI's New Standard for Structured Outputs: OpenAI has recently published a blog post recommending the new approach of structured outputs in the API, with minimal attribution to prior work.
- This revelation highlighted ongoing concerns about giving credit in the AI research community.
- LLMs May Not 'Reason' as We Think: A member expressed skepticism about the reasoning capabilities of LLMs, suggesting they might merely retrieve information instead of genuinely reasoning.
- The analogy of an Uber driver suggests that allowing models to explore past experiences could lead to better outcomes than direct Q&A formats.
- Token Scratchpads Enhance LLM Performance: It was proposed that LLMs likely require additional tokens as scratchpads to enhance performance, akin to techniques demonstrated in prior research.
- The notion is that reasoning occurs in attention layers while memory is stored in linear layers, making additional draft tokens beneficial.
- Challenges with Model Depth in Reasoning: Discussion emphasized the limitations of fixed model depth and token counts, noting that increasing steps through added depth requires extensive retraining.
- In contrast, adding more tokens can easily enhance a model's capacity to transform information.
- Disheartening Trends in Attention Mechanisms: The conversation shifted towards empirical results showing that altering attention mechanisms, like using linear attention, often worsens reasoning tasks.
- The member noted a significant number of papers are attempting to replace linear layers with external databases to address reasoning challenges.
HuggingFace ▷ #computer-vision (4 messages):
Depth Estimation
Code Implementations for Research Papers
-
Depth Estimation Paper from CVPR 2022: A member shared a link to a paper titled Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light presented at CVPR 2022.
- This paper explores innovative methods for improving depth estimation accuracy.
- Inquiry about Code Implementation: Following the mention of the depth estimation paper, a member inquired whether there's a code implementation available for the discussed research.
- The conversation highlights a common interest in bridging theoretical research with practical coding solutions in AI.
HuggingFace ▷ #NLP (2 messages):
NER Annotated CVs Dataset
Identifying Relevant JSON Files
Keyword and Semantic Search Techniques
Using S-BERT for Embedding
Feedback on Dataset Utilization
- NER Annotated CVs Dataset shared: A user shared a dataset consisting of 5029 annotated CVs with IT skills marked using Named Entity Recognition (NER), available on Kaggle. The dataset offers manually annotated skills from text extracted from PDFs and is formatted in JSON for compatibility with NLP tools like Spacy.
- Task of Identifying Relevant JSON Files: Another user described a task involving over 20,000 JSON files to identify the most relevant 5 file IDs that can answer a generated question from 3 random JSON files in the dataset. They utilized keyword search and semantic search techniques with Elasticsearch and the S-BERT embedding model.
- Seeking Optimal Methods for Answers: The user who is working with the JSON files asked for advice on the best method for obtaining the most relevant answers. They specifically mentioned their use of keyword and semantic search techniques.
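A minimal sketch of the S-BERT side of this pipeline, assuming a common sentence-transformers checkpoint and placeholder documents; the Elasticsearch keyword stage is omitted.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder corpus standing in for text extracted from the JSON files.
docs = ["contents of file 1 ...", "contents of file 2 ...", "contents of file 3 ..."]
query = "generated question to answer"

model = SentenceTransformer("all-MiniLM-L6-v2")  # a widely used S-BERT model
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Return the top-k most similar documents by cosine similarity.
hits = util.semantic_search(query_emb, doc_emb, top_k=2)[0]
for hit in hits:
    print(docs[hit["corpus_id"]], hit["score"])
```

In practice the keyword and embedding scores can be combined (for example, rank fusion) to pick the final five file IDs.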
Link mentioned: NER Annotated CVs: This dataset includes 5029 annotated curriculum vitae (CV), marked with IT skills
LM Studio ▷ #general (157 messages🔥🔥):
AnythingLLM Setup
Model Performance and Comparison
TTS and STT Integrations
Updates in LM Studio and Popular Models
Phi-3 Model Support Issues
-
AnythingLLM Setup Issues: A user successfully set up AnythingLLM after initially experiencing file access problems by loading a custom Gemma v2 model.
- It was revealed that performance issues could arise from hardware limitations, particularly on larger models.
- Flux vs. SDXL Model Performance: Discussion highlighted that Flux, a 12b model, greatly outperforms the smaller 2.6b SDXL model, with many users expressing interest in testing Flux.
- Participants noted that the initial team behind Stability AI transitioned to Black Forest Labs, contributing to Flux's advancements.
- Integration of TTS and STT in LM Studio: Users discussed the feasibility of using TTS and STT integration within LM Studio, emphasizing that it requires navigating various tutorials and potential cloud privacy concerns.
- Some members shared that combining LM Studio with APIs can facilitate local speech-to-text functionalities.
- Latest Updates and Model Support: Participants speculated on the reasons behind the Phi-3 models no longer being supported in llama.cpp and noted that Oobabooga webui also failed to load them post-update.
- Questions were raised about how such changes could impact ongoing AI projects and the availability of models.
- Community Collaboration and Project Help: A user offered help to expedite project developments in the community while expressing a desire for collaborators in AGI projects.
- Another member acknowledged the difficulty in finding collaborators, highlighting a broader concern about community engagement in ongoing AI projects.
Links mentioned:
- Flash Attention: no description found
- UGI Leaderboard - a Hugging Face Space by DontPlanToEnd: no description found
- GGUF: no description found
- Reddit - Dive into anything: no description found
- Shut Up! GIF - Money Dollars Cash - Discover & Share GIFs: Click to view the GIF
- legraphista/internlm2_5-20b-chat-IMat-GGUF · Hugging Face: no description found
- Reddit - Dive into anything: no description found
- ggml : add Flash Attention by ggerganov · Pull Request #5021 · ggerganov/llama.cpp: ref #3365 Setting up what's needed for Flash Attention support in ggml and llama.cpp The proposed operator performs: // new res = ggml_flash_attn(ctx, q, k, v, kq_mask, kq_scale); // fused sc...
- Open WebUI: no description found
LM Studio ▷ #hardware-discussion (59 messages🔥🔥):
Performance of 8700G/780m IGP
Upcoming hardware rumors
Comparative GPU performance
P40 pricing trends
VRAM requirements for larger models
-
8700G/780m IGP shows decent results: Testing on the 8700G/780m IGP with ROCm and Vulkan revealed around 25% CPU acceleration with Ollama and 15% with LM Studio.
- However, LM Studio limits GPU RAM usage to 20GB, causing larger model failures during loading.
- Excitement builds for upcoming hardware battles: Anticipation grows for the Studio M4 Ultra vs 5090 competition, with discussions on pricing and performance expectations.
- Participants speculate that the 5090 could launch around $2800-$3000, raising concerns over price gouging and availability.
- P40 pricing continues to rise: Recent discussions showed that P40 cards have nearly doubled in price on the second-hand market, ranging from AUD $300 to $600.
- Participants reflected on investing in P40s instead of newer cards, noting they can run competitive LLMs at lower costs.
- VRAM considerations when upgrading: When contemplating upgrades to larger models, the discussions highlighted the importance of VRAM for processing capabilities.
- Recommendations included upgrading to at least a 3060 12GB or a 3090, with consideration for power supply requirements.
- Mixed feelings about the 4090's performance: Some users expressed mixed feelings about the 4090, finding it not significantly faster than their previous 3080.
- After one day of use, users reported only marginal improvements in model loading and performance, with thoughts of possibly needing a second 4090.
CUDA MODE ▷ #general (5 messages):
PufferLib Gameboy Emulator
Reinforcement Learning Stream
GPUDrive Multi-Agent Simulator
Mojo Talk Proposal
-
Setting Up PufferLib Gameboy Emulator: An example of setting up environments for a Gameboy emulator in PufferLib was shared.
- This aims to simplify reinforcement learning for complex game environments, illustrated by a GitHub link.
- Creator's Live Stream Available: The creator of PufferLib streams live development sessions, as mentioned in a YouTube video.
- This session allows viewers to interact and ask questions directly during the stream.
- Introduction to GPUDrive Simulator: A research paper discussed GPUDrive, a GPU-accelerated multi-agent simulator generating over a million experience steps per second for training RL agents.
- It enables effective multi-agent planning and training using high-performance CUDA, significantly speeding up the process.
- Proposing a Talk on Mojo: A member honored Chris for joining and inquired about a potential talk from his team regarding Mojo.
- The discussion suggested it could range from an introductory overview to cover the project's current state and future vision.
Links mentioned:
- Paper page - GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS: no description found
- Reinforcement learning live dev: Follow jsuarez5341 on X · Star https://github.com/pufferai/pufferlib · MIT PhD and full-time OSS RL exorcist
- PufferLib/pufferlib/environments/pokemon_red/environment.py at 729003f9cb89845cc1a69a65e5a2431b2d0542bd · PufferAI/PufferLib: Simplifying reinforcement learning for complex game environments - PufferAI/PufferLib
CUDA MODE ▷ #torch (17 messages🔥):
PyTorch 2.4 with CUDA 12.4 issues
Windows compatibility for torch cublas hgemm
FP16 accumulate performance
Performance benchmarking results
-
PyTorch 2.4 struggles with CUDA 12.4: One user reported that using PyTorch 2.4 with CUDA 12.4 leads to poor results, while it works fine with CUDA 12.1.
- They mentioned running CUDA 12.6 on their base system and using Conda for installation.
- Windows Compatibility Gains for Cublas Library: A member has successfully made their torch cublas hgemm library compatible with Windows, previously available only on Linux.
- This update enhanced their throughput to ~105 TFLOPS for fp16 with fp16 accumulation, significantly improving inference speed.
- FP16 Accumulate Offers Speed Advantages: Discussion revealed that FP16 accumulate is not supported natively in PyTorch, but it provides faster performance on consumer GPUs due to smaller L1 caches.
- However, concerns about accuracy and the potential for inf/nan errors were acknowledged, with a balance of speed versus stability being noted.
- Speed/Accuracy Trade-offs Benchmarking: Benchmark results shared indicate that the CublasLinear implementation achieved 313.22 TFLOPS in 438.80 us, compared to 166.47 TFLOPS in 825.59 us for nn.Linear.
- Despite minor differences in output, it was noted that these variations do not significantly impact model performance in scenarios like diffusion models or large language models.
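For context, here is a rough way to reproduce this style of TFLOPS measurement with plain PyTorch (not the cublas library itself, whose API is not shown here); the GEMM shape and iteration counts are illustrative.

```python
import torch

m = n = k = 4096  # illustrative GEMM shape
a = torch.randn(m, k, device="cuda", dtype=torch.float16)
b = torch.randn(k, n, device="cuda", dtype=torch.float16)

for _ in range(10):  # warmup
    a @ b
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    a @ b
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end) / iters
tflops = 2 * m * n * k / (ms * 1e-3) / 1e12  # 2*M*N*K flops per GEMM
print(f"{ms:.2f} ms/iter, {tflops:.1f} TFLOPS")
```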
Link mentioned: GitHub - aredden/torch-cublas-hgemm: PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu: PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu - aredden/torch-cublas-hgemm
CUDA MODE ▷ #algorithms (3 messages):
Model tuning
Quantization bits
-
Experimenting with Model Tuning: A member reported experimenting with their model but noted they needed to do some tuning to improve accuracy, which currently gets stuck at 70% for CIFAR-10.
- They expressed that the model shows promise, but further adjustments are necessary to achieve better results.
- Optimizing Quantization Bits: Another member highlighted that viewing the quantization bits as an optimizable parameter is a crucial contribution to model performance.
- This perspective could lead to significant improvements in overall tuning and accuracy.
CUDA MODE ▷ #jobs (7 messages):
Hudson River Trading internships
GPU job roles
Application process for internships
-
Hudson River Trading offers internships: Internships, mainly in the summer, are available at Hudson River Trading, with interns working on research related to GPUs.
- While the summer application process isn't open yet, they're expected soon, and you can keep an eye out for future announcements.
- Excitement about GPU research roles: A user expressed interest in the GPU job descriptions provided, noting their alignment with their current work.
- The conversation revealed a mutual excitement about opportunities in high-performance compute workloads.
- Issues with Direct Messaging: There was a concern raised about Direct Messages being turned off for one member, prompting an attempt to send a friend request instead.
- The member acknowledged their lack of Discord expertise, suggesting that the settings should be configured correctly.
Links mentioned:
- Senior Software Engineer - Performance Optimization (C++/GPU): New York, NY, United States
- Hudson River Trading Software Engineer Salary | $406K-$485K+ | Levels.fyi: Software Engineer compensation in United States at Hudson River Trading ranges from $406K per year for L1 to $485K per year for L3. The median compensation in United States package totals $410K. View ...
CUDA MODE ▷ #torchao (34 messages🔥):
INT8 symmetric quantization
Quantized training
Installation errors with torchao
Hardware compatibility issues
GPTQ refactor progress
-
Discussion on INT8 Symmetric Quantization: A member questioned the reasoning behind using 127.5 for scale in INT8 symmetric quantization in PyTorch, suggesting it's a matter of 'full range quantization'.
- They noted that using 127.5 caused model divergence during fine-tuning, proposing to use 127 for positive and 128 for negative values as a potential solution (a small sketch of both scale choices follows this list).
- Quantized Training Insights: A user expressed that INT8 quantized training could have advantages over pre-trained INT8 post-training quantization (PTQ) and proposed investigating INT4 quantized training as an alternative.
- It was mentioned that stochastic rounding could impact training and that a comparison should be made with the current INT4 QAT recipe.
- Installation Errors with Torchao: A user encountered multiple installation errors with torchao, resolving them by using `USE_CPP=0 pip install .`, hinting at potential issues with older CUDA versions on their machine.
- They were informed that this workaround would skip certain tests like quant_llm_linear, which requires a custom CPP extension.
- Hardware Compatibility Issues: Concerns were raised regarding T4 GPU compatibility, noting that errors stemmed from unsupported BF16 operations in some files, suggesting a switch to L4 GPU for similar pricing and better compute capabilities.
- A proposed solution involved implementing compile guards to prevent unsupported hardware files from being compiled, while also considering runtime checks for clarity on function failures.
- Progress on GPTQ Refactor: A user updated on their progress on the GPTQ refactor, indicating they were about 45% done with a basic runner utilizing MultiTensor.
- They aim to fulfill the respective GitHub issue over the next few days, indicating collaboration and issue resolution within the community.
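A small sketch of the two scale choices debated above; this is an illustration of symmetric INT8 quantization, not torchao's actual implementation.

```python
import torch

def int8_symmetric_quantize(x: torch.Tensor, full_range: bool = False):
    """Symmetric INT8 quantization with either a 127 or 127.5 divisor."""
    divisor = 127.5 if full_range else 127.0
    scale = x.abs().max() / divisor
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

x = torch.randn(16)
for full_range in (False, True):
    q, scale = int8_symmetric_quantize(x, full_range)
    err = (q.float() * scale - x).abs().max()
    print(f"full_range={full_range}: max abs error {err:.4f}")
```

The 127.5 divisor uses the full [-128, 127] range symmetrically but means no code maps exactly to zero-scale extremes, which is the asymmetry the discussion pins the divergence on.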
Links mentioned:
- PyTorch: no description found
- ao/torchao/quantization/quant_primitives.py at de4a1fb3b1f71e2f61b84dfdc96e7d704ff72208 · pytorch/ao: The missing pytorch dtype and layout library for training and inference - pytorch/ao
- Quantization - Neural Network Distiller: no description found
- pytorch/aten/src/ATen/native/cuda/int4mm.cu at e98eac76b358fb4639b9e9ce6894014354d7b073 · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
CUDA MODE ▷ #off-topic (7 messages):
Llama 3 Dataset Analysis
Prefix Chunk LLM Paper
SARATHI Inference Techniques
Spector CTF Challenge
-
Llama 3 Paper's Dataset Section Stands Out: A member noted that the Llama 3 paper reads quickly, with the most interesting part being the dataset section; other sections were better explained in other papers.
- Ignoring the rest, they emphasized the importance of the dataset in understanding the model.
- Sarathi LLM Prefix Chunk Paper Suggested: Another member recommended reading the Sarathi LLM paper, stating it is more fun to explore.
- A link to the paper was shared, indicating its relevance to the ongoing discussions about LLMs.
- Insights on ChunkAttention in Recent Papers: A member discussed the ChunkAttention module introduced in the prefix chunk LLM paper, designed to enhance memory utilization of KV cache during LLM requests.
- They shared a link to the paper, explaining its significance in reducing inference latency for long sequences.
- Exploring SARATHI for Efficient LLM Inference: The SARATHI paper was highlighted as addressing inefficiencies in the inference phases of Large Language Models by implementing chunked-prefills and decode-maximal batching.
- This approach promises to improve GPU utilization significantly during model inference.
- CTF Challenge Featuring Kernel Exploitation: A challenge titled 'Spector' was introduced, focusing on kernel internals and micro-architectural attacks within a CTF format linked to the corCTF 2024 theme.
- Participants were provided with a new syscall for Linux 6.9.0, featuring example code demonstrating the challenge mechanics.
Links mentioned:
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition: Self-attention is an essential component of large language models (LLM) but a significant source of inference latency for long sequences. In multi-tenant LLM serving scenarios, the compute and memory ...
- SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills: Large Language Model (LLM) inference consists of two distinct phases - prefill phase which processes the input prompt and decode phase which generates output tokens autoregressively. While the prefill...
- Will's Root: corCTF 2024: Its Just a Dos Bug Bro - Leaking Flags from Filesystem with Spectre v1: no description found
CUDA MODE ▷ #llmdotc (99 messages🔥🔥):
Ragged Attention Challenges
Tokenization Issues in Llama Models
Training Stability with Batch Size
Instruct Model Newline Formatting
Benchmarking PyTorch 2.4 with Nvidia
-
Ragged Attention Challenges in Training: Some members expressed concerns about the need for ragged attention masks to prevent out-of-distribution sampling during training, suggesting that current implementations may overlook this requirement.
- There was consensus that handling different masks correctly is crucial for effective training on complex sequences, highlighting the role of the attention mask shape (a block-diagonal mask sketch follows this list).
- Tokenization Issues in Llama Models: Discussion highlighted a potential bug in the Llama models where the `stop_tokens` may not include `<|end_of_text|>`, leading to unexpected continuous sampling.
- Concerns were raised about inconsistencies in the model's behavior and documentation regarding special tokens, prompting further investigation into their training process.
- Training Stability with Batch Size Adjustments: Members noted that using a lower batch size early in training could improve stability and generalization accuracy, despite being counterintuitive for efficiency.
- The importance of batch size schedulers for handling gradient variance during training was emphasized, with references to papers discussing training stability.
- Instruct Model Newline Formatting: The conversation addressed the need for proper newline formatting in instruct models, with speculations about its role in training and user message interactions.
- Some members questioned why newlines were mentioned in the context of base models, suggesting it could confuse expectations for how models handle input.
- Benchmarking PyTorch 2.4 with Nvidia: A user reported that running `train_gpt2.py` with PyTorch 2.4 on Nvidia showed improved performance, being faster than previous implementations around llm.c.
- Although not a perfectly fair comparison, they noted that enabling flash attention provided a slight edge in speed, highlighting ongoing improvements in model training efficiency.
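A sketch of the block-diagonal ("ragged") attention mask idea raised above, for multiple documents packed into one batch row; the lengths are illustrative.

```python
import torch

seq_lens = [3, 2, 4]  # illustrative packed-document lengths
total = sum(seq_lens)

# Block-diagonal mask: tokens may only attend within their own document.
mask = torch.zeros(total, total, dtype=torch.bool)
offset = 0
for length in seq_lens:
    mask[offset:offset + length, offset:offset + length] = True
    offset += length

# Combine with a causal mask so attention also stays left-to-right.
causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
attn_mask = mask & causal
print(attn_mask.int())
```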
Links mentioned:
- The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models: Recent works have demonstrated great success in pre-training large-scale autoregressive language models on massive GPUs. To reduce the wall-clock training time, a common practice is to increase the ba...
- Llama 3 | Model Cards and Prompt formats: Special Tokens used with Llama 3. A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followed by ...
- Spike No More: Stabilizing the Pre-training of Large Language Models: Loss spikes often occur during pre-training of large language models. The spikes degrade the performance of large language models and sometimes ruin the pre-training. Since the pre-training needs a va...
- Templates for Chat Models: no description found
- 🤗 Transformers: no description found
- Issues · Dao-AILab/flash-attention: Fast and memory-efficient exact attention. Contribute to Dao-AILab/flash-attention development by creating an account on GitHub.
- Issues · pytorch/torchchat: Run PyTorch LLMs locally on servers, desktop and mobile - Issues · pytorch/torchchat
CUDA MODE ▷ #rocm (9 messages🔥):
ZLUDA 3 Removal
AMD's Permission Dispute
Employment Contract Clauses
-
ZLUDA 3 gets taken down: The author of ZLUDA has withdrawn version 3 after AMD claimed the permission given to him for its release was invalid, leading to heated discussions on GitHub.
- A community member humorously remarked, 'email not legally binding' in response to this controversy.
- Permission dispute over AMD's stance: The discussions revealed that AMD may believe the employment contract under which the author released ZLUDA is not legally binding, raising concerns over its implications.
- One participant noted that if AMD finds ZLUDA 'fit for further development', the author may not have the authority to release it freely.
- Clause in the employment contract: The author mentioned that a clause in his contract allowed for release if AMD deemed ZLUDA unfit for further development, sparking interest in the specifics of contract terms.
- Members expressed surprise and frustration, with one stating, 'thanks AMD' sarcastically.
Links mentioned:
- GitHub - vosen/ZLUDA: CUDA on ??? GPUs: CUDA on ??? GPUs. Contribute to vosen/ZLUDA development by creating an account on GitHub.
- GitHub - vosen/ZLUDA at v3: CUDA on ??? GPUs. Contribute to vosen/ZLUDA development by creating an account on GitHub.
CUDA MODE ▷ #cudamode-irl (2 messages):
Project Timelines
Google Form Proposals
-
Expect to Know Project Updates by Month's End: A member expressed that they believe updates on the project will be available by the end of the month at the latest.
- This indicates an urgency to finalize any outstanding matters before the deadline.
- Details Needed for Google Form Proposals: Another member suggested that to clarify specifics, it's best to add more details about work plans in the Google form or share a gist proposal linked here.
- This approach aims to streamline communication and ensure everyone is aligned on the tasks ahead.
Nous Research AI ▷ #datasets (1 messages):
UltraSteer-V0 Dataset
Nvidia Llama2-13B-SteerLM-RM
Fine-Grained Dialogue Labels
De-duplication Process
Initial Version Release
-
Introducing UltraSteer-V0 Dataset: UltraSteer-V0 is a massive collection of 2.3M conversations with 2.8M turns and 9 fine-grained signals produced by Nvidia's Llama2-13B-SteerLM-RM reward model.
- This initial version, despite being a 'version zero', has undergone significant processing over 22 days.
- Fine-Grained Labels Explained: The UltraSteer dataset labels each assistant turn with attributes such as Quality, Toxicity, Humor, and Creativity on a scale from 0 to 4.
- These detailed ratings aim to capture the essence of the assistant's responses in multi-turn dialogues.
- De-duplication Enhancements: The dataset has implemented a de-duplication process to primarily ensure unique assistant messages across dialogues with identical initial turns.
- However, it is acknowledged that further de-duplication and improvements to the dataset card are still needed (a simplified de-duplication sketch follows this list).
- Dataset Accessibility: The UltraSteer dataset is available for access on Hugging Face.
- This resource offers researchers and developers a valuable tool for dialogue systems and AI training.
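As an illustration of the de-duplication idea (not the dataset's actual pipeline), here is a simplified pass that keeps one dialogue per unique sequence of assistant turns.

```python
import hashlib

def dedup_dialogues(dialogues):
    """Keep the first dialogue for each unique sequence of assistant turns."""
    seen, kept = set(), []
    for turns in dialogues:
        key = hashlib.sha256(
            "\n".join(t["content"] for t in turns if t["role"] == "assistant").encode()
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(turns)
    return kept

dialogues = [
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    [{"role": "user", "content": "hey"}, {"role": "assistant", "content": "hello"}],
]
print(len(dedup_dialogues(dialogues)))  # 1: identical assistant turns collapse
```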
Link mentioned: Avelina/UltraSteer-v0 · Datasets at Hugging Face: no description found
Nous Research AI ▷ #off-topic (1 messages):
vikings7699: Has anyone here ever worked on fine tuning a model specifically for insurance sector?
Nous Research AI ▷ #general (129 messages🔥🔥):
Model Training Issues
Open Medical Reasoning Tasks
New Model Releases
Flux AI Capabilities
Hugging Face Developments
-
Model Training Confusion: Members discussed challenges with training models, highlighting issues like catastrophic forgetting and overfitting when using different datasets and learning rates.
- One member noted that using a very small learning rate across separate datasets can sometimes lead to disaster, leaving them frustrated with the performance.
- Introduction to Open Medical Reasoning Tasks: A new initiative titled the Open Medical Reasoning Tasks project was announced, aiming to gather medical reasoning tasks for LLMs and seeking contributions from medical professionals.
- The project emphasizes collaboration and is hosted on GitHub, reflecting a push towards improving AI applications in healthcare.
- MiniCPM and New Model Releases: Discussion included updates on new models, such as MiniCPM-Llama3-V-2.5, touted as a GPT-4V level multimodal LLM available on Hugging Face.
- Members expressed interest in model capabilities, particularly in handling multiple images and tool support for RAG implementations.
- Capabilities of Flux AI: Flux AI received praise for its new skills in text comprehension and image generation, heightening interest among users looking to leverage its features.
- A presentation slide summarized its proficiencies in engaging with textual inputs and visual creativity, sparking excitement among the community.
- Hugging Face Development Talks: Multiple users discussed the capabilities of different models available on Hugging Face, including intricate features like multi-image handling.
- Members shared insights on ongoing developments, with emphasis on improvements seen in newer releases of models and their potential applications.
Links mentioned:
- Tweet from Aaditya Ura ( looking for PhD ) (@aadityaura): Exciting news! 🎉 Introducing the Open Medical Reasoning Tasks project! Inspired by @NousResearch and @Teknium1, @OpenLifeSciAI ( Open Life-Science AI ) is launching an open, collaborative initiative...
- Tweet from fofr (@fofrAI): 🤯 > powerpoint presentation, the slide title says “Flux AI has new skills”, three bullet points, “good at text”, “prompt comprehension”, “amazing images”
- HuggingFaceM4/Idefics3-8B-Llama3 · Hugging Face: no description found
- openbmb/MiniCPM-Llama3-V-2_5 · Hugging Face: no description found
- openbmb/MiniCPM-V-2_6 · Hugging Face: no description found
- Reddit - Dive into anything: no description found
- MiniCPM-V Finetuning for multi-image input during a multi-turn conversation 💡 [REQUEST] · Issue #233 · OpenBMB/MiniCPM-V: Start Date: no response · Implementation PR: no response · Reference Issues: multi-image input during a multi-turn conversation · Summary: for multi-image input during a mul...
- Tweet from Maxime Labonne (@maximelabonne): 🦙✨ BigLlama-3.1-1T-Instruct So I've heard that 405B parameters weren't enough... It's my pleasure to present an upscaled Llama 3.1 with 1,000,000,000 parameters. Now available on @hugg...
- Issues · black-forest-labs/flux: Official inference repo for FLUX.1 models. Contribute to black-forest-labs/flux development by creating an account on GitHub.
- Generated with Flux.1 Pro and Schnell : Posted in r/StableDiffusion by u/Sea_Law_7725 • 367 points and 77 comments
Nous Research AI ▷ #ask-about-llms (19 messages🔥):
Library Usage for Fine-tuning
Inference Stack Resources
Insurance Sector Fine-tuning
Pay-as-you-go Llama 405B Hosting
Memory Bottlenecks in Inference
-
Library Usage for Fine-tuning Practices: A member inquired whether most people are utilizing libraries for fine-tuning and training or if they are writing unique training scripts.
- Another member mentioned Axolotl as a potential library for this purpose.
- Getting Started with Inference Stack: A member sought recommendations for resources or codebases for the vLLM inference stack, acknowledging the existence of the vLLM project.
- This inquiry opens the door for community suggestions on useful starting points.
- Fine-tuning Models for Insurance Sector: One member queried if anyone had experience fine-tuning a model specifically for the insurance sector.
- This highlights an interest in niche applications of model fine-tuning.
- Pay-as-you-go Access for Llama 405B Hosting: A member asked about companies hosting Llama 405B that offer pay-as-you-go access, noting Groq's requirement for an enterprise account.
- Another member recommended Openrouter as a possible option, while discussing the presence of multiple providers.
- Memory Bottlenecks and Compute Bound Issues: In a discussion about inference and training, a member raised a question about whether memory is the main bottleneck or if there are other factors.
- Another member clarified that while memory is critical for batch size 1, larger batch sizes become increasingly compute-bound, referring specifically to GPU utilization.
Nous Research AI ▷ #reasoning-tasks-master-list (7 messages):
Open Medical Reasoning Tasks
Synthetic Task Generation
LLMs potential limitations
-
Open Medical Reasoning Tasks Launch: Inspired by previous projects, the Open Medical Reasoning Tasks initiative aims to create a comprehensive list of medical reasoning tasks for LLMs. Contributions from physicians, researchers, and data scientists are encouraged via GitHub.
- This is AMAZING! A member expressed enthusiasm for the collaborative nature of the project, emphasizing the benefits of open work.
- Discussion on Improving Synthetic Tasks: There was a contemplation on how to enhance synthetic task generation beyond the current capabilities of LLMs. One member noted uncertainty on how to progress with these improvements.
- The focus is on pushing the boundaries of what LLMs can achieve in task generation for various applications.
Links mentioned:
- Tweet from Aaditya Ura ( looking for PhD ) (@aadityaura): Exciting news! 🎉 Introducing the Open Medical Reasoning Tasks project! Inspired by @NousResearch and @Teknium1, @OpenLifeSciAI ( Open Life-Science AI ) is launching an open, collaborative initiative...
- GitHub - open-thought/system-2-research: System 2 Reasoning Link Collection: System 2 Reasoning Link Collection. Contribute to open-thought/system-2-research development by creating an account on GitHub.
Latent Space ▷ #ai-general-chat (128 messages🔥🔥):
Web Developer to AI Engineer Pipeline
OpenAI Departures
Generative AI in Retail
Structured Outputs in GPT-4o
Energy-Based Language Modeling
-
Web Developer to AI Engineer Pipeline Discussion: Members discussed the evolving transition for web developers into AI engineering roles, highlighting the insane demand for AI expertise and the lack of ML engineers.
- A member emphasized that web developers possess valuable skills in API integrations, positioning them well for AI engineering opportunities.
- Key Figures Departing OpenAI: Concerns emerged regarding several key departures from OpenAI, prompting speculation about the company's future direction and morale.
- Community sentiment shifted toward skepticism regarding OpenAI's stability, with playful remarks about leadership dynamics intertwining with personal insights.
- Generative AI Applications in Retail: A member shared insights on how companies like L'Oreal leverage generative AI for various applications including product descriptions and targeted marketing strategies.
- The effectiveness of these approaches raises questions about how to measure the success of AI-generated content in retail contexts.
- Introduction of Structured Outputs in GPT-4o: OpenAI announced a new feature enabling structured outputs in their GPT-4o API, allowing more reliable adherence to developer-supplied JSON schemas.
- This update aims to enhance schema reliability from 86% to 100%, showcasing improvements in handling complex structured data (a minimal request sketch follows this list).
- Skepticism Surrounding Energy-Based Language Modeling: A member recounted a humorous encounter with an Extropic AI engineer, who seemed unfamiliar with prominent research in energy-based language modeling.
- This anecdote leads to broader discussions about the legitimacy and understanding of various AI concepts within certain organizations.
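A minimal request sketch for the feature, assuming the openai Python client and a made-up schema; see OpenAI's docs for the authoritative shape.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical schema for illustration only.
schema = {
    "name": "city_fact",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "fact": {"type": "string"},
        },
        "required": ["city", "fact"],
        "additionalProperties": False,
    },
}

resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Tell me one fact about Tokyo."}],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(resp.choices[0].message.content)  # JSON matching the schema
```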
Links mentioned:
- Tweet from Michelle Pokrass (@michpokrass): excited to announce Structured Outputs -- our newest feature in the api. model outputs will now reliably follow your exact json schemas, matching the parameters and types accurately. schema reliabil...
- no title found: no description found
- Tweet from Philipp Schmid (@_philschmid): "Deep Reinforcement Learning from Human Preferences" and "Proximal Policy Optimization Algorithms" are part of the foundation of modern RLHF in LLMs.
- no title found: no description found
- Tweet from anton (@abacaj): interesting... new model also includes a pretty big price drop Quoting OpenAI Developers (@OpenAIDevs) Introducing Structured Outputs in the API—model outputs now adhere to developer-supplied JSON ...
- Tweet from jack morris (@jxmnop): funny little story about Extropic AI >been curious about them for a while >have twitter mutual who is an engineer/researcher for this company >often tweets energy-based modeling and LM-quant...
- Tweet from Aizk ✡️ (@Aizkmusic): @BigTechAlert @ChatGPTapp @TarunGogineni His LinkedIn bio is great
- Efficient Guided Generation for Large Language Models: In this article we show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine. This framework leads to an ef...
- Tweet from Two Weeks LOL (@TwoWeeksLOL): @MKBHD Uh oh...
- Tweet from OpenAI Developers (@OpenAIDevs): We’re taking OpenAI DevDay on the road! Join us this fall in San Francisco, London, or Singapore for hands-on sessions, demos, and best practices. Meet our engineers and see how developers around the ...
- Tweet from John Schulman (@johnschulman2): I shared the following note with my OpenAI colleagues today: I've made the difficult decision to leave OpenAI. This choice stems from my desire to deepen my focus on AI alignment, and to start a ...
- Tweet from Jason Koebler (@jason_koebler): SCOOP from @samleecole: Leaked Slacks and documents show the incredible scale of NVidia's AI scraping: 80 years — "a human lifetime" of videos every day. Had approval from highest levels o...
- Tweet from Mira (@Mira___Mira): no description found
- Tweet from roon (@tszzl): all the people that can make eye contact at openai joined in the last 6 months and they’re making me uncomfortable with their eye contact
- Tweet from Nick Dobos (@NickADobos): Great post on writing code with ai Love this chart Quoting Erik Schluntz (@ErikSchluntz) Replacing my right hand with AI (How I wrote thousands of lines of code for work each week while in a cast)...
- GitHub - simonw/datasette: An open source multi-tool for exploring and publishing data: An open source multi-tool for exploring and publishing data - simonw/datasette
- eCommerce & Retail: Discover how innovative eCommerce and retail companies use Writer to create on-brand content that works, from first touch to sale.
OpenAI ▷ #annnouncements (1 messages):
OpenAI DevDay
Hands-on sessions
Developer meetups
-
OpenAI DevDay taking a global trip: OpenAI is taking DevDay on the road this fall, with events scheduled in San Francisco, London, and Singapore.
- Participants will engage in hands-on sessions, demos, and learn best practices from engineers, showcasing how developers worldwide are leveraging OpenAI's technology.
- Join hands-on sessions at DevDay: The DevDay events will feature immersive hands-on sessions where developers can learn directly from OpenAI engineers.
- This initiative aims to foster community engagement by highlighting how developers are building with OpenAI tools.
OpenAI ▷ #ai-discussions (86 messages🔥🔥):
ChatGPT App Release
DALL-E 3 Usage
Llama Model API
OpenAI Structured Outputs
Video Analysis Capabilities
-
Anticipation for ChatGPT Desktop App: Members expressed interest in the release date of the desktop ChatGPT app for Windows and the public release of the search GPT.
- Conversations hinted at lingering uncertainty about the remaining founders since many have left the company.
- Exploration of DALL-E 3 Performance: Users discussed the DALL-E 3 model used by Bing AI Image Creator, noting differences in generated results.
- A comparison was made regarding the efficacy of DALL-E 3 versus other models like Llama.
- Llama Model and API Questions: Questions arose about the Llama model's performance and whether there is a free API available, with many contributors expressing interest in running models locally.
- Members confirmed that while Llama is open-source, there is no official free unlimited API, indicating some limitations in access.
- Improvements with OpenAI's Structured Outputs: Discussion revealed excitement around OpenAI's new Structured Outputs feature, which ensures adherence to JSON schemas.
- Members noted improvements in response formatting and pricing changes with enhanced model capabilities.
- Video Analysis Functionality Concerns: Users inquired about the ability of ChatGPT to analyze videos, revealing technical issues with certain formats.
- Suggestions for alternative usage approaches were shared, including recommendations to use mobile browsers instead of the app.
Link mentioned: Assistant GPT - Can I perform knowledge retrieval from a cloud storage?: I have some files that are on my cloud storage (onedrive) and would like to perform knowledge retrieval on them. Is it possible to integrate an assistant to perform knowledge retrieval directly fro...
OpenAI ▷ #gpt-4-discussions (16 messages🔥):
Search GPT availability
Upload limits for members
Generative AI in gaming
GPT-4o updates
Model response changes
- Search GPT is available now: Members confirmed that Search GPT is currently available for use.
-
Upload limits affect all users: Despite being a member, one user found they could not upload photos due to limits, which also affect paid users.
- “It says my limit resets at 1:35.”
- Generative AI could revolutionize gaming: A member expressed excitement about the potential for generative AI to enhance player experiences in games like BG3 or Pathfinder.
- They envision a game allowing unique character designs and immersive interactions with NPCs.
- GPT-4o receives a significant update: A user noted updates to GPT-4o, specifically mentioning a model change to gpt-4o-2024-08-06 with potential cost reductions.
- Another member confirmed that ChatGPT should be using this updated model as well.
- Changes in GPT-4o's responses: One user questioned if there had been a recent change in ChatGPT-4o as their responses felt notably different than the previous months.
- A member responded by sharing a link to structured outputs introduced in the new model.
OpenAI ▷ #prompt-engineering (1 messages):
darthgustav.: Use the python tool and import data from uploads.
OpenAI ▷ #api-discussions (1 messages):
darthgustav.: Use the python tool and import data from uploads.
Perplexity AI ▷ #general (82 messages🔥🔥):
Perplexity AI updates
Feedback on Model Performance
Technical Issues with Uploading
Content Sorting and Recommendation Systems
User Experience with Pro Features
-
Perplexity AI Model Comparisons: Users discussed their experiences with GPT-4o and Turbo, noting that Turbo consistently performed better in interactions, especially in follow-up questions.
- Some users expressed frustration with GPT-4o's lack of acknowledgment of new instructions, leading them to switch back to Sonnet, which provided a more responsive experience.
- Uploading and Token Limit Issues: A user reported encountering a 'Failed to count tokens for this attachment' error when uploading larger PDFs, with discussions about the token limits of various models.
- Suggestions to convert PDFs to TXT format offered potential workarounds for these upload challenges (a conversion sketch follows this list).
- Inquiry about Content Sorting Tools: A user working on a content sorting and recommendation engine sought examples of similar tools to help with their university project.
- Another user advised looking into RAG (Retrieval-Augmented Generation) for insights on existing platforms and functionality.
- Perplexity Pro Features and Functionality: Concerns were raised about recent limitations in switching LLMs and accessing collections in the Perplexity Pro app, which seemed to resolve itself shortly after.
- Users questioned the consistency of the Pro features, particularly if the redeemable 1-month free Pro subscription altered capabilities.
- Sounding Off with Language Quirks: Humor emerged from a user sharing a reflective thought on the complexities of the English language, particularly phrases involving the bathroom.
- This prompted a discussion about how such nuances may confuse non-native speakers and even AI systems.
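A small sketch of the PDF-to-TXT workaround, using the pypdf package (one of several options; the file name is a placeholder):

```python
from pypdf import PdfReader

reader = PdfReader("large-report.pdf")  # placeholder file name
text = "\n".join(page.extract_text() or "" for page in reader.pages)

with open("large-report.txt", "w", encoding="utf-8") as f:
    f.write(text)
print(f"Extracted {len(text)} characters from {len(reader.pages)} pages")
```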
Links mentioned:
- no title found: no description found
- Releases · inulute/perplexity-ai-app: The Perplexity AI Desktop App, powered by Electron which brings the magic of AI language processing to your desktop. - inulute/perplexity-ai-app
- When Tom's funeral was held, his father didn't attend. Now that his father has passed away, Tom didn't show up at his father's funeral either. Is Tom going too far?: The situation you described involves a complex interplay of personal relationships and individual choices. Here are some points to consider: ### Context and Ba
Perplexity AI ▷ #sharing (7 messages):
NVIDIA Blackwell GPU Delays
Google Legal Setback
Market Jitters
Warhol's Digital Portrait
Llama 3 Performance
-
NVIDIA Blackwell GPUs face delays: NVIDIA's next-generation Blackwell GPUs have encountered delays due to design flaws discovered late in the production process, requiring a redesign of the processor die.
- Additionally, packaging issues with the sophisticated CoWoS-L technology from TSMC further complicated production timelines.
- Warhol's digital art sold for $26M: A digital portrait by Andy Warhol has been sold for a staggering $26 million, highlighting the growing market for digital art.
- The sale signals a significant moment in the convergence of art and technology as digital ownership gains traction.
- Discussion on Llama 3 Performance: Members expressed interest in the performance metrics of the upcoming Llama 3.1 405B model, with performance comparisons noted.
- Further discussion weighed how it stacks up against other leading models, sparking a detailed debate among AI enthusiasts.
Links mentioned:
- YouTube: no description found
- NVIDIA Blackwell's Delay Explained: NVIDIA's next-generation Blackwell GPUs have encountered delays primarily due to design and manufacturing issues. Here are the main reasons for the delay: The...
- Perplexity: Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
- What objects contain carbon (apa saja benda yang mengandung karbon): Objects containing carbon are highly diverse and can be found in various forms in everyday life. Here are some examples of objects that...
Perplexity AI ▷ #pplx-api (8 messages🔥):
Perplexity API Issues
Model Availability
API Errors
Testing on Labs
Status Updates
-
Perplexity API returning bizarre output: A user reported unusual, garbled output when using the Perplexity API with an article-writing prompt, detailing issues that occur after the initial lines.
- Another user also echoed similar concerns, suggesting potential issues with the API's response.
- Concerns about API model deprecation: A member inquired if all Perplexity API models would be discontinued on August 12, to which another user provided a link to a detailed guide about available models and their specifications.
- The guide confirmed that the models will indeed be deprecated on August 12, 2024.
- Encountering a 502 API error: A user raised concerns about receiving a 502 error while querying the Perplexity API.
- Another user pointed to the status page indicating no recent issues reported (a retry sketch follows this list).
- Suggestion to test on Perplexity Labs: A suggestion was made for a user experiencing API issues to test their queries on the Perplexity Labs Playground.
- This could provide a better understanding of how the API functions in a test environment.
- Rounding off with appreciation: In response to the status update shared, a user expressed gratitude by saying 'you're the best'.
- This reflected a positive engagement within the community.
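For transient 5xx errors like the 502 above, a simple retry-with-backoff sketch against the Perplexity chat completions endpoint (model name taken from the supported-models list below; the API key is a placeholder):

```python
import time
import requests

url = "https://api.perplexity.ai/chat/completions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder key
payload = {
    "model": "llama-3-sonar-small-32k-online",
    "messages": [{"role": "user", "content": "Hello"}],
}

response = None
for attempt in range(5):
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    if response.status_code < 500:  # only retry server-side errors like 502
        break
    time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...

print(response.status_code, response.text[:200])
```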
Links mentioned:
- Perplexity Labs: no description found
- Supported Models: Perplexity Models Model Parameter Count Context Length Model Type llama-3-sonar-small-32k-online 8B 28,000 Chat Completion llama-3-sonar-small-32k-chat 8B 32,768 Chat Completion llama-3-sonar-large-32...
- Perplexity - Status: Perplexity Status
Eleuther ▷ #announcements (1 messages):
Mechanistic Anomaly Detection Methods
Anomaly Detection Performance
Eliciting Latent Knowledge from Language Models
Detection of Adversarial Examples
-
Mechanistic Anomaly Detection methods underperform: The team explored mechanistic methods for detecting anomalous behavior in language models but found that these methods often do not outperform non-mechanistic baselines focused on activations.
- Despite this, they achieved better performance when evaluating entire batches of test data rather than individual test points.
- Strong Performance on Some Tasks: They observed strong anomaly detection results on many tasks, though not all tasks yielded equally good outcomes.
- This indicates variability in performance across different tests, highlighting the importance of context in evaluation.
- Eliciting Quirky Language Model Behavior: A recent blog post discusses the publication, Eliciting Latent Knowledge from Quirky Language Models, detailing their finetuning techniques on question and answer datasets for language models to behave in 'quirky' ways.
- This involved training models to respond reliably when prompted as 'Alice:' and inconsistently when prompted as 'Bob:', addressing the Mechanistic Anomaly Detection (MAD) problem introduced by Paul Christiano.
- Ease of Detecting Adversarial Examples: The team found that it is relatively easy to detect adversarial examples in image classifiers using off-the-shelf techniques.
- However, they did not test whether their own anomaly detectors are robust against adversarial attacks.
- Contributions Acknowledged: Thanks were given to key contributors for their efforts in the project.
- Interested individuals were encouraged to check out a specific channel for opportunities to assist with ongoing work.
Links mentioned:
- Mechanistic Anomaly Detection Research Update: Interim report on ongoing work on mechanistic anomaly detection
- GitHub - EleutherAI/cupbearer at attribution_detector: A library for mechanistic anomaly detection. Contribute to EleutherAI/cupbearer development by creating an account on GitHub.
Eleuther ▷ #general (36 messages🔥):
Open Letter Against SB1047
Disputes on AI Safety Act
Anthropic's Response to SB1047
Philosophical Divide on AI Regulation
Research Practices in AI Accountability
-
Open Letter Against SB1047 Gains Support: A coalition of academics has organized an open letter opposing California's SB1047, citing concerns it may restrict research on large ML models and AI safety.
- Signatures are encouraged from various stakeholders to demonstrate opposition from a non-industry perspective, aiming to protect open-source initiatives.
- Debate Erupts Over Interpretations of SB1047: Members are sharply divided, with one side arguing the bill fosters accountability for mass casualties and the other asserting it stifles academic freedom and innovation.
- Critics highlighted apparent contradictions between letters supporting and opposing the bill, with concerns over liability and restrictions on research freedom.
- Anthropic's Insightful Take on SB1047: A member expressed appreciation for Anthropic's response, which is seen as sensible regarding concerns about SB1047.
- This response is viewed as a constructive approach in the ongoing debate surrounding AI safety regulations.
- Philosophical Views Clash Over AI Legislation: The discussions reflect deeper philosophical disagreements about the government's role, what constitutes a good society, and the future direction of AI technology.
- Many participants note the difficulty in finding common ground when foundational beliefs about governance and technology diverge significantly.
- Challenges in Regulating AI Techniques: The conversation included concerns that legislative mandates like universal watermarking could lead to premature and ineffective solutions for complex technical issues.
- Stakeholders emphasized the necessity of focusing research efforts on practical solutions for accountability instead of relying on vague regulations.
Links mentioned:
- DocumentCloud: no description found
- DocumentCloud: no description found
- Letter to YC & a16z | SB 1047 - Safe & Secure AI Innovation: no description found
- Students, Faculty, and Scientists Against SB 1047 (AI Safety Act) Open Letter Signature Form: This is a form to provide your signature in support of our open letter from UC Faculty and students against California SB 1047, a catastrophically bad law attempting to regulate "AI safety" ...
Eleuther ▷ #research (40 messages🔥):
Distributed AI Training
Latent Space Search
In-Context Learning
Evaluation Function Challenges
Self-Taught Evaluation
-
Meta's Distributed AI Training Network: At ACM SIGCOMM 2024 in Sydney, Meta shared insights about the crucial role of AI networks in supporting distributed AI training workloads, particularly the infrastructure required for models like LLAMA 3.1 405B. Their paper, “RDMA over Ethernet for Distributed AI Training at Meta Scale” elaborates on the scale and design of their AI networks.
- The discussion highlights the increased demands AI places on data center networking, with generative AI models notably stressing existing infrastructures.
- Latent Space Search vs. Discrete Problem Solving: A debate arose about the efficiency of searching in latent space versus traditional discrete problem-solving methods, with one member suggesting that a VQ method could facilitate problem-solving by leveraging latent representations. They proposed that composable subsolutions could enhance the model’s learning of components.
- Conversely, others cautioned that sampling independent solutions could limit feedback incorporation, potentially undermining effectiveness in complex problem scenarios.
- In-Context Learning Discussion: An exploration of In-Context Learning (ICL) revealed concerns about using it in searches, emphasizing that the assessment of proposed solutions could create inefficiencies. One user highlighted that relying solely on discrete spaces might hinder optimal solution discovery compared to more integrated approaches.
- The conversation underscored the trade-offs between differentiability and practicality, with suggestions for simpler architectures being considered.
- Exploration of Evaluation Functions in AI: Concerns were raised about the complexities of constructing evaluation functions that are differentiable and effectively credit assignments. Discussions referenced historical contexts where simpler approaches yielded better outcomes than more complicated methods.
- Participants noted the relevance of recent papers that analyzed scaling laws and the effectiveness of various sampling strategies in improving model performance.
- Self-Taught Evaluator Method: A new approach to model-based evaluation was introduced that eliminates the reliance on human annotations by using synthetic training data. This method, termed Self-Taught Evaluator, reportedly enhances the performance of LLMs significantly, showing an improvement from 75.4 to 88.3 on certain benchmarks.
- The iterative improvement scheme highlights the potential for advanced models to train evaluators through generated contrasting outputs rather than traditional preference judgments.
Links mentioned:
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling: Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt ...
- An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models: The optimal training configurations of large language models (LLMs) with respect to model sizes and compute budgets have been extensively studied. But how to optimally configure LLMs during inference ...
- Self-Taught Evaluators: Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation. To train such evaluators, the standard approach is ...
- Getting 50% (SoTA) on ARC-AGI with GPT-4o: You can just draw more samples
- RoCE networks for distributed AI training at scale: AI networks play an important role in interconnecting tens of thousands of GPUs together, forming the foundational infrastructure for training, enabling large models with hundreds of billions of pa…
Eleuther ▷ #scaling-laws (4 messages):
Training Instability
Double Descent
Learning Rate Adjustment
-
Noise likely to blame for training instability: It's hypothesized that the issues observed are more likely due to noise and training instability rather than double descent.
- This suggests that improvements in training techniques might mitigate the problems at hand.
- Recommending multiple experimental runs: There’s a suggestion to conduct experiments 3 to 5 times and average the results before drawing conclusions.
- This could lead to more reliable data and better insights into the training process (a minimal sketch follows this list).
- Lower learning rate for stability: If problems persist, lowering the learning rate is advised to enhance training stability.
- This method may help reduce fluctuations and improve the overall training process.
- Exploring alternative solutions: Only after addressing the above points should other potential issues be considered.
- This approach emphasizes a systematic troubleshooting process.
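To make the suggested protocol concrete, here is a minimal sketch of averaging repeated runs; `run_experiment` is a hypothetical stand-in for a full training run:

```python
import numpy as np

def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one full training run; returns final eval loss."""
    rng = np.random.default_rng(seed)
    return 2.0 + rng.normal(scale=0.05)  # noise mimics run-to-run instability

# Run 3-5 times with different seeds and report mean +/- std before
# attributing any bump to double descent rather than noise.
losses = [run_experiment(seed) for seed in range(5)]
print(f"loss: {np.mean(losses):.4f} +/- {np.std(losses):.4f} (n={len(losses)})")
```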
Eleuther ▷ #interpretability-general (5 messages):
SAE Developments
Transformer Circuits
SAELens Library
Scaling Monosemanticity
SAE Landscape Overview
-
New Starting Points for Understanding SAEs: Members discussed several starting points for understanding recent developments in SAEs, referencing foundational works including this paper and this superposition paper.
- Good progress has been made and there are more resources for deeper study into SAEs.
- Comprehensive Overview of SAE Landscape: An overview document of the SAE landscape was shared, which provides a rough context of the field and can be accessed here.
- It was noted that this document may miss some of the latest developments but still serves as a good introduction.
- Progress in Real-Scale SAEs: Current works on real-scale SAEs include contributions from multiple teams focusing on scaling from toy models to larger parameters, with specific papers linked for further exploration. For example, this breakthrough details methodological advancements in scaling SAEs.
- The ongoing work includes discussions on integrating with larger models and improvements in training libraries, like SAE training library.
- SAELens Library for Training and Analysis: SAELens has been highlighted as a library designed for training and analyzing SAEs, showcasing visualizations that enhance understanding of neuron behavior. Detailed functionality is documented, including links to projects associated with SAE training like auto-interp library.
- Members are encouraged to join discussions in dedicated channels for more insights and collaboration on SAE tools.
Links mentioned:
- SAE Landscape: SAE Landscape – A collection of useful publications and tools Welcome to a collection of resources on Sparse Autoencoders (SAEs) for language model interpretability. This is a live document, I appreci...
- A Mathematical Framework for Transformer Circuits: no description found
Eleuther ▷ #lm-thunderdome (8 messages🔥):
lm-eval-harness usage
Huggingface model compatibility
Batch size in loglikelihood_rolling
Evaluation harness special tokens
Accessing benchmark names in JSON output
-
lm-eval-harness for custom model architecture: A member shared a self-contained example on GitHub to override some model methods of the Huggingface LM class, making it compatible with custom models.
- This shows how to integrate the lm-eval-harness effectively.
- Using PretrainedModel with HFLM: A community member confirmed that you can pass an already-initialized Huggingface `PretrainedModel` to the `HFLM` class, allowing for custom evaluations in a Python script (see the sketch after this list).
- This process is often done when applying custom quantization or pruning prior to model evaluation.
- loglikelihood_rolling behavior with batch size: A user inquired whether `loglikelihood_rolling` respects the batch size in the Huggingface model class, noting it seems to process requests one at a time.
- This indicates a concern about the efficiency of the evaluation process.
- Special tokens in evaluation harness: A member confirmed that evalharness adds special tokens, with `add_special_tokens=True` set in the tokenizer by default.
- However, they noticed the generated sample files did not show the BOS token, prompting further verification.
- Extracting benchmark names from JSON output: A user sought advice on how to find the benchmark name from JSON output, proposing the idea of turning it into an ordered dictionary and selecting the first key.
- Another member clarified that the results JSON has a 'results' key, which directly leads to benchmark names and their corresponding scores.
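A short snippet illustrating that last point; the output path is hypothetical:

```python
import json

with open("results.json") as f:  # hypothetical path to the harness output
    data = json.load(f)

# The top-level "results" key maps each benchmark name to its scores,
# so no ordered-dict tricks are needed.
for benchmark, scores in data["results"].items():
    print(benchmark, scores)
```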
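And a minimal sketch of the `HFLM` workflow mentioned above, assuming the lm-eval v0.4-style API (the model and task here are illustrative):

```python
import lm_eval
from lm_eval.models.huggingface import HFLM
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in for your custom or quantized model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ... apply custom quantization or pruning to `model` here ...

lm = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=8)
results = lm_eval.simple_evaluate(model=lm, tasks=["lambada_openai"])
print(results["results"])
```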
Link mentioned: mamba/evals/lm_harness_eval.py at main · state-spaces/mamba: Mamba SSM architecture. Contribute to state-spaces/mamba development by creating an account on GitHub.
LangChain AI ▷ #general (83 messages🔥🔥):
Ollama Memory Issues
LangGraph Courses
Mood2Music App Introduction
SQL Chat Agent Assistance
Automatic Code Reviews Challenges
-
Ollama Memory Issues: A user experienced out-of-memory errors while trying to run models like aya and nomic-embed-text on an 8GB GPU, despite having 32GB of RAM.
- The recommended solution was to set the parameter `num_gpu = 0`, effectively switching the application to run on CPU only (see the sketch after this list).
- LangGraph Course Recommendations: Several users discussed available courses for learning LangGraph, with a suggestion to check out a course offered by DeepLearning.ai.
- An 'advanced' course on Udemy was also mentioned as a potential resource, though it was noted that basic courses may be more suitable for newcomers.
- Introduction to Mood2Music App: A user introduced their app, Mood2Music, which allows users to discover music based on their current mood by connecting to services like Spotify.
- The app features AI-powered recommendations and includes a waitlist for interested users to join.
- SQL Chat Agent Collaboration: A user sought help with their script for a SQL chat agent, prompting another user to offer assistance based on their similar project experience.
- Direct messaging for script review was established as users began to collaborate on solutions.
- Challenges in Automatic Code Reviews: A user raised concerns about difficulties in automatic code reviews using GPT-4o, specifically in correctly identifying positions for comments in GitHub diffs.
- Another user suggested that using a coding-centric model and structuring the data ingestion and retrieval might provide better results.
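As referenced above, a minimal sketch of the CPU-only workaround, assuming LangChain's community Ollama wrapper (the model name is whatever `ollama list` reports locally):

```python
from langchain_community.llms import Ollama

# num_gpu=0 asks Ollama to keep all layers on the CPU, trading speed for
# not exhausting an 8GB card.
llm = Ollama(model="aya", num_gpu=0)
print(llm.invoke("Hello!"))
```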
Links mentioned:
- mood2music: no description found
- Build a Chatbot | 🦜️🔗 Langchain: Overview
- Vector DB Comparison: Vector DB Comparison is a free and open source tool from VectorHub to compare vector databases.
- AI Agents in LangGraph: Build agentic AI workflows using LangChain's LangGraph and Tavily's agentic search. Learn directly from LangChain and Tavily founders.
- Can Ollama use both CPU and GPU for inference? · Issue #3509 · ollama/ollama: What are you trying to do? May I know whether ollama support to mix CPU and GPU together for running on windows? I know my hardware is not enough for ollama, but I still want to use the part abilit...
LangChain AI ▷ #share-your-work (2 messages):
Agentgenesis
AI code snippets
Open source contributions
-
Agentgenesis Launches for AI Developers: A member presented Agentgenesis, an AI component library offering copy-paste code snippets designed to enhance development efficiency for Gen AI applications, boasting a potential 10x boost.
- The project is fully open-sourced under MIT license and aims to attract active contributors, with links to the official site and the GitHub repo.
- Request for Code Implementation: Another member inquired about sharing code related to Johnny's project, expressing interest in the implementation details.
- This discussion highlights the community’s eagerness to collaborate and share resources on innovative projects.
Links mentioned:
- AgentGenesis: Copy paste the most trending AI agents and use them in your project without having to write everything from scratch.
- GitHub - DeadmanAbir/AgentGenesis: Welcome to AgentGenesis, your source for customizable Gen AI code snippets that you can easily copy and paste into your applications.: Welcome to AgentGenesis, your source for customizable Gen AI code snippets that you can easily copy and paste into your applications. - DeadmanAbir/AgentGenesis
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
GPT-4o-2024-08-06 Release
Structured Outputs Issues
- GPT-4o-2024-08-06 is Now Live!: The new model GPT-4o-2024-08-06 has been officially released and is available for use at OpenRouter. This release adds to the current lineup of models offered by OpenRouter.
- Issues with Structured Outputs: A note was made that structured outputs with strict mode are not fully supported at this time. Users are encouraged to report issues in designated threads: <#1138521849106546791> or <#1107397803266818229>.
Link mentioned: GPT-4o (2024-08-06) - API, Providers, Stats: The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai. Run GPT-4o (2024-08...
OpenRouter (Alex Atallah) ▷ #general (62 messages🔥🔥):
Gemini Pro 1.5 Performance
New Pricing for Google Gemini
OpenRouter API Access
Structured Outputs in GPT Models
Model Misconfigurations and Limitations
-
Gemini Pro 1.5 Encountering Resource Exhaustion: Users reported running into a 'Resource has been exhausted' error with Gemini Pro 1.5, attributed to Google's rate limiting rather than user misconfiguration.
- One user confirmed that Google's heavy rate limits on this model led to these issues.
- Significant Price Drops for Google Gemini: On the 12th, the price for Google Gemini 1.5 Flash will be cut in half, making it cheaper than both yi-vision and firellava.
- This pricing update generated excitement, as a user noted that these cost reductions could allow for extensive captioning for user-generated content (UGC).
- OpenRouter API Usability Explained: To use the OpenRouter API, users must obtain an API key from their profile and use it in compatible user interfaces like Lobe Chat.
- This setup allows new users to engage more easily with models through simplified interfaces.
- New Structured Outputs Feature Introduced: A new feature in GPT-4o allows structured outputs with improved token usage, providing a 50% reduction in input costs and a 33% reduction in output costs compared to previous versions (a request sketch follows this list).
- Discussions highlighted the importance of ensuring responses return valid JSON and the potential enhancements this functionality offers.
- Confusion Over Model Capabilities: There was confusion regarding the GPT-4o-2024-08-06 model's output limits, with a user noting that OpenRouter's current display showed a maximum of only 4,096 tokens, while the official OpenAI documentation stated 16,384 tokens.
- Alex Atallah confirmed updates to align OpenRouter's information with OpenAI's documentation.
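The request shape referenced above, sketched with the OpenAI Python client pointed at OpenRouter; the schema and API key are placeholders, and strict mode may not be honored by every provider yet:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

schema = {
    "name": "city_info",
    "strict": True,  # strict mode; support is still partial, per the note above
    "schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "population": {"type": "integer"}},
        "required": ["city", "population"],
        "additionalProperties": False,
    },
}

resp = client.chat.completions.create(
    model="openai/gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Largest city in Japan, as JSON."}],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(resp.choices[0].message.content)
```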
Links mentioned:
- OpenAI: Introducing Structured Outputs in the API: OpenAI have offered structured outputs for a while now: you could specify `"response_format": {"type": "json_object"}` to request a valid JSON object, or you could use t...
- Responses | OpenRouter: Manage responses from models
- Anthropic Status: no description found
- GPT-4o (2024-08-06) - API, Providers, Stats: The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai. Run GPT-4o (2024-08...
LlamaIndex ▷ #announcements (1 messages):
CodiumAI Webinar
RAG-augmented coding assistants
Context-aware code generation
-
Join CodiumAI for a Webinar on RAG: Reminder: We have a webinar with CodiumAI discussing RAG-augmented coding assistants, central to achieving contextual awareness in AI-generated code.
- Participants will need to verify token ownership with their wallet to join the event.
- RAG's Role in Code Quality: Retrieval-Augmented Generation (RAG) is crucial for enterprises to maintain high code quality and integrity in AI-generated code.
- The webinar will showcase advanced approaches and practical applications built on the LlamaIndex infrastructure.
Link mentioned: LlamaIndex Webinar: Using RAG with LlamaIndex for Large-Scale Generative Coding · Zoom · Luma: Retrieval-Augmented Generation (RAG) plays a central role in achieving contextual awareness in AI-generated code, which is crucial for enterprises adopting…
LlamaIndex ▷ #blog (4 messages):
Local Multi-Agent System
Second RAG-a-thon
LlamaIndex Workflows
Documentation for llama-agents
-
Build a Local Multi-Agent System with RabbitMQ: A blog by @pavan_mantha1 details building a local multi-agent system using @RabbitMQ to facilitate communication between agents with tools like @ollama and @qdrant_engine. Check it out here for a complete setup guide.
- This entire setup is made easier with llama-agents, their main tool for agent development.
- Join the Second RAG-a-thon!: LlamaIndex is hosting their second RAG-a-thon in collaboration with @pinecone and @arizeai at the @500GlobalVC offices in Palo Alto from October 11-13. For more info, visit this link.
- Participants will have an entire weekend to engage in the hackathon and explore innovative ideas.
- Discover LlamaIndex Workflows: In a new YouTube video, @seldo explains Workflows, a feature for constructing complex agent applications within LlamaIndex. It covers creating, running, and visualizing workflows along with details on structure and state management.
- View the video for insights on looping and branching within workflows here.
- Comprehensive Documentation for llama-agents: A new primer has been released to enhance understanding of llama-agents, which has advanced significantly recently. Check the detailed documentation here to guide users on building multi-agents as a service.
- @nerdai has played a key role in this updated resource to better support users.
LlamaIndex ▷ #general (49 messages🔥):
HuggingFace Inference API for embeddings
LlamaParse Arabic language parsing
SimpleDirectoryReader PDF loading behaviors
Vector DB comparison sharing
LlamaIndex function calling issue
-
HuggingFace API for Generating Embeddings: A member inquired about using the HuggingFace Inference API for generating embeddings with a private endpoint, and another suggested checking this example as a solution.
- It included a code snippet for setting up the TextEmbeddingsInference model (a similar sketch follows this list).
- LlamaParse struggles with Arabic parsing: Concerns were raised about LlamaParse's Arabic parsing returning data in a Left to Right format instead of Right to Left, indicating a potential gap in handling Arabic intricacies.
- Members sought clarity on whether LlamaParse addresses this right-to-left formatting issue.
- Concerns on SimpleDirectoryReader PDF Loading: A member questioned why SimpleDirectoryReader loads PDFs as individual documents per page and if it's possible to consolidate them into a single document.
- Another member shared ways to modify the PDFReader to load everything as a single document (see the sketch after this list).
- Vector DB Comparison Resource Shared: A resource link Vector DB Comparison was shared in the discussion, deemed very helpful by users.
- Members suggested creating a space for sharing various experiences with different Vector DBs.
- Function Calling Issue in LlamaIndex: Members discussed a TypeError encountered with a specific line of code related to BedrockConverse, and one proposed upgrading the package to resolve the issue.
- They observed that updating the package improved functionality, prompting discussions on properly managing package requirements.
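For reference, a sketch along the lines of the linked example, assuming the `llama-index-embeddings-text-embeddings-inference` package; the endpoint URL is hypothetical and the model name must match the deployed model:

```python
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",          # must match the deployed model
    base_url="https://my-private-tei-endpoint",   # hypothetical private endpoint
    timeout=60,
)
embedding = embed_model.get_text_embedding("Hello, world")
print(len(embedding))
```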
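And one way to consolidate per-page PDF documents, as discussed above; a sketch assuming llama-index 0.10-style imports, with an illustrative file path:

```python
from llama_index.core import Document, SimpleDirectoryReader

# SimpleDirectoryReader yields one Document per PDF page by default;
# joining the page texts treats the whole file as a single document.
pages = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()
whole_pdf = Document(text="\n\n".join(page.text for page in pages))
print(f"{len(pages)} pages -> 1 document")
```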
Links mentioned:
- Vector DB Comparison: Vector DB Comparison is a free and open source tool from VectorHub to compare vector databases.
- llama_index/llama-index-core/llama_index/core/llms/function_calling.py at 15227173b8c1241c9fbc761342a2344cd90c6593 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- GitHub - run-llama/llama_index at 15227173b8c1241c9fbc761342a2344cd90c6593: LlamaIndex is a data framework for your LLM applications - GitHub - run-llama/llama_index at 15227173b8c1241c9fbc761342a2344cd90c6593
- Text Embedding Inference - LlamaIndex: no description found
- llama_index/pyproject.toml at 6eea66ed23fb85ee77664148a4c2b66720caabeb · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Cohere ▷ #discussions (29 messages🔥):
Hallucination Index
Command R Plus licensing debate
Open source vs Open weights
Mistral models
-
Hallucination Index Sparks Debate: The new Hallucination Index evaluates 22 leading LLM models, highlighting the challenges of hallucinations amid increasing model sizes and capabilities.
- Members expressed skepticism towards the index, particularly regarding its accuracy and the definition of open-source.
- Command R Plus Licensing Confusion: The definition of open source regarding Command R Plus was questioned, with members discussing how Creative Commons Attribution Non Commercial 4.0 license impacts its classification.
- Some argued that since the model's weights are not free for commercial use, it should be considered closed source, prompting debates about proper definitions.
- Open Weights vs Open Source: Discussion centered around the distinction between open weights and fully open-source models, with some suggesting that open weights should have its own category.
- Members acknowledged that while some models have open weights, they may still be under restrictions that prevent commercial use.
- Mistral Models Open Source Status: A member pointed out that Mistral is licensed under Apache 2.0, suggesting it maintains open-source status despite the common perception that AI models lack true openness.
- Participants noted Mistral's commitment to open weights, yet debated whether the data used for training these models is truly open.
Links mentioned:
- Apache 2.0 models | Mistral AI Large Language Models: We open-source both pre-trained models and instruction-tuned models. These models are not tuned for safety as we want to empower users to test and refine moderation based on their use cases. For safer...
- LLM Hallucination Index - Galileo: LLM Hallucination Index. A Ranking & Evaluation Framework For LLM Hallucinations
Cohere ▷ #questions (3 messages):
Contacting Dennis Padilla
Lauren's absence
-
Need Dennis Padilla's Email: A member reached out in search of Dennis Padilla's email after being directed to contact him due to Lauren's vacation.
- They expressed their frustration, stating 'I can't find it anywhere.'
- Clarification on the Request: Another member inquired about the context of the email request, asking, 'Hey hey, what's this with regards to?'
- This response indicates a willingness to assist but requires more information about the situation.
Cohere ▷ #cohere-toolkit (1 messages):
Cohere Toolkit
LLM with RAG
3rd Party API integration
Command Models
-
- Cohere Toolkit used for AI fellowship project: The Cohere team is utilizing the Cohere Toolkit for a learning project as part of an AI fellowship to build an LLM with RAG over a Confluence knowledge base.
- This project incorporates various types of knowledge, including recipes, cooking notes, and legal case notes.
- Switching models to 3rd Party APIs: A member inquired if anyone has successfully changed the model deployed from a Command Model or other Cohere models to a third-party API-based model like OpenAI Chat GPT or Gemini 1.5.
- They also mentioned the potential of integrating models available via the Groq API.
Modular (Mojo 🔥) ▷ #mojo (30 messages🔥):
InlineList functionality
Small buffer optimization in Lists
Using custom accelerators with Mojo
Integration of CXL with FPGA
Future of compiler support for RISC-V
-
InlineList lacks moveinit and copyinit: A member inquired about the absence of `__moveinit__` and `__copyinit__` in `InlineList`, to which it was noted that significant work is still being done to enhance its functionality.
- Progress is being made, with important work merged recently.
- Small Buffer Optimization for Lists Introduced: A member highlighted the enhancement of List with optional small buffer optimization, referencing this pull request.
- Another member confirmed that this allows for stack-allocated slots.
- Custom Accelerators Compatibility with Mojo: A user asked about the use of custom accelerators like PCIe cards with Mojo, with the response indicating that integration would be transparent on the host side but limited until Mojo is open source.
- The ongoing development highlights potential challenges with using systolic arrays before open sourcing.
- CXL Integration on FPGA Devices: Discussion arose regarding integrating cxl.mem on FPGA devices, specifically asking about design compatibility with Intel's CXL IP blocks.
- Users confirmed they are using a Xilinx VU13P FPGA, indicating interest in hardware capabilities related to CXL.
- Future Support for RISC-V in Mojo: A user expressed optimism about contributing RISC-V support to Mojo once it becomes open source, while indicating current reliance on lower-level PyTorch IR transformations.
- Another member noted that while Mojo could benefit their use case in the future, it is currently not ready for such applications.
Links mentioned:
- modula - Overview: GitHub is where modula builds software.
- [stdlib] Add optional small buffer optimization to `List`, take 2 by gabrieldemarmiesse · Pull Request #2825 · modularml/mojo: This PR solves part of #2467 This PR is part of three PRs to read and merge in the following order [stdlib] Add optional small buffer optimization to List, take 2 #2825 [stdlib] Work around the ma...
LAION ▷ #general (18 messages🔥):
John Schulman's move to Anthropic
Open-source AI challenges
Meta's JASCO disappearance
Nullbulge dox controversy
School BUD-E voice assistant
-
John Schulman leaves OpenAI for Anthropic: OpenAI co-founder John Schulman announced via a Monday X post that he would be leaving OpenAI to join Anthropic, an AI startup supported by Amazon.
- This transition follows OpenAI's disbandment of their superalignment team just three months prior, which aimed to ensure the controllability of advanced AI systems.
- Open-source AI training costs challenge development: A member noted that the high costs of training state-of-the-art AI models hinder progress in the open-source community due to reliance on unlicensed data.
- They suggested that if training were more affordable, many more open models would disregard concerns about ethical data sourcing.
- Meta's JASCO reportedly missing: Discussion arose around Meta's JASCO being absent, speculating that the Udio and Suno lawsuits might have influenced this situation.
- The conversation highlighted concerns about the impact of legal issues on significant AI projects.
- Nullbulge doxing raises concerns: Rumors surfaced that Nullbulge has been doxxed, with remarks on his poor operational security raising potential future issues.
- A member cautioned against searching for him on the internet, emphasizing the nature of the content.
- Introduction of School BUD-E voice assistant: A YouTube video was shared about a project titled School BUD-E, showcasing a web-browser voice assistant.
- The description of the video was noted as undefined, prompting curiosity about the project itself.
Links mentioned:
- OpenAI co-founder John Schulman says he will leave and join rival Anthropic: Schulman said OpenAI executives remain committed to backing efforts to ensure that people can control highly capable artificial intelligence models.
- Trio of Leaders Leave OpenAI — The Information: no description found
- School BUD-E web-browser Voice Assistant: no description found
LAION ▷ #research (8 messages🔥):
273k Model Experimentation
CIFAR Images in FFT Analysis
-
Scaling Efforts with 270k Params: A user expressed optimism after reaching 84% validation accuracy and shared a moment of encouragement with a gif link, signaling a breakthrough in their experimentation.
- However, it was noted that the 270k model seems to hit nearly the same accuracy limit as the smaller models, indicating a potential ceiling in performance.
- Questions on CIFAR Image Representation in FFT: A member inquired about the appearance of CIFAR images under the FFT, wondering if the magnitude (frequency) information is largely similar across images while the phase carries most of the variation.
- This discussion highlights an interest in understanding the underlying characteristics of images viewed through the Fourier transform.
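A toy sketch of the magnitude/phase split being asked about, using a random array as a stand-in for a CIFAR image:

```python
import numpy as np

img = np.random.rand(32, 32)  # stand-in for a 32x32 grayscale CIFAR image

spectrum = np.fft.fft2(img)
magnitude = np.abs(spectrum)  # spectra tend to look alike across natural images
phase = np.angle(spectrum)    # much of the per-image structure lives here

# Reconstructing from phase alone makes the split visible.
from_phase = np.real(np.fft.ifft2(np.exp(1j * phase)))
print(magnitude.shape, phase.shape, from_phase.shape)
```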
Link mentioned: The Matrix Laurence Fishburne GIF - The matrix Laurence fishburne Morpheus - Discover & Share GIFs: Click to view the GIF
tinygrad (George Hotz) ▷ #general (8 messages🔥):
Tinygrad on Aurora
XMX Support
FP8 NVIDIA Bounty
-
Feasibility of Running Tinygrad on Aurora: There was a discussion on whether running tinygrad on the Aurora supercomputer at Argonne National Laboratory is feasible, given its Intel GPUs and limitations.
- It is expected that after optimizing its performance it will exceed 2 ExaFLOPS, making it a powerful but challenging environment.
- XMX Support Discussion: A member mentioned that there seems to be some work on XMX support for tinygrad, with speculation that OpenCL might work, albeit slowly.
- They noted that the Intel GPUs used in Aurora are Max Data Center GPUs, supporting tensor core instructions.
- Need for Distributed Computing Functionality: There was a suggestion to implement more mature functionality for distributed computing to ensure compatibility with tinygrad's demands.
- This could be crucial for leveraging the full capabilities of the Aurora supercomputer.
- FP8 NVIDIA Bounty Preferences: In reference to a bounty regarding FP8 NVIDIA support, it was confirmed that both E4M3 and E5M2 formats will be needed.
- This clarity will help in addressing the requirements of the bounty effectively.
Links mentioned:
- cl_intel_subgroup_matrix_multiply_accumulate: no description found
- Aurora (supercomputer) - Wikipedia: no description found
tinygrad (George Hotz) ▷ #learn-tinygrad (16 messages🔥):
Contiguous Buffers
JIT and Batch Sizes
Computer Algebra Study Notes
CLANG and LLVM Threading
-
Handling Contiguous Buffers: Discussions around an AssertionError indicated that ensuring the buffer is contiguous can resolve issues with assignments in Tinygrad, as suggested by George Hotz.
- One user found a solution through testing, affirming that the problem was rectified.
- JITting with Uneven Batches: For batch sizes that don’t subdivide datasets perfectly, a member asked about handling JIT errors, and George suggested avoiding JIT on the last batch or skipping it.
- A concrete example was given: call the JIT-compiled function on all but the last batch (see the sketch after this list).
- Introduction to Computer Algebra: A user shared their study notes on computer algebra, linking to a GitHub repository to provide theoretical background for symbolic math.
- The shared notes aim to fortify understanding following experience with shapetracker and related topics.
- CLANG and LLVM Use of Threads: A question was raised about CLANG and LLVM using only one thread, to which a user confirmed this fact.
- Options for enhancing these capabilities with OpenMP were also discussed, including potential code pull requests on GitHub.
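The skip-the-last-batch pattern referenced above, as a minimal sketch; the training step here is a trivial stand-in:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def train_step(x: Tensor) -> Tensor:
    return (x * 2).sum().realize()  # stand-in for a real forward/backward pass

batches = [Tensor.rand(32, 8) for _ in range(10)] + [Tensor.rand(17, 8)]  # ragged tail

# The JIT captures fixed shapes, so run the odd-sized final batch eagerly.
for batch in batches[:-1]:
    train_step(batch)
last = (batches[-1] * 2).sum().realize()  # same computation, outside the JIT
```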
Link mentioned: computer-algebra-study-notes/README.md at main · mesozoic-egg/computer-algebra-study-notes: Contribute to mesozoic-egg/computer-algebra-study-notes development by creating an account on GitHub.
DSPy ▷ #show-and-tell (6 messages):
Wiseflow
HybridAGI
Dynamic Knowledge Base
-
Wiseflow: A Tool for Information Mining: The Wiseflow project has been introduced as an agile information mining tool that extracts concise messages from various online sources including WeChat official accounts and social platforms.
- It allows users to automatically categorize and upload data to a database, making information management more efficient.
- Combining Wiseflow with Dynamic Knowledge Base: A suggestion was made to combine Golden Ret with Wiseflow to enhance its capabilities towards a more dynamic knowledge base.
- This leads to an interesting conversation about building practical applications from such integrations.
- HybridAGI Launches New Version: The DSPy community has announced the release of a new version of HybridAGI, a neuro-symbolic cypher-based system focused on graph-program synthesis.
- The update includes multiple notebooks aimed at optimizing usability and data processing pipelines, promising an easier way to integrate DSPy and Knowledge Graphs.
- Building Interesting Tools: In a light-hearted exchange, a member expressed skepticism about sharing innovative tools without compensation, highlighting the ongoing search for unique projects within the community.
- This reflects the dynamic flow of ideas and the competitive spirit among members to develop and showcase their technological gems.
Links mentioned:
- GitHub - SynaLinks/HybridAGI: The Programmable Cypher-based Neuro-Symbolic AGI that lets you program its behavior using Graph-based Prompt Programming: for people who want AI to behave as expected: The Programmable Cypher-based Neuro-Symbolic AGI that lets you program its behavior using Graph-based Prompt Programming: for people who want AI to behave as expected - SynaLinks/HybridAGI
- GitHub - TeamWiseFlow/wiseflow: Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and uploads them to the database.: Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and ...
DSPy ▷ #papers (2 messages):
Language Models in Software Engineering
Inference Compute Scaling
LLM-based Agents
DeepSeek Performance
-
Exploring LLMs in Software Engineering: Researchers are examining the applications of large language models (LLMs) in areas like software engineering, especially in code generation and vulnerability detection. A study highlights the lack of a clear distinction between LLMs and LLM-based agents, stressing the need for unified standards and benchmarking.
- Current efforts are still in early stages, making it unclear how to classify an LLM as an LLM-based agent in its respective domain.
- Scaling Inference Compute Reveals Insights: A recent study indicates that by increasing the number of inference samples, coverage—the fraction of problems solved—improves significantly, scaling across multiple tasks and models. For instance, using DeepSeek-V2-Coder-Instruct, success rates surged from 15.9% with one sample to 56% with 250 samples.
- This approach provides a substantial performance boost in domains like coding and formal proofs, where all answers can be automatically verified, surpassing the previous state-of-the-art of 43%.
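Coverage here is essentially pass@k over repeated samples; for readers who want to reproduce such curves, the standard unbiased estimator (Chen et al., 2021) is a few lines, with illustrative numbers:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k draws from n samples
    (c of which are correct) solves the problem."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative only: 40 correct out of 250 samples.
print(pass_at_k(250, 40, 1))    # coverage with a single attempt
print(pass_at_k(250, 40, 250))  # coverage when all 250 samples count
```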
Links mentioned:
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling: Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt ...
- From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future: With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable succe...
DSPy ▷ #general (7 messages):
MIPRO vs BootstrapFewShotWithRandomSearch
MIPROv2 assertions
Complexity in approaches
-
MIPRO often outperforms BootstrapFewShotWithRandomSearch: A member posed the question of whether MIPRO always performs better than BootstrapFewShotWithRandomSearch; another member noted that it does so often, but not necessarily.
- This suggests a context-dependent performance that may vary based on implementation or dataset.
- MIPROv2 does not yet support assertions: When asked if MIPROv2 supports assertions, the response was a clear 'not yet'.
- This indicates a potential future enhancement that users are looking forward to.
- Start simple when experimenting with models: A member provided guidance that it’s best to always start simple, suggesting to begin with random search before progressing to MIPRO.
- This advice highlights a strategy to gradually increase complexity over time, which can benefit the learning process.
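A sketch of that progression, assuming DSPy's teleprompter API; the task, metric, and one-example trainset are placeholders, and an LM must be configured first:

```python
import dspy
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# dspy.settings.configure(lm=...)  # configure your LM of choice first

trainset = [dspy.Example(question="What is 2+2?", answer="4").with_inputs("question")]
program = dspy.ChainOfThought("question -> answer")

def exact_match(example, pred, trace=None):
    return example.answer == pred.answer

# Step 1: start simple with random-search bootstrapping.
simple = BootstrapFewShotWithRandomSearch(
    metric=exact_match, num_candidate_programs=8
).compile(program, trainset=trainset)

# Step 2: reach for MIPRO only if the simple baseline plateaus, e.g.:
# from dspy.teleprompt import MIPROv2
# tuned = MIPROv2(metric=exact_match).compile(program, trainset=trainset)
```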
DSPy ▷ #colbert (1 messages):
gamris: Would you recommend FastEmbed by Qdrant instead? https://github.com/qdrant/fastembed
OpenAccess AI Collective (axolotl) ▷ #general (7 messages):
Synthetic Data Generation
SQL Examples with Llama Index
Lora Adapter MD5 Consistency
Bitsandbytes Multi Backend Refactor
-
Exploring Synthetic Data Generation Strategies: A member inquired about effective synthetic data generation strategies to enhance 8 billion parameter models in reasoning tasks like text to SQL.
- It was suggested that utilizing a Chain of Thought (CoT) in synthetic instructions prior to generating SQL could potentially improve performance.
- Llama Index Provides SQL Examples: A member acknowledged that the Llama Index contains some SQL examples, potentially useful for the synthetic data discussion.
- These examples may serve as a reference point for developing and testing strategies in synthetic data generation.
- Understanding Lora Adapter MD5 Consistency: A discussion arose about whether the MD5 checksum of a Lora adapter merge should remain consistent across multiple merges.
- A member confirmed that if the MD5 differs, it indicates something is wrong with the merge process (see the checksum sketch after this list).
- Tracking Bitsandbytes Multi Backend Refactor: A member shared a link to a GitHub pull request that is central to the multi backend refactor of bitsandbytes.
- This pull request aims to maintain clarity on all changes introduced in the ongoing multi-backend refactor process.
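A quick way to check that point, with hypothetical paths for two independent merges:

```python
import hashlib
from pathlib import Path

def md5_of(path: str) -> str:
    """Stream the file so large merged checkpoints don't fill memory."""
    h = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Two merges of the same base + adapter should hash identically; a mismatch
# suggests non-determinism crept into the merge.
print(md5_of("merge_a/model.safetensors") == md5_of("merge_b/model.safetensors"))
```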
Link mentioned: (WIP) Multi backend refactor -> main (full diff of all already merged PRs) by Titus-von-Koeller · Pull Request #1220 · bitsandbytes-foundation/bitsandbytes: This PR to main serves the purpose to keep an overview of all the extensive changes that have been introduced to multi-backend-refactor to the iterative PRs around this topic. We will eventually me...
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (5 messages):
Gemma 2 27B QLoRA tweaks
L40S GPU training performance
Astral UV package
-
Tweaking QLoRA for Gemma 2 27B: There are discussions about using this QLoRA for Gemma 2 27B, noting that adjustments to the learning rate may be necessary for optimal performance with the latest Flash Attention.
- Thanks! was expressed by another member, indicating willingness to experiment with the setup.
- Training Models on L40S GPUs: A member inquired about the performance of training or serving models on L40S GPUs, seeking insights from others.
- Another participant stated that training on L40S yields pretty decent results.
- Astral UV for Faster Pip Usage: A member shared a link to the Astral UV GitHub repository, highlighting it as an extremely fast Python package installer and resolver written in Rust.
- They noted that faster pip might be quite useful for Docker building.
Link mentioned: GitHub - astral-sh/uv: An extremely fast Python package installer and resolver, written in Rust.: An extremely fast Python package installer and resolver, written in Rust. - astral-sh/uv
OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):
Context Length Adjustment
RoPE Scaling for Fine-Tuning
-
Context Length Adjustments Made Easy: A member inquired whether it's possible to adjust the context length of a fine-tuned model like llama2-13b-hf during fine-tuning with a 4k context.
- Another member confirmed that you can increase or decrease it as desired, but if increasing it significantly, it's best done stepwise for optimal performance.
- RoPE Scaling: The Quick Fix for Context Issues: For those looking for a quick fix when adjusting context length, RoPE scaling was suggested as a viable solution.
- The conversation highlighted that while adjustments can be made, achieving good performance may require careful incremental changes (a sketch of linear RoPE scaling follows this list).
- Editing Unique Samples Remains Unclear: A member expressed curiosity about the lack of clarity in editing unique samples, especially since past tools required Python for such tasks.
- This indicates a gap in understanding or documentation around the editing process for unique samples in the current tools.
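As referenced above, a minimal sketch of the quick fix, assuming the transformers `rope_scaling` argument; the factor of 2.0 (stretching a 4k base toward 8k) and the model choice are illustrative:

```python
from transformers import AutoModelForCausalLM

# Linear RoPE scaling: factor 2.0 stretches a 4k-context model toward 8k.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # illustrative; any RoPE-based model works similarly
    rope_scaling={"type": "linear", "factor": 2.0},
)
print(model.config.rope_scaling)
```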
OpenAccess AI Collective (axolotl) ▷ #announcements (1 messages):
caseus_: Office hours kicks off in an hour in <#1268285745555308649>.
Torchtune ▷ #announcements (1 messages):
PPO training recipe
Qwen2 support
Feature requests
-
PPO Training Recipe Now Available!: A new end-to-end PPO training recipe has been added to Torchtune, enabling effective Reinforcement Learning from Human Feedback (RLHF). Check the implementation here.
- This addition allows users to leverage the PPO paradigm for enhanced model training.
- Qwen2 Models Supported in Recipes: Support for Qwen2 models has been integrated into training recipes, starting with a 7B version now available at this link. Upcoming releases will include 1.5B and 0.5B models soon.
- This expansion allows developers to experiment with Qwen2 in their projects, enhancing model capabilities.
- Seeking Feature Requests from the Community: The team is inviting feedback and suggestions for more models or recipes users would like to see in Torchtune. Members can submit their feature requests here.
- User input is encouraged to help shape the future development of Torchtune.
Torchtune ▷ #general (9 messages🔥):
Llama 3 Model Support
Model Downloading Issues
Prompt Formatting with Llama 3
Llama 3 Instruct vs. Base Models
-
Support Plans for Llama 3 DPO: There was a query regarding if there are plans to support DPO for llama3-8B-full-finetune.
- Community responses indicate that you can use any model with the recipes by ensuring the correct paths for the checkpointer and tokenizer are set.
- Download Confusions with Llama 3: One user reported issues where the results seemed like they were using a BASE model instead of the INSTRUCT model despite having downloaded the correct version.
- Another member suggested ensuring the prompt is formatted with the correct Llama 3 instruct template to avoid these issues.
- Prompt Formatting Automation: A community member confirmed that formatting prompts with the Llama 3 instruct template is handled automatically by the tokenizer.
- This suggests that there should be minimal manual intervention needed for correct prompt formatting.
Torchtune ▷ #dev (6 messages):
Model Index Page
Refactored PreferenceDataset
-
Proposing a Model Index Page: A member suggested creating an entire page dedicated to each model's builders, especially with the impending introduction of multimodal LLMs.
- This would allow for a centralized index page to explain repetitive information like downloading and configuring models.
- Refactored PreferenceDataset Supporting Chat: A member shared a link to a GitHub pull request which refactors the PreferenceDataset to support chat functionality.
- The refactor follows the unified message transformation pipeline as outlined in RFC #1186, and the member requested additional feedback on this update.
Link mentioned: [4/n] Refactor preference dataset with transforms design by RdoubleA · Pull Request #1276 · pytorch/torchtune: Context Following the RFC in #1186, we will use the unified message_transform -> template -> tokenization data pipeline in all our datasets. This PR updates PreferenceDataset to follow t...
OpenInterpreter ▷ #general (9 messages🔥):
Open Interpreter Setup
Open Interpreter Security
Python Compatibility
Error Resolution
Open Source Vision Models
- Trouble setting up Open Interpreter with local LLM: Users reported issues setting up Open Interpreter with a local LLM, particularly involving a loop of downloading the same model and encountering an openai.APIConnectionError. One participant expressed frustration after attempting to type 'Hello.' without success.
- Seeking info on Open Interpreter's security measures: A user inquired about Open Interpreter's privacy protocols, specifically regarding data leaving the local computer, third-party involvement, and encryption standards.
- Python Compatibility Query: A user questioned whether Open Interpreter works with Python 3.12, wondering if they could install Python from the Microsoft App Store.
- User Interaction on Error Fixing: Users shared their experiences with the same error and discussed potential fixes, with one offering to collaborate and troubleshoot together via direct messaging.
- Inquiry About Recommended Open Source Vision Models: A user inquired about which open-source model is recommended for vision tasks, reflecting an interest in applying Open Interpreter for vision-related purposes.
OpenInterpreter ▷ #O1 (2 messages):
Ollama Model List
Deepgram Support
-
Check Model Names with Ollama: A member advised others to use `ollama list` to display available model names, since each requires varying amounts of VRAM on graphics cards.
- They stressed the importance of following instructions from the Ollama documentation to ensure proper setup.
- API Key Needed for Paid Models: It was pointed out that using paid remotely hosted models requires an API key for access.
- Additionally, local models will run on a specific port based on their configurations.
- Inquiry About Deepgram Support: A member inquired whether Deepgram is supported, indicating interest in its integration.
- No detailed response regarding this support was noted in the messages.
Link mentioned: open-interpreter/docs/language-models/local-models/ollama.mdx at main · OpenInterpreter/open-interpreter: A natural language interface for computers. Contribute to OpenInterpreter/open-interpreter development by creating an account on GitHub.
Mozilla AI ▷ #announcements (2 messages):
Llamafile project updates
Community survey for feedback
sqlite-vec release party
Machine Learning Paper Talks
AMA with Local AI maintainer
-
Llamafile Continues to Impress: The core maintainer of Llamafile is making epic progress, focusing on offline, accessible LLMs in a single file.
- This project is noted for its potential impact on ease of access to powerful models.
- Community Feedback Opportunity: Members are invited to share how the Mozilla AI community can assist them through a survey, with a chance to win a $25 gift card.
- This initiative encourages input on resources available within the community.
- Join the sqlite-vec Release Party: An invitation to the sqlite-vec release party has been shared, allowing discussions about features and demos with the core maintainer.
- Attendees can engage and explore what sqlite-vec offers to enhance their projects.
- Machine Learning Paper Talks Scheduled: Upcoming Machine Learning Paper Talks will discuss Communicative Agents and Extended Mind Transformers.
- These talks provide insights into recent advancements in machine learning with expert hosts.
- Local AI AMA on Self-Hosting Solutions: An AMA featuring the core maintainer of Local AI will offer insights into self-hosting an open source alternative to OpenAI.
- This session promises to clarify many aspects of using and setting up Local AI for various applications.
Link mentioned: Discover Typeform, where forms = fun: Create a beautiful, interactive form in minutes with no code. Get started for free.
MLOps @Chipro ▷ #events (1 messages):
LinkedIn Engineering
ML Platform Transformation
-
LinkedIn Engineering's ML Platform Transformation: LinkedIn Engineering shared insights on how they have transformed their ML platform, focusing on improved workflows and efficiency during a live session.
- For more details, check out the event here.
- Active Engagement in Live Events: The event on LinkedIn's engineering transformation attracted significant participation, highlighting community interest in ML advancements.
- Participants engaged in discussions and posed questions throughout the session, showcasing the interactive nature of the event.