[AINews] Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Are Learnable Activations all you need?
AI News for 5/6/2024-5/7/2024. We checked 7 subreddits and 373 Twitters and 28 Discords (419 channels, and 3749 messages) for you. Estimated reading time saved (at 200wpm): 414 minutes.
Theory papers are usually above our paygrade, but there is enough drama, and little enough else going on today, that we have the space to write about it. A week ago, Max Tegmark's grad student Ziming Liu published his very well written paper on KANs (complete with a fully documented library), claiming them to be almost universally equal or superior to MLPs on many important dimensions like interpretability/inductive bias injection, function approximation accuracy, and scaling (though KANs are acknowledged to be ~10x slower to train on current hardware at the same parameter count, they are also claimed to be ~100x more parameter efficient).
While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights").
Instead of layering preset activations like ReLU, KANs model "learnable activation functions" using B-splines (aka no linear weights, just curves) and simple addition. People got excited, rewriting GPTs with KANs.
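For intuition, here is a minimal sketch of a KAN-style layer in PyTorch. This is not the paper's pykan implementation: we substitute a fixed radial-basis grid for B-splines to keep it short, but the structure — a learnable 1-D function on every edge, summed at each output node — is the same.

```python
import torch
import torch.nn as nn

class TinyKANLayer(nn.Module):
    """Each edge (i -> j) carries its own learnable 1-D function, modeled as a
    learnable combination of fixed basis functions evaluated at the input."""
    def __init__(self, in_dim, out_dim, num_basis=8, grid=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*grid, num_basis))
        self.width = (grid[1] - grid[0]) / num_basis
        # one coefficient vector per edge: (out_dim, in_dim, num_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))

    def forward(self, x):                        # x: (batch, in_dim)
        # basis_k(x_i) for every input: (batch, in_dim, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi_ji(x_i) = sum_k coef[j,i,k] * basis_k(x_i), then sum over i
        return torch.einsum("bik,oik->bo", basis, self.coef)
```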
One week on, it now turns out that you can rearrange the KAN terms to arrive back at MLPs with the ~same number of params (twitter):
It should be no surprise that you can rewrite one universal approximator as another - but following this very simple publication, many are defending KANs as more interpretable... a claim which is also being rightfully challenged.
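The rearrangement is easy to see in code: the sketch layer above is just a fixed nonlinear feature expansion followed by an ordinary linear map, i.e. an MLP layer whose activation comes before the weights. Continuing the sketch:

```python
def kan_layer_as_mlp(layer: TinyKANLayer, x):
    # Expand inputs into basis features, then apply one plain linear layer.
    basis = torch.exp(-((x.unsqueeze(-1) - layer.centers) / layer.width) ** 2)
    features = basis.flatten(1)        # (batch, in_dim * num_basis)
    weight = layer.coef.flatten(1)     # (out_dim, in_dim * num_basis)
    return features @ weight.T         # identical output to the KAN layer
```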
Have we seen the full rise and fall of a new theory paper in a single week? Is this the preprint system working?
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Unsloth AI (Daniel Han) Discord
- Nous Research AI Discord
- Stability.ai (Stable Diffusion) Discord
- LM Studio Discord
- HuggingFace Discord
- Perplexity AI Discord
- CUDA MODE Discord
- OpenAI Discord
- Eleuther Discord
- Modular (Mojo 🔥) Discord
- OpenRouter (Alex Atallah) Discord
- OpenInterpreter Discord
- OpenAccess AI Collective (axolotl) Discord
- LangChain AI Discord
- LAION Discord
- LlamaIndex Discord
- tinygrad (George Hotz) Discord
- Cohere Discord
- Latent Space Discord
- AI Stack Devs (Yoko Li) Discord
- Mozilla AI Discord
- Interconnects (Nathan Lambert) Discord
- DiscoResearch Discord
- LLM Perf Enthusiasts AI Discord
- Alignment Lab AI Discord
- Datasette - LLM (@SimonW) Discord
- PART 2: Detailed by-Channel summaries and links
AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.
OpenAI and GPT Models
- Potential GPT-5 Release: @bindureddy noted that the gpt-2 chatbots are back on chat.lmsys and may be the latest GPT-5 versions, though they seem underwhelming compared to the hype. @nptacek tested the im-a-good-gpt2-chatbot model, finding it very strong and definitely better than the latest GPT-4, while the im-also-a-good-gpt2-chatbot had fast output but tended to fall into repetitive loops.
- OpenAI Safety Testing: @zacharynado speculated that OpenAI's "safety testing" for GPT-4.5 couldn't finish in time for a Google I/O launch like they did with GPT-4.
- Detecting AI-Generated Images: OpenAI adopted the C2PA metadata standard for certifying the origin of AI-generated images and videos, which is integrated into products like DALL-E 3. @rohanpaul_ai noted the classifier can identify ~98% of DALL-E 3 images while incorrectly flagging <0.5% of non-AI images, but has lower performance distinguishing DALL-E 3 from other AI-generated images.
Microsoft AI Developments
- In-House LLM Training: According to @bindureddy, Microsoft is training its own 500B parameter model called MAI-1, which may be previewed at the Build conference. As the model becomes available, it will be natural for Microsoft to push it instead of OpenAI's GPT line, making the two companies more competitive.
- Copilot Workspace Impressions: @svpino had very positive first impressions of Copilot Workspace, noting its refined approach and tight integration with GitHub for generating code directly in repositories, solving issues, and testing. The tool is positioned as an aid to developers rather than a replacement.
- Microsoft's AI Focus: @mustafasuleyman, having joined Microsoft, shared that the company is AI-first and driving massive technological transformation, with responsible AI as a cornerstone. Teams are working to define new norms and build products for positive impact.
Other LLM Developments
- Anthropic's Approach: In an interview discussed by @labenz, Anthropic's CTO explained their approach of giving the AI many examples rather than fine-tuning for every task, as fine-tuning fundamentally narrows what the system can do.
- DeepSeek-V2 Release: @deepseek_ai announced the release of DeepSeek-V2, an open-source 236B parameter MoE model that places top 3 in AlignBench, surpassing GPT-4, and ranks highly in MT-Bench, rivaling LLaMA3-70B. It specializes in math, code, and reasoning with a 128K context window.
- Llama-3 Developments: @abacaj suggested Llama-3 with multimodal capabilities and long context could put pressure on OpenAI. @bindureddy noted Llama-3 on Groq allows efficiently making multiple serial calls for LLM apps to make multiple decisions before giving the right answer, which is difficult with GPT-4.
AI Benchmarks and Evaluations
- LLMs as a Commodity: @bindureddy argued that LLMs have become a commodity, and even if GPT-5 is fantastic, other major labs and companies will catch up within months as language abilities plateau. He advises using LLM-agnostic services for the best performance and efficiency.
- Evaluating LLM Outputs: @aleks_madry introduced ContextCite, a method for attributing LLM responses back to the given context to see how the model is using the information and whether it's misinterpreting anything or hallucinating. It can be applied to any LLM at the cost of a few extra inference calls (see the sketch after this list).
- Emergent Abilities of LLMs: @raphaelmilliere shared a preprint exploring philosophical questions around LLMs, covering topics like emergent abilities, consciousness, and the status of LLMs as cognitive models. The paper dedicates a large portion to recent interpretability research and causal intervention methods.
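As a rough illustration of the ContextCite idea above — a simplified leave-one-out variant, not the authors' method, which fits a surrogate model over random ablations — one can ablate each context segment, re-score the fixed response, and attribute by the drop in log-probability. `logprob_of` is a hypothetical scorer over any LLM.

```python
def attribute_response(segments, question, response, logprob_of):
    """Score each context segment by how much removing it hurts the response.
    `logprob_of(response, context)` is a hypothetical scorer, not a real API."""
    def score(segs):
        return logprob_of(response, "\n".join(segs) + "\n" + question)
    base = score(segments)
    # a larger drop means segment i supported the response more strongly
    return [base - score(segments[:i] + segments[i + 1:])
            for i in range(len(segments))]
```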
Scaling Laws and Architectures
- Scaling Laws for MoEs: @teortaxesTex noted that DeepSeek-V2-236B took 1.4M H800-hours to train compared to Llama-3-8B's 1.3M H100-hours, validating the Scaling Laws for Fine-Grained MoEs paper. DeepSeek openly shares inference unit economics in contrast to some Western frontier companies.
- Benefits of MoE Models: @teortaxesTex highlighted DeepSeek's architectural innovations in attention mechanisms (Multi-head Latent Attention for efficient inference) and sparse layers (DeepSeekMoE for training strong models economically), contrasting with the "scale is all you need" mindset of some other labs.
- Mixture-of-Experts Efficiency: @teortaxesTex pointed out that at 1M context, a ~250B parameter MLA model like DeepSeek-V2 uses only 34.6GB for cache, suggesting that saving long-context examples as an alternative to fine-tuning is becoming more feasible.
AI Reddit Recap
Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but still has lots of room for improvement!
AI Progress and Capabilities
- Google's medical AI outperforms GPT and doctors: In /r/singularity, Google's Med-PaLM 2 AI destroys GPT's benchmark and outperforms doctors on medical diagnosis tasks. This highlights the rapid progress of AI in specialized domains like healthcare.
- Microsoft developing large language model to compete: In /r/artificial, it's reported that Microsoft is working on a 500B parameter model called MAI-1 to compete with offerings from Google and OpenAI. The race to develop ever-larger foundational models continues.
- AI system claims to eliminate "hallucinations": In /r/artificial, Alembic claims to have developed an AI that eliminates "hallucinations" and false information generation in outputs. If true, this could be a major step towards more reliable AI systems.
AI Ethics and Societal Impact
- Viral AI-generated misinformation: In /r/singularity, an AI-generated photo of Katy Perry at the Met Gala went viral, gaining over 200k likes in under 2 hours. This demonstrates the potential for AI to rapidly spread misinformation at scale.
- Prominent AI critic's credibility questioned: In /r/singularity, it's revealed that Gary Marcus, a prominent AI critic, admits he doesn't actually use the large language models he criticizes, drawing skepticism about his understanding of the technology.
- Concerns over AI scams and fraud: In /r/artificial, Warren Buffett predicts AI scamming and fraud will be the next big "growth industry" as the technology advances, highlighting concerns over malicious uses of AI.
Technical Developments
- New neural network architecture analyzed: In /r/MachineLearning, the Kolmogorov-Arnold Network is shown to be equivalent to a standard MLP with some modifications, providing new insights into neural network design.
- Efficient large language model developed: In /r/MachineLearning, DeepSeek-V2, a 236B parameter Mixture-of-Experts model, achieves strong performance while reducing costs compared to dense models, advancing more efficient architectures.
- New library for robotics and embodied AI: In /r/artificial, Hugging Face releases LeRobot, a library for deep learning robotics, to enable real-world AI applications and advance embodied AI research.
Stable Diffusion and Image Generation
- Stable Diffusion 3.0 shows major improvements: In /r/StableDiffusion, Stable Diffusion 3.0 demonstrates major improvements in image quality and prompt adherence compared to previous versions and competitors.
- Efficient model matches Stable Diffusion 3.0 performance: In /r/StableDiffusion, the PixArt Sigma model shows excellent prompt adherence, on par with SD3.0 while being more efficient, providing a compelling alternative.
- New model for realistic light painting effects: In /r/StableDiffusion, a new "Aether Light" LoRA model enables realistic light painting effects in Stable Diffusion, expanding creative possibilities for artists.
Humor and Memes
- Humorous AI chatbot emerges: In /r/singularity, an "im-a-good-gpt2-chatbot" model appears on OpenAI's Playground and engages in humorous conversations with users, showcasing the lighter side of AI development.
AI Discord Recap
A summary of Summaries of Summaries
1. Model Performance Optimization and Benchmarking
- Quantization techniques like AQLM and QuaRot aim to run large language models (LLMs) on individual GPUs while maintaining performance. Example: AQLM project with Llama-3-70b running on RTX3090.
- Efforts to boost transformer efficiency through methods like Dynamic Memory Compression (DMC), potentially improving throughput by up to 370% on H100 GPUs. Example: DMC paper by @p_nawrot.
- Discussions on optimizing CUDA operations like fusing element-wise operations, using the Thrust library's `transform` for near-bandwidth-saturating performance. Example: Thrust documentation.
- Comparisons of model performance across benchmarks like AlignBench and MT-Bench, with DeepSeek-V2 surpassing GPT-4 in some areas. Example: DeepSeek-V2 announcement.
2. Fine-tuning Challenges and Prompt Engineering Strategies
- Difficulties in retaining fine-tuned data when converting Llama3 models to GGUF format, with a confirmed bug discussed.
- Importance of prompt design and usage of correct templates, including end-of-text tokens, for influencing model performance during fine-tuning and evaluation. Example: Axolotl prompters.py.
- Strategies for prompt engineering like splitting complex tasks into multiple prompts and investigating logit bias for more control. Example: OpenAI logit bias guide (see the sketch after this list).
- Teaching LLMs to use a `<RET>` token for information retrieval when uncertain, improving performance on infrequent queries. Example: ArXiv paper.
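For reference, logit bias in the OpenAI API is a map from token IDs to additive biases in [-100, 100]; the snippet below is a minimal sketch (the banned word and model name are illustrative, not from the discussion).

```python
from openai import OpenAI
import tiktoken

client = OpenAI()                          # assumes OPENAI_API_KEY is set
enc = tiktoken.encoding_for_model("gpt-4")
banned = enc.encode(" basically")          # token id(s) to suppress

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain B-splines in two sentences."}],
    # -100 effectively bans a token; positive values make it more likely
    logit_bias={str(tok): -100 for tok in banned},
)
print(resp.choices[0].message.content)
```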
3. Open-Source AI Developments and Collaborations
- Launch of StoryDiffusion, an open-source alternative to Sora with MIT license, though weights not released yet. Example: GitHub repo.
- Release of OpenDevin, an open-source autonomous AI engineer based on Devin by Cognition, with webinar and growing interest on GitHub.
- Calls for collaboration on open-source machine learning paper predicting IPO success, hosted at RicercaMente.
- Community efforts around LlamaIndex integration, with issues faced in Supabase Vectorstore and package imports after updates. Example: llama-hub documentation.
4. Hardware Considerations for Efficient AI Workloads
- Discussions on GPU power consumption, with insights on P40 GPUs idling at 10W but drawing 200W under load, and strategies to power-limit them to 140W while retaining ~85% performance.
- Evaluating PCI-E bandwidth requirements for inference tasks, often overestimated based on shared resources. Example: Reddit discussion.
- Exploring single-threaded operations in frameworks like tinygrad, which doesn't use multi-threading for CPU ops like matrix multiplication.
- Inquiries into Metal memory allocation on Apple Silicon GPUs for shared/global memory akin to CUDA's `__shared__`.
5. Misc
- Exploring Capabilities and Limitations of AI Models: Engineers compared the performance of various models like Llama 3 70b, Mistral 8x22b, GPT-4 Turbo, and Sonar for tasks such as function calling, essay writing, and code refactoring (Nous Research AI). They also discussed the impact of quantization on model performance, like the degradation seen with `llama.cpp`'s quantization (OpenAccess AI Collective).
- Prompt Engineering Techniques and Challenges: The importance of prompt design was emphasized, noting how changes to templates and tokens can significantly impact model performance (OpenAccess AI Collective). Users shared tips for complex prompting tasks, like splitting product identification into multiple prompts (OpenAI), and discussed the difficulty of integrating negative examples (OpenAI). A new prompt generator tool from Anthropic was also explored (LLM Perf Enthusiasts AI).
- Optimizing CUDA Operations and Model Training: CUDA developers shared insights on using Triton for efficient kernel design, leveraging fused operations for element-wise computations, and utilizing CUDA's Thrust library for optimal performance (CUDA MODE). Discussions also covered techniques to boost transformer efficiency, like Dynamic Memory Compression (DMC) (CUDA MODE), and fine-tuning CUDA kernels for specific architectures (CUDA MODE).
- Advancements in Open-Source AI Projects: Notable open-source releases included DeepSeek-V2, an MoE model excelling in code and reasoning benchmarks (Latent Space), StoryDiffusion for magic story creation (OpenAccess AI Collective), and OpenDevin, an autonomous AI engineer (LlamaIndex). Collaborations were encouraged, like the open call to contribute to a machine learning IPO prediction paper (Unsloth AI).
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- GGUF Conversion Conundrum: Engineers noted a bug in converting Llama3 models to GGUF which affects retention of fine-tuned data, spurring discussions on related GitHub Issue #7062 and a Reddit thread.
- Training Tips and Troubles: Model training was a hot topic, including tokenization issues during fine-tuning and successful utilization of LORA Adapters with Unsloth. Additionally, concerns about base data impacting fine-tuned model results were addressed, suggesting fine-tuning alters weights for previously seen tokens.
- AI Engine Struggles and Strategies: Aphrodite Engine's compatibility with 4bit bnb quantization raised questions, while a VRAM calculator for models like GGUF and exllama was referenced for inference program compatibility. Separately, a member highlighted a need for a generalised approach to fine-tuning Large Vision Language Models (LVLM).
- Model Iterations and Collaborations: New versions of models were unveiled, including LLaMA-3-8B-RDF-Experiment for knowledge graph construction, while an open call was made for collaboration on a machine learning paper predicting IPO success.
- Product Showcase and Support Requests: The introduction of Oncord, a professional website builder, was met with a demo at oncord.com, and members debated marketing tactics for startups. Additionally, support for moondream fine-tuning was requested, linking a GitHub notebook.
Nous Research AI Discord
Function Calling Face-off: Llama 3 70b shows better function-calling performance than Mistral 8x22b, revealing a gap despite the latter's touted capabilities, exemplified by members' discussion around the utility and accuracy of function calling in AI chatbots.
A Battle of Speeds in AI Training: Comparisons of training times raised concerns, with reports of 500 seconds per step for LoRA tuning of Llama 3 8b on an A100 versus just 3 minutes for 1,000 iterations of Llama2 7B using litgpt, showing wide variances in efficiency and raising questions about optimization practices.
Impatience for Improvements: Users express disappointment over inaccessible features such as worldsim.nousresearch.com, and latency in critical updates for networks like Bittensor, highlighting real-time challenges faced by developers in AI and the ripple effects of stalled updates on productivity.
Quantization Leaps Forward: The AQLM project advances with models like Llama-3-70b and Command-R+, demonstrating progress with running Large Language Models on individual GPUs and touching upon the community's push for greater model accessibility and performance.
Chasing Trustworthy AI: Invetech's "Deterministic Quoting" to combat hallucinations indicates a strong community desire for reliable AI, particularly in sensitive sectors like healthcare, aiming to marry veracity with the innovative potential of Large Language Models as seen in the discussion.
Stability.ai (Stable Diffusion) Discord
- Hyper Diffusion vs Stable Diffusion 3 Demystified: Engineers tackled the nuances between Hyper Stable Diffusion, known for its speed, and the upcoming Stable Diffusion 3. The community expressed concern over the latter potentially not being open-source, prompting discussions on the strategic safeguarding of AI models.
- Bias Alert in Realistic Human Renders: The quest for the most effective realistic human model stimulated debate, with a consensus forming on the necessity of avoiding models with heavy biases like those from civitai to maintain diversity in generated outputs.
- Dreambooth and LoRA Deep Dive: Deep technical consultation amongst users shed light on leveraging Dreambooth and LoRA when fine-tuning Stable Diffusion models. There was a particular focus on generating unique and varied faces and styles.
- The Upscaling Showdown: Participants compared upscalers, such as RealESRGAN_x4plus and 4xUltrasharp, sharing their personal successes and preferences. The conversations aimed to identify superior upscaling techniques for enhanced image resolution.
- Open-Source AI Twilight?: A recurrent theme in the dialogues reflected the community's anxiety about the future of open-source AI, particularly related to Stable Diffusion models. Talk revolved around the implications of proprietary developments and strategies for preserving access to crucial AI assets.
LM Studio Discord
Private Life for Your Code: Users call for a server logging off feature in LM Studio for privacy during development, with genuine concerns about server logs being collected through the GUI.
A Day in the CLI: There's interest in using LM Studio in headless mode and leveraging the lms CLI to start servers via the command line. Users also shared updates on tokenizer complications for Command R and Command R+ after a llama toolkit update and issued guidance for downloading updated quantizations from Hugging Face Co's Model Repo.
Memory Lapses in Linux: A peculiar case of Linux misreporting memory in LM Studio version 0.2.22 stirred some discussions, with suggestions offered to resolve GPU offloading troubles for running models like Meta Llama 3 instruct 7B.
Prompts Lost in Thought: Users tackled issues around LM Studio erroneously responding to deleted content and scoped document access, sparking a debate about LLMs' handling and retention of data.
Model Malfunctions: Troubles with several models in LM Studio were flagged, including llava-phi-3-mini misrecognizing images and models like Mixtral and Wizard LM fumbling Dungeon & Dragons data persistence despite AnythingLLM database use.
Power-play Considerations: Hardware aficionados in the guild grapple with GPU power consumption, server motherboards, and PCIe bandwidth, sharing successful runs of LM Studio in VMs with virtual GPUs and weighing in on practical hardware setups for AI endeavors.
Beta-testing Blues: Discussions mentioned crashes in 7B models on 8GB GPUs and unloading issues post-crash, with beta users seeking solutions for recurring errors.
SDK Advent: Announcement of new lmstudiojs SDK signals upcoming langchain integrations for more streamlined tool development.
In the AI Trenches: Users provided a solution for dependency package installation on Linux, discussed LM Studio's compatibility on Ubuntu 22.04 vs. 24.04, and shared challenges with LM Studio's API integration and concurrent request handling.
Engineer Inquiry: Curiosity was piqued about GPT-Engineer setup with LM Studio and whether it involved custom prompting techniques.
Prompting the AIs: Some voiced the value of prompt engineering as a craft, citing it as central to garnering premium outputs from LLMs and sharing a win in Singapore’s GPT-4 Prompt Engineering Competition covered in Towards Data Science.
AutoGen Hiccups: There's a brief mention of a bug causing AutoGen Studio to send incomplete messages, with no further discussion on the resolution or cause.
HuggingFace Discord
ASR Fine-Tuning Takes Center Stage: Engineers discussed enhancing the openai/whisper-small ASR model, emphasizing dataset size and hyperparameter tuning. Tips included adjusting `weight_decay` and `learning_rate` to improve training, highlighted by community-shared resources on hyperparameters like gradient accumulation steps and learning rate adjustments.
Deep Dive into Quantum and AI Tools: Stealthy interest in seemingly nascent quantum virtual servers surfaced with Oqtant, while the AI toolkit included everything from an all-in-one assistant everything-ai capable of 50+ language support to the spaghetti-coded image-generating discord bot Sparky 2.
Debugging and Datasets: Chatbots designing PowerPoint slides, XLM-R getting a Flash Attention 2 upgrade, and multi-label image classification training woes took the stage, connecting community members across problems and sharing valuable insights. Meanwhile, the lost UA-DETRAC dataset incited a search for its much-needed annotations for traffic camera-based object detection.
Customization and Challenges in Model Training: From personalizing image models with Custom Diffusion—requiring minimal example images—to the struggles with fine-tuning Stable Diffusion 1.5 and BERT models, the community wrestled with and brainstormed solutions for various training hiccups. Device mismatches during multi-GPU and CPU offloading and the importance of optimization techniques for restricted resources were notable pain points.
Novel Approaches in Teaching Retrieval to LLMs: A newer technique encouraging LLMs to emit a `<RET>` token to trigger information retrieval was discussed with reference to a recent paper, highlighting the method's value for elusive questions that evade the model's memory. This sits alongside observations on model billing via token counts, with practical insights shared on pricing strategies.
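The inference-time loop implied by the `<RET>` technique might look like the following sketch; `generate_until` and `search` are hypothetical stand-ins for any LLM wrapper and retriever, not a specific library's API.

```python
def answer_with_retrieval(generate_until, search, prompt, max_rounds=3):
    # generate_until(text, stop) -> (completion, hit_stop)  [hypothetical]
    # search(query) -> list[str]                            [any retriever]
    text = prompt
    for _ in range(max_rounds):
        completion, hit_stop = generate_until(text, stop="<RET>")
        text += completion
        if not hit_stop:                   # model answered without retrieving
            break
        query = completion.strip().splitlines()[-1]   # last line as the query
        text += "<RET>\n" + "\n".join(search(query)) + "\n"
    return text[len(prompt):]
```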
Perplexity AI Discord
Beta Bewilderment: Users experienced confusion with accessing Perplexity AI's beta version; one assumed clicking an icon would reveal a form, which didn't happen, and it was clarified that the beta is closed.
Performance Puzzles: Across different devices, Perplexity AI users reported technical issues such as unresponsive buttons and sluggish loading. Conversations revolved around the limits of models like Claude 3 Opus and Sonar 32k affecting work, with calls to check Perplexity's FAQ for details.
AI Model Melee: Comparisons of AI models' capabilities, including GPT-4 Turbo, Sonar, and Opus, were discussed, focusing on tasks like essay writing and code refactoring. Clarity was sought on whether source limits in searches had increased, with GIFs used to illustrate responses.
API Angst and Insights: Discussions in the Perplexity API channel ranged from crafting JSON outputs to perplexities with the search features of Perplexity's online models. The documentation was updated (as highlighted in a link to docs), important for users dealing with issues like outdated search results and exploring model parameter counts.
Shared Discoveries through Perplexity: The community delved into Perplexity AI's offerings, addressing an array of topics from US Air Force insights to Microsoft's 500 billion parameter AI model. Users shared an aspiration for a standardized image creation UI along with links to features like Insanity by XDream and emphasized content's shareability.
CUDA MODE Discord
GPU Clock Speed Mix-Up: A conversation was sparked by confusion over the clock speed of H100 GPUs, with the initial statement of 1.8 MHz corrected to 1.8 GHz. This highlighted the need to distinguish MHz from GHz and the importance of accurate specifications in discussions on GPU performance.
Tuning CUDA: From Kernels to Libraries: Members shared insights on optimizing CUDA operations, emphasizing the efficiency of Triton in kernel design, the advantage of fused operations in element-wise computations, and the use of CUDA's Thrust library. A CUDA best practice is to use Thrust's `for_each` and `transform` for near-bandwidth-saturating performance.
PyTorch Dynamics: Various issues and improvements in PyTorch were discussed, including troubleshooting dynamic shapes with PyTorch Compile using `TORCH_LOGS="+dynamic"` and how to work with `torch.compile` for the Triton backend. An issue reported on PyTorch's GitHub relates to combining Compile with DDP & dynamic shapes, captured in pytorch/pytorch #125641.
Transformer Performance Innovations: Conversations revolved around techniques to boost the efficiency of transformers, with the introduction of Dynamic Memory Compression (DMC) by a community member, potentially improving throughput by up to 370% on H100 GPUs. Members also discussed whether quantization was involved in this method, with reference to the paper on the technique.
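A toy sketch of the DMC idea (not the paper's exact formulation): a learned per-token gate decides whether a new key/value pair is appended to the KV cache or merged into the most recent slot, so the cache — and attention cost — grows sublinearly with sequence length.

```python
def dmc_cache_update(keys, values, k_t, v_t, merge_gate):
    """keys/values are lists of tensors; merge_gate in [0, 1] is learned.
    Toy illustration of Dynamic Memory Compression, not the paper's method."""
    if merge_gate > 0.5:                     # merge: cache length is unchanged
        keys[-1] = 0.5 * (keys[-1] + k_t)
        values[-1] = 0.5 * (values[-1] + v_t)
    else:                                    # append: cache grows by one slot
        keys.append(k_t)
        values.append(v_t)
    return keys, values
```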
CUDA Discussions Heat Up in llm.c: The llm.c channel was bustling with activity, addressing issues such as multi-GPU training hangs on the master branch and optimization opportunities using NVIDIA Nsight™ Systems. A notable contribution is HuggingFace's release of the FineWeb dataset for LLM performance, documented in PR #369, with potential kernel optimizations for performance gains discussed in PR #307.
OpenAI Discord
- OpenAI Linguistically Defines Its Data Commandments: OpenAI's new document on data handling clarifies the organization's practices and ethical guidelines for processing the copious amounts of data in the AI industry.
- AI's Rhythmic Revolution Might Be Here: The discussion centered around the evolution of AI in music, referencing a musician's jam session with AI as an example of significant advancements in AI’s ability to generate music that resonates with human listeners.
- Perplexity and Cosine Similarity Stir Engineers' Minds: Engineers marveled at discovering the utility of perplexity in AI text analysis and debated the optimal cosine similarity thresholds for text embeddings, highlighting the shift to a "new 0.45" standard from the "old 0.9" (see the snippet after this list).
- Prompting Practices and Pitfalls in the Spotlight: Tips on prompt engineering emphasized the complexity of using negative examples and splitting tasks into multiple prompts, and pointed to the OpenAI logit bias guide for fine-tuning AI responses.
- GPT's Vector Vault and Uniform Delivery Assurances: Insights into GPT's knowledge base mechanics and performance consistency were shared, dispelling the notion that varying user demand affects GPT-4 output or that inferior models may be deployed to manage user load.
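The cosine-threshold point deserves a concrete note: similarity scores are model-relative, so thresholds must be recalibrated per embedding model. A minimal helper:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Newer embedding models spread scores over a wider range, so a pair that
# cleared a 0.9 threshold under an older model's embeddings may score nearer
# 0.45 under a newer one while representing the same degree of similarity.
```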
Eleuther Discord
- Questioning the Sacred P-Value: Discussions highlighted the arbitrary nature of the .05 p-value threshold in scientific research, pointing toward a movement to shift this standard to 0.005 to enhance reproducibility, as advocated in a Nature article.
- Pushing Boundaries with Skip Connections: Adaptive skip connections are under investigation, with some evidence that making the skip weights negative can improve model performance (see the sketch after this list); details of these experiments can be found on Weights & Biases. Queries related to the underlying mechanics of weight dynamics were answered with a gated residual network paper and a code snippet.
- Model Evaluation in a Logit-Locked World: The concealment of logits in API models like OpenAI's to prevent extraction of sensitive "signatures" has sparked conversations about alternatives for model evaluation, referencing the approach with 'generate_until' in YAML for Italian LLM comparisons, in light of recent findings (logit extraction work).
- Encounter with Non-Fine-Tunable Learning: Introduction of SOPHON, a framework designed for non-fine-tunable learning to restrict task transferability, aims to mitigate ethical misuse of pre-trained models (SOPHON paper). Alongside this, there's an emerging discussion about QuaRot, a rotation-based quantization scheme that compresses LLM components to 4-bit while maintaining performance (QuaRot paper).
- Scaling and Loss Curve Puzzles: A noteworthy model scaling experiment using a 607M parameter setup trained on the fineweb dataset unearthed unusual loss curves, initiating advice to try the experiment on other datasets for benchmarking.
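A minimal sketch of an adaptive skip connection in the spirit of the experiments above (an assumption about the general form, not the exact setup in the linked Weights & Biases logs): the skip branch is scaled by a learnable scalar that is free to go negative.

```python
import torch
import torch.nn as nn

class AdaptiveSkip(nn.Module):
    """Wraps any block whose output shape matches its input shape."""
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.alpha = nn.Parameter(torch.ones(1))  # learnable, may go negative

    def forward(self, x):
        return self.block(x) + self.alpha * x
```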
Modular (Mojo 🔥) Discord
- Exploring "Mojo" with Boundless Coding Adventures: Engineers discussed intricacies of programming in mojo, including installing on Intel Mac OS using Docker, Windows support through WSL2, and integration with Python ecosystems. Emphasis on design choices, such as the inclusion of structs and classes, sparked debate while compilation capabilities allowing native machine code like .exe remained a highlight.
- Stay Updated with Modular's Latest Scoops: Two important updates from the Modular team surfaced on Twitter, hinting at unmentioned advancements or news, with the community directed to check out Tweet 1 and Tweet 2 for the full details.
- MAX Engine Excellence and API Elegance on Display: MAX 24.3 debuted in a community livestream, showcasing its latest updates and introducing a new Extensibility API for Mojo. Eager learners and curious minds are directed to watch the explanatory video.
- Tinkering with Tensors and Tactics in Mojo Development: From tensor indexing tips to SIMD complications for large arrays, AI engineers shared pointers and paradigms in the Mojo domain. The discussions expanded to cover benchmarking functions, constructors in a classless setup, advanced compiler tooling needs, a proposal for `where` clauses, and the potential of compile-time metaprogramming in Mojo.
- Community Projects Propelling Mojo Forward: Updates within the community projects showcased advancements and requests for assistance, such as an efficient radix sort plus benchmark for mojo-sort, migration troubles with Lightbug to Mojo 24.3 detailed in a GitHub issue, and the porting of Minbpe to Mojo that outpaced Python versions at Minbpe.mojo. Meanwhile, the search for a Mojo GUI library continues.
- Nightly Compilation Changes the Game: Engineers wrangled with Mojo's type handling, specifically with traits and variants, signaling limitations and workarounds like `PythonObject` and `@staticmethod`s. A fresh nightly compiler release sparked conversation about automating release notifications and highlighted improvements to `Reference` usage, all framed by a playful comment about the updates stretching the capacity of a 2k monitor.
OpenRouter (Alex Atallah) Discord
- Rollback on Model Usage Rates: Soliloquy L3 8B model's price dropped to $0.05/M tokens for 2023 - 2024, available on both private and logged endpoints as announced in OpenRouter's price update.
- Seeking Beta Brainiacs for Rubik: Rubik's AI calls for beta testers, offering two months of premium access to models including Claude 3 Opus, GPT-4 Turbo, and Mistral Large with a promo code at rubiks.ai, also hinting at a tech news section featuring Apple and Microsoft's latest endeavors.
- Decoding the Verbose Llama: Engineers shared frustrations over the length of responses from llama-3-lumimaid-8b, discussing complexities with verbosity compared to models like Yi and Wizard, and buzzed about the release of Meta-Llama-3-120B-Instruct, highlighted in a Hugging Face reveal.
- Inter-Regional Model Request Mysteries: Users mulled over Amazon Bedrock potentially imposing regional restrictions on model requests, with the consensus tilting towards cross-region requests being plausible.
- Precision Pointers and Parameter Puzzles: Conversations peeled back preferences on model precision within OpenRouter, generally sustaining fp16, and occasionally distilling to int8, dovetailing into discussions on whether the default parameters require tinkering for optimal conversational results.
OpenInterpreter Discord
Python 3.10 Spells Success: Open Interpreter (OI) should be run with Python 3.10 to avoid compatibility issues; one user improved performance by switching to models like dolphin or mixtral. The GitHub repository for Open Interpreter was suggested for insights on skill persistence.
Conda Environments Save the Day: Engineers recommended using a Conda environment for a conflict-free installation of Open Interpreter on Mac, specifically with Python 3.10 to sidestep version clashes and related errors.
Jan Framework Enjoys Local Support: Jan can be utilized as a local model framework for the 01 device without hiccups, contingent on similar model-serving methods as with Open Interpreter.
Globetrotters Inquire About O1: The 01 device works globally, but hosted services are assumed to be US-centric for now, with no international shipments confirmed.
Fine-Tuning Frustrations and Fixes: A call to understand and employ system messages effectively before fine-tuning models led to the suggestion of OpenPipe.ai, as members navigate optimal performance for various models with Open Interpreter. The conversation included benchmarking models and the poor performance of Phi-3-Mini-128k-Instruct when used with OI.
OpenAccess AI Collective (axolotl) Discord
Open Source Magic on the Rise: The community launched an open-source alternative to Sora, named StoryDiffusion, released under an MIT license on Github; its weights, however, are still pending release.
Memory Efficiency Through Unsloth Checkpointing: Implementing unsloth gradient checkpointing has led to a reported reduction in VRAM usage from 19,712MB to 17,427MB, highlighting Unsloth's effectiveness in memory optimization.
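For context, enabling Unsloth's gradient checkpointing is a one-flag change in its PEFT setup. A sketch following Unsloth's documented pattern (exact arguments may vary by version; the model name and LoRA settings below are illustrative):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit", max_seq_length=2048, load_in_4bit=True)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing="unsloth",  # the VRAM saving discussed above
)
```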
Speculations on Lazy Model Layers: An oddity was observed where only specific slices of model layers were being trained, contrasting the full layer training seen in other models; theories posited include models potentially optimizing mainly the first and last layers when confronted with too easy datasets.
Prompt Design Proves Pivotal: AI enthusiasts emphasized that prompt design, particularly regarding the use of suitable templates and end-of-text tokens, is critical in influencing model performance during both fine-tuning and evaluation.
Expanded Axolotl Docs Unveil Weight Merging Insights: A new update to Axolotl documentation has been rolled out, enhancing insights on merging model weights, with an emphasis on extending these guidelines to cover inference strategies, as seen on the Continuum Training Platform.
LangChain AI Discord
- LangChain's OData V4 and More: Discussions highlighted interest in LangChain's compatibility with Microsoft Graph (OData V4), and a need for API access to kappa-bot-langchain. There was also a query about the `k` parameter in ConversationEntityMemory, referencing the LangChain documentation.
- Python vs. JS Streaming Consistency Issues: Members are experiencing inconsistencies with `streamEvents` in the JavaScript implementation of LangChain's RemoteRunnable, which works as expected in Python. This prompted suggestions to raise the issue on the LangChain GitHub repository for resolution.
- AI Projects Seek Collaborators: An update was shared about everything-ai V1.0.0, now including a user-friendly local AI assistant with capabilities like text summarization and image generation. The request for beta testers for Rubiks.ai, a research assistant tool, was also discussed. Beta tester sign-up is available at Rubiks.ai.
- No-Code Tool for Smooth AI Deployments: Introduction of a no-code tool aimed at easing the creation and deployment of AI apps with embedded prompt engineering features. The early demo can be watched here.
- Learning Langchain Through Video Tutorials: Members have access to the "Learning Langchain Series" with the latest tutorials on API Chain and Router Chain available on YouTube and here, respectively. These guide users through the usage and benefits of these tools in managing APIs with large language models.
LAION Discord
- Hungry for Realistic AI Chat? Look to Roleplays!: An idea was pitched to compile a dataset of purely human-written dialogue, which might include jokes and more authentic interactions, to enhance AI conversations that go beyond the formulaic replies seen in smart instruct models.
- Create With Fake: Introducing Simian Synthetic Data: A Simian synthetic data generator was introduced, capable of generating images, videos, and 3D models for potential AI experimentation, offering a tool for those looking to simulate data for research purposes.
- The Hunt for Perfect Datasets: In response to a request about datasets optimal for text/numeric regression or classification tasks, several suggestions were made, including MNIST-1D and the Stanford Large Movie Review Dataset.
- Text-to-Video: Diffusion Beats Transformers: Members argued that diffusion models are currently the best option for state-of-the-art (SOTA) text-to-video tasks and are often more computationally efficient, as they can be fine-tuned from text-to-image (T2I) models.
- Video Diffusion Model Expert Weighs In: An author of a stable video diffusion paper discussed the challenges faced in ensuring quality text supervision for video models, and the benefits of captioning videos using large language models (LLMs), bringing up the differences between autoregressive and diffusion video generation techniques.
LlamaIndex Discord
- Learn from OpenDevin's Creators: LlamaIndex invites engineers to a webinar featuring OpenDevin's authors on Thursday at 9am PT to explore autonomous AI agent construction, as the project gains traction on GitHub. Register for the webinar here.
- Hugging Face and AIQCon Updates: Upgrades to Hugging Face's TGI toolkit now cater to function calling and batched inference; meanwhile, Jerry Liu gears up to discuss Advanced Question-Answering Agents at AIQCon, with discounts via "Community" code cited in a tweet.
- Integrating LlamaIndex Just Got Trickier: Engineers reported challenges integrating LlamaIndex with Supabase Vectorstore and experienced package import confusion, quickly addressed by the updated llama-hub documentation.
- Problem-Solving the LlamaIndex: Debating over deletion of document knowledge and local PDF parsing libraries, the community leaned towards re-instantiating the query engine and leveraging PyMuPDF for solutions, while considering prompt engineering to tackle irrelevant model responses.
- Scouting & Reflecting on AI Agents: Engineers seek effective HyDE methods for language to SQL conversion while introspective agents draw focus with their reflection agent pattern, as observed in an article on AI Artistry, despite some hitting a 404 error.
tinygrad (George Hotz) Discord
- LLVM IR Inspires tinygrad Formatting Proposal: A readability improvement for tinygrad was suggested, looking to adopt an operation representation closer to LLVM IR's human-readable format. The conversation pivoted to Static Single Assignment (SSA) form and potential confusion caused by the placement of the PHI operation in tinygrad.
- Tinygrad Stays Single-threaded: George Hotz confirmed that tinygrad does not use multi-threading for CPU operations like matrix multiplication, maintaining its single-threaded design.
- Remapping Tensors for Efficiency: Techniques involving remapping tensors by altering strides were discussed, with a focus on how to perform reshapes efficiently, akin to tinygrad's internal methods (see the NumPy snippet after this list).
- Push for Practical Understanding in tinygrad Community: Sharing of resources such as symbolic mean explanations on GitHub and a Google Doc on view merges indicated a drive for better understanding through practical examples and documentation in the tinygrad community.
- tinygrad Explores Quantized Inference: Conversation touched on tinygrad's capabilities to perform quantized inference, a feature that can potentially compress models and accelerate inference times.
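The stride-remapping trick generalizes beyond tinygrad; NumPy exposes the same idea, where a reshape or broadcast is just a new (shape, strides) view over the same buffer:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(6, dtype=np.float32)                  # 4-byte elements
m = as_strided(x, shape=(2, 3), strides=(12, 4))    # "reshape" to 2x3, no copy
rows = as_strided(x[:3], shape=(4, 3), strides=(0, 4))  # broadcast a row, no copy
```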
Cohere Discord
SQL Database Harbor Found: The SQL database needed for tracking conversational history in the Cohere toolkit is set to operate on port 5432, but a precise location was not mentioned.
Google Bard Rivalry, School Edition: A high school student planning to create a Bard-like chatbot received guidance from Cohere about adhering to user agreements with the caveat of obtaining a production key, as elaborated in Cohere's documentation.
Chroma Hiccups Amidst Local Testing: There's an unresolved IndexError when using Cohere toolkit's Chroma for document retrieval, with a full log trace available at Pastebin and a recommendation to use the latest prebuilt container.
Retriever Confusion in Cohere Toolkit: An anomaly was observed where the Langchain retriever was selected by default despite an alternative being specified, per a user report – though the screenshot provided as evidence was not viewable.
Production Key Puzzle: A user faced an odd situation where a new production key behaved like a trial key in the Cohere toolkit. However, Cohere support clarified that it is expected behavior in Playground / Chat UI and correct functionality should prevail when used in the API.
Coral Melds Chatbot and ReRank Skills: Introducing Coral Chatbot, which merges capabilities like text generation, summarization, and ReRank into a unified tool available for feedback on its Streamlit page.
Python Decorators, a Quick Byte: A brief explainer titled "Python Decorators In 1 MINUTE" was shared for those seeking an expedited introduction to this pythonic concept - the video is accessible on YouTube.
Latent Space Discord
- Centaur Coders Could Trim the Fat: The integration of AI in development is fostering a trend where Centaur Programmer teams might downsize, potentially leading to heightened precision and efficiency in production.
- DeepSeek-V2 Climbs the Ranks: DeepSeek-V2 announced on Twitter as an open-source MoE model, boasts superior capabilities in code and logical reasoning, fueling discussions on its impact on current AI benchmarks.
- Praising DeepSeek's Accomplishments: Correspondence featured praise for DeepSeek-V2's benchmark success, with an AI News newsletter detailing the model's fascinating enhancements to the AI ecosystem.
- Scouting for Unified Search Synergy: The quest for effective unified search solutions prompts conversations about tools like Glean and a Hacker News discussion on potential open-source alternatives, suggesting a bot to bridge discordant search platforms.
- Crowdsourcing AI Orchestration Wisdom: Curiosity arose around best practices for AI orchestration, with community members consulting on favored tools and techniques for managing complex pipelines involving text and embeddings.
AI Stack Devs (Yoko Li) Discord
- Freeware Faraday Fundamentals: Engineers have confirmed that Faraday can be utilized locally without cost and does not necessitate a cloud subscription; a member's setup with 6 GB VRAM effectively runs the software along with its free voice output capability.
- Enduring Downloads: It was emphasized that assets such as characters and models downloaded from the Faraday platform can be accessed and used indefinitely without any additional charges.
- GPU Might Makes Right: A powerful GPU has been acknowledged as a viable alternative to a cloud subscription for running Faraday unless one prefers to support the developers through subscription.
- Simulation Station Collaboration: In the realm of user-initiated projects, @abhavkedia has sparked a collaboration for creating a fun simulation aligning with the Kendrick and Drake situation, encouraging other members to join in.
- New Playground for AI Enthusiasts: Engineers are invited to try out and potentially integrate Llama Farm with discussions centering around an integration technique that involves AI-Town, and a pivot towards making Llama Farm more universally applicable in systems utilizing the OpenAI API.
Mozilla AI Discord
- Need for Speed on Device? Try Rocket-3B: Experiencing 8 seconds per token, participants sought faster model options, with Rocket-3B providing a notable speed improvement.
- llamafile Caching Matures: Users can prevent redundant model downloads in llamafile by employing the ollama cache via `-m model_name.gguf`, enhancing efficiency.
- Port Troubles with AutoGPT and llamafile: Integration issues between AutoGPT and llamafile surfaced; llamafile agent crashed during AP server starts, necessitating a manual workaround.
- Seeking Feedback for AutoGPT-llamafile Integration: The AutoGPT community is actively developing integration with llamafile as indicated by a draft PR, calling for feedback before further work.
Interconnects (Nathan Lambert) Discord
AI Benchmarks in Spotlight: Dr. Jim Fan's tweet spurred a debate on the overvaluation of specific benchmarks and public democracy in AI evaluation, with one member suggesting A/B testing as a more effective approach.
Benchmarking Across Industries: Drawing parallels to the database sector, one engineer underscored the significance of having standard benchmarks for AI, referencing the approach mentioned in Dr. Fan's tweet.
TPC Standards Explained: In response to inquiries, a member clarified TPC as the Transaction Processing Council, which standardizes database industry benchmarks, referencing specific benchmarks such as TPC-C and TPC-H.
GPT-2's Surprising Comeback: A light-hearted mention by Sam Altman prompted discussion about GPT-2’s return to the LMsys arena, with a tweet snapshot shared showing the humor involved.
Lingering Doubts Over LMsys Direction: Nathan Lambert voiced skepticism towards OpenAI possibly using LMsys for model evaluations and expressed concern about LMsys's resource limitations and potential reputation damage from the latest 'chatgpt2-chatbot' hype.
DiscoResearch Discord
- PR Hits the Chopping Block: A Pull Request was closed without additional context provided, signaling a potential change or halt in a discussed development effort.
- AIDEV Excitement Builds Among Engineers: Attendees of the upcoming AIDEV event are syncing up and showing enthusiasm about meeting in person, though some are asking whether they need to bring their own food.
- Mistral Gains Ground in German Discussions: Utilization of the 8x22b Mistral model has been validated for a project, with a focus on deployment and performance. Inquiries into low-latency decoding techniques and the creation of a German dataset for inclusive language sparked dynamic discussions.
- Critical Data Set Crafting for German AI: Suggestions for building a German-exclusive pretraining dataset from Common Crawl have been made, prompting a discussion about which domains to prioritize for inclusion due to their high-quality content.
- Inclusive Language Resources Shared: For those interested in implementing inclusive language modes in models, resources like the INCLUSIFY prototype (https://davids.garden/gender/) and its GitLab repository (https://gitlab.com/davidpomerenke/gender-inclusive-german) have been circulated.
LLM Perf Enthusiasts AI Discord
Anthropic AI's Prompt Tool Piques Interest: Engineers found a new prompt generator tool in the Anthropic console, sparking discussions on its potential and capabilities.
Politeness through AI Crafted: The tool demonstrated its value by successfully rephrasing statements more courteously, marking a thumbs-up for practical AI usage.
Unpacking the AI's Instruction Set: An engineer embarked on uncovering the tool's system prompt, specifically noting the heavy reliance on k-shot examples in its architecture.
Extracting the Full AI Prompt Faces Challenges: Despite hurdles in retrieving the complete prompt due to its considerable size, the enthusiasm in the discussions remained high.
Share and Care Amongst AI Aficionados: A pledge was made by a community member to share the fully extracted prompt with peers, ensuring collective progress in understanding and utilizing the new tool.
Alignment Lab AI Discord
Given the information provided, there is no relevant discussion content to summarize for an AI Engineer audience. If future discussions include technical, detail-oriented content, a summary appropriate for engineers can be generated.
Datasette - LLM (@SimonW) Discord
- GitHub Issue Sparks Plugin Collaboration: A discussion focused on improving a plugin included a link to a GitHub issue, indicating active development for a feature to implement parameterization in testing.
- OpenAI Assistant API Compatibility Question: An inquiry was made about the possibility of using `llm` with the OpenAI Assistant API, expressing concern about missing previously shared information on the topic.
The Skunkworks AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ▷ #general (170 messages🔥🔥):
- Technical Glitch in GGUF Conversion: Users discussed a confirmed bug in fine-tuning Llama3 models when converting to GGUF, specifically mentioning an issue with retaining fine-tuned data during the conversion. Follow-up and related discussions were referenced with direct links to Reddit and GitHub issues (Reddit Discussion, GitHub Issue).
- Running Aphrodite Engine with Quantization: A user encountered difficulties in running the Aphrodite engine with 4bit bnb quantization and sought advice. Recommendations were given to use fp16 with the `--load-in-4bit` flag and to build from the dev branch for better support and features.
- LLM VRAM Requirements for Inference Programs: A link was shared to a VRAM calculator on Hugging Face's Spaces (LLM-Model-VRAM-Calculator) with discussion around its accuracy and compatibility with inference programs such as vLLM, GGUF, and exllama (a back-of-envelope version is sketched after this list).
- Unsloth Studio Release Delayed: A user inquired about the delay in the release of Unsloth Studio due to issues with phi and llama, looking forward to easier notebook usage. Another user clarified the correct usage of the eos_token in Unsloth’s updated training code for Llama3-8b-instruct.
- Concerns About Model Base Data on Inference Results: The impact of the base model's training data on the results of a fine-tuned model was discussed. Clarification was provided that fine-tuning likely updates weights used for predicting tokens in conversations previously seen by the model.
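As a back-of-envelope version of what such calculators do (real ones add runtime and activation overheads; the layer/head defaults below are illustrative, not tied to any specific model):

```python
def estimate_vram_gib(n_params_b, weight_bits, ctx_len=8192, n_layers=32,
                      n_kv_heads=8, head_dim=128, kv_bits=16):
    weights = n_params_b * 1e9 * weight_bits / 8          # model weights, bytes
    kv = 2 * ctx_len * n_layers * n_kv_heads * head_dim * kv_bits / 8  # K and V
    return (weights + kv) / 1024**3

# e.g. an 8B model at 4-bit with an 8k fp16 KV cache: ~4.7 GiB
print(round(estimate_vram_gib(8, 4), 1))
```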
- LLM Model VRAM Calculator - a Hugging Face Space by NyxKrage: no description found
- Reddit - Dive into anything: no description found
- Llama3 GGUF conversion with merged LORA Adapter seems to lose training data randomly · Issue #7062 · ggerganov/llama.cpp: I'm running Unsloth to fine tune LORA the Instruct model on llama3-8b . 1: I merge the model with the LORA adapter into safetensors 2: Running inference in python both with the merged model direct...
Unsloth AI (Daniel Han) ▷ #random (86 messages🔥🔥):
- Supporter Role Confusion: A member was unsure about their supporter status after a message about a private channel for supporters. It was clarified that supporter roles are present but require membership or a donation of at least $10.
- Tackling Unfiltered LLaMA-3 Outputs: A member expressed concern that LLaMA-3 provided uncensored outputs to questionable prompts. Despite attempts to stop it with system prompts, LLaMA-3 continued to produce explicit content.
- FlashAttention Optimization Discussion: A member highlighted an article on Hugging Face Blog about optimizing attention computation using FlashAttention for long sequence machine learning models, which can reduce memory usage in training.
- Graphics Card Sale Alert: A member shared a Reddit post about a discount on an MSi GeForce RTX 4090 SUPRIM LIQUID X 24 GB Graphics Card, prompting discussions on the advantages of smaller and more efficient cooling systems in newer GPU models.
- AI-Generated Profile Picture Admiration: Discussion ensued about a member's new profile picture, which turned out to be AI-generated. It sparked interest and comparisons to characters from popular media.
- Reddit - Dive into anything: no description found
- Saving Memory Using Padding-Free Transformer Layers during Finetuning: no description found
Unsloth AI (Daniel Han) ▷ #help (412 messages🔥🔥🔥):
- GGUF Upload Queries: Members discussed the ability to upload GGUF models to GPT-4all, with confirmation from another member that it should be possible, and the use of Huggingface's `model.push_to_hub_gguf` to do so.
- Tokenization Troubles: Conversations highlighted an issue with tokenization across various formats including GGUF, noting differences in responses when using Unsloth for fine-tuning compared to other inference methods.
- Tokenizer Regex Revision: There's an ongoing discussion on GitHub Issue #7062 regarding tokenization problems with LLama3 GGUF conversion, especially relating to LORA adapters; a regex modification has been proposed to address this.
- LORA Adapters and Training: A member successfully used LORA by training with `load_in_4bit = False`, saving the LORA adapters separately, and converting them using a specific llama.cpp script, which gave them perfect results.
- Deployment and Multigpu Questions: Inquiries about deployment using local data for fine-tuning models and the ability to use multiple GPUs for training with Unsloth were discussed, with the current conclusion that Unsloth does not yet support multi-GPU training but may in the future.
- Reddit - Dive into anything: no description found
- Tweet from bartowski (@bartowski1182): After days of compute (since I had to start over) it's finally up! Llama 3 70B GGUF with tokenizer fix :) https://huggingface.co/bartowski/Meta-Llama-3-70B-Instruct-GGUF In other news, just orde...
- Home: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
- Cooking GIF - Cooking Cook - Discover & Share GIFs: Click to view the GIF
- Supervised Fine-tuning Trainer: no description found
- ScottMcNaught - Overview: ScottMcNaught has one repository available. Follow their code on GitHub.
- GGUF breaks - llama-3 · Issue #430 · unslothai/unsloth: Findings from ggerganov/llama.cpp#7062 and Discord chats: Notebook for repro: https://colab.research.google.com/drive/1djwQGbEJtUEZo_OuqzN_JF6xSOUKhm4q?usp=sharing Unsloth + float16 + QLoRA = WORKS...
- llama3-instruct models not stopping at stop token · Issue #3759 · ollama/ollama: What is the issue? I'm using llama3:70b through the OpenAI-compatible endpoint. When generating, I am getting outputs like this: Please provide the output of the above command. Let's proceed f...
- Google Colab: no description found
- Llama3 GGUF conversion with merged LORA Adapter seems to lose training data randomly · Issue #7062 · ggerganov/llama.cpp: I'm running Unsloth to fine tune LORA the Instruct model on llama3-8b . 1: I merge the model with the LORA adapter into safetensors 2: Running inference in python both with the merged model direct...
- GitHub - ggerganov/llama.cpp at gg/bpe-preprocess: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
- llama : fix BPE pre-tokenization (#6920) · ggerganov/llama.cpp@f4ab2a4: * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unic...
- readme : add note that LLaMA 3 is not supported with convert.py (#7065) · ggerganov/llama.cpp@ca36326: no description found
- microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
Unsloth AI (Daniel Han) ▷ #showcase (23 messages🔥):
- LLaMA Variants for Enhanced Knowledge: A new LLaMA-3 variant has been developed for aiding knowledge graph construction, with a focus on structured data like RDF triples. The model, LLaMA-3-8B-RDF-Experiment, is designed to generate knowledge graph triples and specifically excludes non-English data sets.
- Instruct Coder Model Released: A new LLaMA model, rombodawg/Llama-3-8B-Instruct-Coder-v2, has been finished and brings improvements over its predecessor. The updated model has been retrained to fix previous issues and is expected to perform better.
- Oncord: Professional Website Builder Unveiled: Oncord has been presented as a professional website builder for creating modern websites with integrated tools for marketing, commerce, and customer management. The platform, showcased at oncord.com, offers a read-only demo and is aimed at a mix of technical and non-technical users.
- Open Call for Collaboration on Machine Learning Paper: There's an invitation for the community to contribute to an open source paper predicting IPO success with machine learning. Interested parties can assist with the paper hosted at RicercaMente.
- Startup Discussion and Networking: A dialogue took place regarding startup marketing, strategies, and collaborations. Specifically, one startup Oncord has been discussed, with a focus on enhancing technical flexibility for users, and another concept for measuring trust between viewers and content creators was hinted at but not officially launched yet.
- PREDICT IPO USING MACHINE LEARNING: Open source project that aims to trace the history of data science through scientific research published over the years
- rombodawg/Llama-3-8B-Instruct-Coder-v2 · Hugging Face: no description found
- M-Chimiste/Llama-3-8B-RDF-Experiment · Hugging Face: no description found
- The miR-200 family is increased in dysplastic lesions in ulcerative colitis patients - PubMed: UC-Dysplasia is linked to altered miRNA expression in the mucosa and elevated miR-200b-3p levels.
- Oncord - Digital Marketing Software: Website, email marketing, and ecommerce in one intuitive software platform. Oncord hosted CMS makes it simple.
Unsloth AI (Daniel Han) ▷ #suggestions (3 messages):
- Fine-Tuning LVLM Request: A member expressed a desire for a generalised way of fine-tuning Large Vision Language Models (LVLM).
- Call for Moondream Support: Another member requested support for moondream, noting that it currently only finetunes the phi 1.5 text model, and shared the GitHub notebook for moondream finetuning.
Link mentioned: moondream/notebooks/Finetuning.ipynb at main · vikhyat/moondream: tiny vision language model. Contribute to vikhyat/moondream development by creating an account on GitHub.
Nous Research AI ▷ #ctx-length-research (2 messages):
- In Quest for Data Collection Progress: A member inquired about the current count of pages collected in the cortex project, seeking an update on the data accumulation milestone.
- Navigating the Void: A link was posted, presumably related to the ctx-length-research channel, but the content or context of the link is inaccessible as it was referenced only as << >>.
Nous Research AI ▷ #off-topic (6 messages):
- Innovative Cooking Convenience Unveiled: A YouTube video titled "Recipic Demo" was shared, showcasing a website where users can upload their available ingredients to receive meal recipes. Intrigue is sparked for those seeking culinary inspiration with what they have on hand. Watch "Recipic Demo"
- Delving into Enhancements for Multimodal Language Models: A member inquires about ways to significantly improve multimodal language models, mentioning the integration of JEPA as a potential enhancement, though a repository or model for such integration hasn't been found.
- Multimodal Collaboration Envisioned: In response to enhancing multimodal language models, another member suggests the idea of tools that enable language models to utilize JEPA models, indicating an interest in cross-model functionality.
- Push for Higher Resolution in Multimodal Language Models: Advancing multimodal models can involve increasing their resolution to better interpret small text in images, a member suggests. This advancement could widen the scope of visual data that language models can effectively understand and incorporate.
Link mentioned: Recipic Demo: Ever felt confused about what to make for dinner or lunch? What if there was a website where you could just upload what ingredients you have and get recipes ...
Nous Research AI ▷ #interesting-links (7 messages):
- AQLM Pushes the Envelope with Llama-3: The AQLM project introduces more prequantized models, such as Llama-3-70b and Command-R+, enhancing the accessibility of open-source Large Language Models (LLMs). In particular, Llama-3-70b can run on a single RTX3090, showcasing significant progress in model quantization.
- Orthogonalization Techniques Create Kappa-3: Phi-3's weights have been orthogonalized to reduce model refusals, released as the Kappa-3 model. Kappa-3 comes with full precision (fp32 safetensors) and a GGUF fp16 option, although questions remain about its performance on prompts requiring rule compliance.
- Deepseek AI Celebrates a Win: A share from Deepseek AI's Twitter points to their success, triggering a light-hearted joke about family resemblances in AI achievements.
- Revolutionizing Healthcare with Deterministic Quoting: Invetech's project introduces "Deterministic Quoting" to address the risk of LLMs generating hallucinated quotations in sensitive fields like healthcare. With this technique, only verbatim quotes from the source material are displayed with a blue background, aiming to enhance trust in AI's use in medical record processing and diagnostics. Details and visual provided.
- Hallucination-Free RAG: Making LLMs Safe for Healthcare: LLMs have the potential to revolutionise our field of healthcare, but the fear and reality of hallucinations prevent adoption in most applications.
- Reddit - Dive into anything: no description found
Nous Research AI ▷ #general (527 messages🔥🔥🔥):
- AI Chatbot Comparison and Speculation: Members discussed the performance of various AI models, with particular focus on function calling capabilities. Llama 3 70b was deemed superior to Mistral 8x22b for function calling, despite the latter's "superior function calling" marketing.
- The Return of GPT-2 in LMSYS: There's buzz around the return of GPT-2 to LMSYS with significant improvements, and speculation on whether it's a new model being A/B tested or something else, such as a GPT-4 Lite or a more cost-efficient GPT alternative.
- Testing of the Hermes 2 Pro Llama 3 8B Model: A member requested testing of the Hermes 2 Pro Llama 3 8B model's function calling ability up to the 32k token limit, but practical limitations due to time and resource constraints were mentioned.
- Chatbot Names, Open Source Hopes, and GPT Hype Debates: The unique naming of chatbot models (like gpt2-chatbot) led to discussions and jokes about their capabilities and the potential for an OpenAI model becoming open source. There was both skepticism and anticipation regarding the next big AI development and its release timeline.
- YAML vs. JSON in Model Input: A brief mention was made of the preference for YAML over JSON for model inputs due to better human readability and token efficiency (see the sketch after this list).
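For readers who want to check the YAML-versus-JSON claim themselves, a minimal sketch using `tiktoken`; the record structure and the `cl100k_base` tokenizer choice are illustrative assumptions, and counts will differ by model:

```python
import json

import tiktoken  # OpenAI's tokenizer library
import yaml      # PyYAML

record = {"name": "Hermes 2 Pro", "base": "Llama 3 8B", "context": 32768}

enc = tiktoken.get_encoding("cl100k_base")
as_json = json.dumps(record, indent=2)
as_yaml = yaml.dump(record)

# YAML drops the braces, quotes, and commas, which usually saves tokens.
print(len(enc.encode(as_json)), "tokens as JSON")
print(len(enc.encode(as_yaml)), "tokens as YAML")
```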
- cognitivecomputations/Meta-Llama-3-120B-Instruct-gguf · Hugging Face: no description found
- OpenAI exec says today's ChatGPT will be 'laughably bad' in 12 months: OpenAI's COO said on a Milken Institute panel that AI will be able to do "complex work" and be a "great teammate" in a year.
- Tweet from Maxime G, M.D (@maximegmd): Internistai 7b: Medical Language Model Today we release the best 7b medical model, outperforming GPT-3.5 and achieving the first pass score on the USMLE! Our approach allows the model to retain the s...
- Mlp Relevant GIF - MLP Relevant Mylittlepony - Discover & Share GIFs: Click to view the GIF
- Support for grammar · Issue #1229 · vllm-project/vllm: It would be highly beneficial if the library could incorporate support for Grammar and GBNF files. https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
- TRI-ML/mamba-7b-rw · Hugging Face: no description found
- Tweet from Kyle Mistele 🏴☠️ (@0xblacklight): btw I tested this with @vllm_project and it works to scale @NousResearch's Hermes 2 Pro Llama 3 8B to ~32k context with great coherence & performance (I had it summarizing @paulg essays) Download...
- Update Server's README with undocumented options for RoPE, YaRN, and KV cache quantization by K-Mistele · Pull Request #7013 · ggerganov/llama.cpp: I recently updated my LLama.cpp and found that there are a number of server CLI options which are not described in the README including for RoPE, YaRN, and KV cache quantization as well as flash at...
- NousResearch/Hermes-2-Pro-Llama-3-8B · Hugging Face: no description found
- GitHub - huggingface/lerobot: 🤗 LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch: 🤗 LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch - huggingface/lerobot
- Extending context size via RoPE scaling · ggerganov/llama.cpp · Discussion #1965: Intro This is a discussion about a recently proposed strategy of extending the context size of LLaMA models. The original idea is proposed here: https://kaiokendev.github.io/til#extending-context-t...
- Port of self extension to server by Maximilian-Winter · Pull Request #5104 · ggerganov/llama.cpp: Hi, I ported the code for self extension over to the server. I have tested it with a information retrieval, I inserted information out of context into a ~6500 tokens long text and it worked, at lea...
Nous Research AI ▷ #ask-about-llms (12 messages🔥):
- LlamaCpp Update Resolves Issue: An issue with LlamaCpp not generating the `<tool_call>` token was resolved by updating to the latest version. The system prompt now works as intended.
- LoRA Tuning Challenges on A100: A member is experiencing unexpectedly long training times with LoRA on Llama 3 8b, where each step takes approximately 500 seconds on an A100 using axolotl, prompting them to consider debugging since others report much faster training times.
- Comparative Training Speed Insights: For Llama2 7B, a member reported it took roughly 3 minutes for 1,000 iterations using litgpt, indicating a significant speed difference in training times compared to what another member experienced with LoRA.
- Best Practices for Teaching GPT Examples: A member asked for advice on the best method to train GPT with examples, contemplating between providing a file with examples and structuring the examples as repeated user-assistant message pairs.
- Attention Paper Implementation Feedback Request: A member sought feedback on their reimplementation of the "Attention is All You Need" paper, sharing their GitHub repository at https://github.com/davidgonmar/attention-is-all-you-need. They're considering improvements like using torch's scaled dot product and pretokenizing.
Nous Research AI ▷ #bittensor-finetune-subnet (11 messages🔥):
- New Miner Repo Stuck: A user reported their repository commitment to the mining pool was not downloading for hours, indicating potential network issues for newcomers.
- Network Awaiting Critical PR: A user mentioned the Bittensor network is currently non-operational until a pending pull request (PR) is merged, which is crucial for fixing the network.
- Timeframe for Network Fix Uncertain: When asked, a user stated that the PR would be merged "soon", but clarified they have no control over the PR review process, leaving an ambiguous timeline.
- Network Issues Stall Model Validation: Clarification was given that new commits or models submitted to the network will not be validated until the aforementioned PR is resolved, directly impacting miner operations.
- Seeking GraphQL Service Information: One user inquired about resources or services related to GraphQL for Bittensor subnets, indicating a possible need for developer support or documentation.
Nous Research AI ▷ #world-sim (7 messages):
- World Sim Access Issues Persist: A member expressed difficulties with accessing worldsim.nousresearch.com, noting that the site is still not operational with a simple "still not work" comment.
- Expressing Disappointment: In response to the ongoing issue, there was another expression of disappointment, characterized by multiple frowning emoticons.
- Call for Simulation: A brief message stating "plz sim" was posted, possibly indicating a desire to start or engage with a simulation.
- Inquiry about World Sim: A member inquired, "What's world sim? Where can i find more info? and What's a world-sim role?" showing interest in the simulation aspect of the channel.
- Guidance to Information: In response to questions about World Sim, a member directed others to a specific channel <#1236442921050308649> for a pinned post that likely contains the relevant information.
Link mentioned: worldsim: no description found
Stability.ai (Stable Diffusion) ▷ #general-chat (421 messages🔥🔥🔥):
- Dissecting Diffusion Models: Members exchanged insights on the difference between Hyper Stable Diffusion, a finetuned or LoRA-ed model that operates quickly, and Stable Diffusion 3, a distinct model not equivalent to Hyper Stable Diffusion. Links to explanatory resources were not provided.
- Seeking Stable Diffusion Clarity: Conversations circled around Stable Diffusion no longer being open-source and the potential non-release of SD3. Users discussed the importance of downloading and saving models and adapters amid fears that AI's open-source era might be ending.
- Optimizing Realistic Human Models: A discussion on finding the best realistic human model with flexibility covered various model options, with suggestions to avoid heavy bias in models like those from civitai to prevent sameness in generated people.
- Dreambooth and LoRA Explorations: Users engaged in verbose consultation and detailed discussions about how to best use Dreambooth and LoRA training for Stable Diffusion, debating the best approach to creating unique faces and styles.
- Adventures in Upscaling: Queries about the most effective upscaler led to discussions about various upscaling models and workflows such as RealESRGAN_x4plus and 4xUltrasharp, with users sharing personal experiences and rough preferences.
- How to use Stable Diffusion - Stable Diffusion Art: Stable Diffusion AI is a latent diffusion model for generating AI images. The images can be photorealistic, like those captured by a camera, or in an artistic
- How to Install Stable Diffusion - automatic1111: Part 2: How to Use Stable Diffusion https://youtu.be/nJlHJZo66UAAutomatic1111 https://github.com/AUTOMATIC1111/stable-diffusion-webuiInstall Python https://w...
- Stable Cascade Examples: Examples of ComfyUI workflows
- Stylus: Automatic Adapter Selection for Diffusion Models: no description found
- LORA training EXPLAINED for beginners: LORA training guide/tutorial so you can understand how to use the important parameters on KohyaSS.Train in minutes with Dreamlook.AI: https://dreamlook.ai/?...
- THE OTHER LoRA TRAINING RENTRY: Stable Diffusion LoRA training science and notes By yours truly, The Other LoRA Rentry Guy. This is not a how to install guide, it is a guide about how to improve your results, describe what options d...
LM Studio ▷ #💬-general (107 messages🔥🔥):
- Server Logging Off Option Request: A user expressed discomfort with the inability to turn off server logging via the GUI in LM Studio, emphasizing a desire for increased privacy in their app development process.
- Recognition of Prompt Engineering Value: The legitimacy of prompt engineering as a critical and valuable skill in the tech industry was acknowledged, with references indicating it as a lucrative career and a pivotal aspect in producing high-quality outputs from LLMs.
- Headless Mode Operation for LM Studio: Users discussed the feasibility of operating LM Studio in a headless mode, where a user demonstrated interest in starting the server mode via command line rather than GUI, and others provided insights on using lms CLI as a potential solution.
- Phi-3 vs. Llama 3 for Quality Outputs: A debate emerged over the effectiveness of the Phi-3 model compared to Llama 3, particularly concerning the task of summarizing content and generating FAQs, with users sharing settings and strategies to improve outcomes.
- Troubleshooting Model Crashes and Configuration: Multiple users reported issues regarding model performance in LM Studio, with problems such as high RAM consumption despite sufficient VRAM, unexpected behavior after updates, and errors when loading models. Community members responded with suggestions such as checking drivers, adjusting model configs, and evaluating system specs.
- Welcome | LM Studio: LM Studio is a desktop application for running local LLMs on your computer.
- How I Won Singapore’s GPT-4 Prompt Engineering Competition: A deep dive into the strategies I learned for harnessing the power of Large Language Models (LLMs)
- GitHub - lmstudio-ai/lms: LM Studio in your terminal: LM Studio in your terminal. Contribute to lmstudio-ai/lms development by creating an account on GitHub.
LM Studio ▷ #🤖-models-discussion-chat (21 messages🔥):
- Llama Toolkit Update Affects Command R+ Tokenizer: Changes upstream in llama.cpp for Llama 3 broke Command R and Command R+'s tokenizer, with additional reports of incorrect quantization. Updated quants for Command R+ can be found at the Hugging Face model repo, with a note to not concatenate splits but rather use `gguf-split` for file merging, if necessary (see the sketch after this list).
- Problems Fine-tuning Hermes-2-Pro L3 Noted: Despite popularity, fine-tuning Hermes-2-Pro L3 still presents issues, with an opinion expressed that it's better than L3 8b but not as improved over its predecessor as hoped.
- Hermes-2-Pro L3 in Action: Running the model with 8bit MLX showed impressive handling of incoherent input, with a quoted example testing the AI's response to disclosing potentially unethical information. A user queried about applying a "jailbreak" to remove content safeguards.
- Difficulties with GGUF Format and Llama.cpp: A user learned that GGUF format is currently not working with llama.cpp due to the need for an update in the toolkit. Prompting the suggestion to try alternative models or wait for the toolkit update before using certain GGUF-based models.
- Translations and Cultural Sensitivity in AI Models: Recommendations were made for ai models excelling in translation and creative writing, noting Llama3's multilingual capabilities and Command R's language support. For culturally sensitive responses, WestLake was recommended, and for specific translation tasks, the T5 model was suggested as an alternative, available through Hugging Face's T5 documentation.
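As a rough sketch of the split-merge step mentioned in the first bullet, shelling out to llama.cpp's `gguf-split` tool from Python; the `--merge` invocation and shard naming are assumptions based on the discussion, so check `gguf-split --help` in your build:

```python
import subprocess

# Merge split GGUF shards with gguf-split rather than concatenating files;
# naive concatenation corrupts the GGUF container. The --merge flag and the
# shard naming scheme below are assumptions -- verify against your build.
subprocess.run(
    [
        "./gguf-split",
        "--merge",
        "c4ai-command-r-plus-00001-of-00002.gguf",  # first shard; the tool locates the rest
        "c4ai-command-r-plus.gguf",                 # merged output file
    ],
    check=True,
)
```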
- dranger003/c4ai-command-r-plus-iMat.GGUF · Hugging Face: no description found
- YorkieOH10/granite-34b-code-instruct-Q8_0-GGUF · Hugging Face: no description found
- T5: no description found
LM Studio ▷ #🧠-feedback (11 messages🔥):
- Linux Memory Misreporting in LM Studio: A user reported that their Ubuntu machine displayed 33.07KB of free RAM in LM Studio, while actually having 20GB free. There was confirmation of using version 0.2.22 of the Linux AppImage from the LM website.
- Ubuntu Version Could Affect Dependencies: Clarification was sought on whether the user was using the modern Ubuntu v22.04, as older versions might have issues with library dependencies.
- Library Dependency Concerns on Older Ubuntu Releases: The discussion pointed towards the possibility of library dependencies not functioning correctly on older Ubuntu releases.
- Disabling GPU Offloading Resolves Running Issue: Disabling GPU offloading in settings appeared to resolve an issue, allowing a user to run the Meta Llama 3 instruct 7B model.
- Guidance for Accessing Linux Beta Channels: A user was directed to obtain access to the Linux Beta channels by signing up for the Linux beta role through the Channels & Roles option on LM Studio.
LM Studio ▷ #📝-prompts-discussion-chat (8 messages🔥):
- Scoped Access to Document Sections: A user inquired about a technique in LM Studio, or LLMs in general, that would allow giving the AI temporary access to a specific section of a document. It was clarified that LLMs only know what is included in the prompt or embedded in their weights.
- AI Responding to Deleted Content: The same user reported instances where LM Studio seemed to mix contexts that had been deleted, raising the possibility of bugs affecting the AI's response.
- Understanding AI's Response Mechanism: There is a discussion on whether language models can retain information that was believed to be deleted, but the consensus is that if a language model seems to remember deleted content, it could either be due to a bug or an illusion.
LM Studio ▷ #⚙-configs-discussion (9 messages🔥):
- Seeking the Ultimate AI Dungeon Master: A member expressed frustration with models like Mixtral and Wizard LM failing to track complex game elements in Dungeons & Dragons, despite uploading extensive background information. They reported that models struggle with maintaining continuity, like character sheets and hit points in adventures, even with the help of the AnythingLLM database.
- llava-phi-3-mini Model Confusion: A member reported issues with llava-phi-3-mini model, which instead of describing the uploaded image, would describe random images from Unsplash. Attempts with different Prompt Templates have not resolved the problem, which includes the template getting stuck in a loop.
- Philosophical Troubleshooting Inquiry: In response to problems with the llava-phi-3-mini model, another member inquired about how it's determined that the model is describing content from Unsplash and what specific Prompt Template was failing.
- Continued Model Image Recognition Struggles: Multiple members are facing similar issues with vision models that either describe Unsplash images instead of the uploaded ones or stop working after processing the first image. The problem persists across various models, including ollama, and seems to be due to a recent backend update.
- Bunny Llama to the Rescue!: Amidst the issues with various models, a member found success in Bunny Llama 3 8B V, which worked for them without the issues present in other models.
- Long-Term Memory Challenges in AI Role-Playing: A member suggested Nous Research Hermes 2 Mixtral 8x7b (the Q8 version) and Ycros BagelMIsteryTour v2 8x7b, musing that current databases might be inadequate for advanced role-playing. They recommended exploring lollms, which has shown promise in maintaining long-term memory and personality continuity.
LM Studio ▷ #🎛-hardware-discussion (25 messages🔥):
- GPU Power Consumption Discussions: A user observed their P40 GPUs idling at 10 watts but never dropping below 50 watts after use, with a total draw of 200 watts from the GPUs, even when LM Studio is only using one at a time. They shared their server setup details, which include two 1600-watt power supplies at 220vac and a setup inside a shop to mitigate noise, while they remote desktop from their office.
- Planning GPU Power Budget for Inference: Another user discussed their plan to limit their GPU to 140 watts for 85% performance, intending to use it on models such as 7b Mistral with high context and a small vision model, and asked if LM Studio effectively utilizes multiple GPUs.
- Assessing Gaming Mobo for P40s Without Additional GPUs: One user pondered about using gaming motherboards for their P40s since server motherboards would provide full PCIe x16 bandwidth for each GPU, whereas gaming motherboards might compromise bandwidth when running multiple GPUs.
- Debunking PCI-E Bandwidth Myths for Inference: In response to concerns about having sufficient PCI-E bandwidth for inference, a user provided Reddit links (Inference Reliance on PCI-E Bandwidth, Multi-GPU on exl2 or llamacpp) and a GitHub discussion (Perf test on various HW), suggesting that PCI-E bandwidth requirements are often overestimated for inference tasks.
- Considering Hardware Configs for Efficient LLM Inferencing: Users exchanged ideas about efficient server builds, power consumption, thermals, and the balance between robust hardware and practicality, discussing if a single-purpose server is justified for running language models, and shared their practices, like not running servers 24/7 to save on energy.
- MSN: no description found
- Reddit - Dive into anything: no description found
- Reddit - Dive into anything: no description found
- Perf test on various HW · turboderp/exllama · Discussion #16: First of all I would like to thank you for your work, really like your inference implementation, seems to be the fastest so far for nvidia gpus! I ran a bunch of tests on various GPUs and wanted to...
LM Studio ▷ #🧪-beta-releases-chat (28 messages🔥):
- Elusive Error with Vision Model: A user reported an error when using a vision model, but the error message didn't include an error code or description, only system specifications. Other members (like propheticus_05547 and heyitsyorkie) engaged to clarify the problem, suggesting possible GPU limitations and asking about other models' functionality.
- In Search of the Newest LLMs on LM Studio: User yu_xie inquired about refreshing LM Studio content to get the latest LLM models. heyitsyorkie explained that the home page of LM Studio is static and not a live repository, suggesting instead to use the model explorer, search "GGUF," and sort by Most Recent as a workaround.
- Downloadable LLM Models: A user requested the ability to download the latest LLM models directly from the LM Studio App home page. heyitsyorkie responded that it's not currently possible due to models not always being released in GGUF format and potential updates required for llama.cpp when new models like IBM Granite are released.
- Isolated Incident or Pervasive Issue?: A user, aibyhumans, mentioned crashes when using a 7B model on an 8GB GPU and that the crashes occur only with visual models. propheticus_05547 responded by questioning whether non-vision models work and suggesting to turn off GPU offloading or adjust it to 50/50.
- Model Does Not Stick the Landing: The same user, aibyhumans, observed that the model seems not to fully unload after a crash and that after one successful load, subsequent attempts result in errors, necessitating a restart of the model each time. This behavior was noted to occur with different models as well.
LM Studio ▷ #autogen (2 messages):
- Troubleshooting AutoGen Studio: A member mentioned a bug in AutoGen Studio that seems to send truncated messages with only one or two words. They requested to see the code used for calling the bot and suggested printing out the message list before sending it.
LM Studio ▷ #langchain (2 messages):
- LM Studio SDK Development: A member mentioned the availability of the new lmstudiojs SDK, implying future langchain integrations are on the horizon.
LM Studio ▷ #crew-ai (1 messages):
- Inquiry on GPT-Engineer Setup: A member expressed interest in understanding how GPT-Engineer was set up with LM Studio, asking specifically whether custom prompting was involved in the process.
LM Studio ▷ #🛠-dev-chat (41 messages🔥):
- Dependency Package Install Success: The command `sudo apt install xvfb fuse libatk1.0-0 libatk-bridge2.0-0 libcups2 libgtk-3-0 libgbm1 libasound2` resolved prior issues, allowing the user to move past Step 2 in their installation process.
- Compatibility Issues with LM Studio on Ubuntu 22.04: Users discussed compatibility issues while running LM Studio on Ubuntu 22.04 versus the latest Ubuntu Server 24.04, suggesting a possible dependency on newer operating systems.
- LM Studio Needs GPU?: A member trying to run LM Studio encountered errors implying GPU issues, despite using a server that only provides a virtual or motherboard-based GPU.
- LM Studio on Virtual GPU: Another user confirmed they have successfully run LM Studio inside a Virtual Machine using a virtual GPU, although they have not tested environments without a GPU.
- LM Studio API Integration: A member shared excitement about integrating LM Studio support via the API into their own UI, while another faced challenges with LM Studio's concurrent request handling, especially when generating embeddings without using the provided SDK.
HuggingFace ▷ #general (164 messages🔥🔥):
- Whispering Fresh: Fine-Tuning ASR: Users discussed fine-tuning the openai/whisper-small ASR model, highlighting the importance of a sufficient dataset size and considering hyperparameter adjustments like `weight_decay` and `learning_rate` for better performance (see the sketch after this list). Specific hyperparameters shared included gradient accumulation steps, learning rate, evaluation strategy, and more, while addressing differences between training and validation loss.
- Accessing Llama on Android: One participant inquired about using llama.cpp on Android with Termux and accessing it through a webpage, indicating ongoing exploration of Llama's applications on mobile platforms.
- Puzzling Performance Paradox: Inquiries were made about the Smaug-72B LLM outperforming giants like Mistral 8x22B and Llama-3 70B, yet lacking the same level of fame, sparking a debate over the reasons behind its quiet presence despite strong benchmark results.
- Help Wanted: Integrating AI with Blockchain: A user expressed interest in integrating AI with blockchain, seeking to meld together these cutting-edge technologies, demonstrating the diverse interests and initiatives present within the community.
- Resourceful Sharing and Collaboration: The conversations were highlighted by members sharing youtube tutorials on fine-tuning Whisper, links to GitHub projects like BabyTorch for educational purposes, and URLs to helpful HF-related resources for working with LLMs and audio data. Community support and resource sharing were prevalent throughout the interactions.
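A minimal sketch of the kind of hyperparameters discussed in the Whisper bullet above, using the `Seq2SeqTrainingArguments` from the standard Hugging Face fine-tuning recipe; the specific values are illustrative defaults, not the ones members shared:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-finetune",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size of 32
    learning_rate=1e-5,
    weight_decay=0.01,               # one of the knobs discussed above
    warmup_steps=500,
    max_steps=4000,
    evaluation_strategy="steps",     # watch eval loss against train loss
    eval_steps=500,
    fp16=True,
    predict_with_generate=True,      # decode during eval so WER can be computed
)
```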
- GitHub - Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file.: Distribute and run LLMs with a single file. Contribute to Mozilla-Ocho/llamafile development by creating an account on GitHub.
- timm/ViT-SO400M-14-SigLIP-384 · Hugging Face: no description found
- Hugging Face – Posts: no description found
- Fine-tuning the ASR model - Hugging Face Audio Course: no description found
- GitHub - getumbrel/llama-gpt: A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!: A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support! - getumbrel/llama-gpt
- GitHub - amjadmajid/BabyTorch: BabyTorch is a minimalist deep-learning framework with a similar API to PyTorch. This minimalist design encourages learners explore and understand the underlying algorithms and mechanics of deep learning processes. It is design such that when learners are ready to switch to PyTorch they only need to remove the word `baby`.: BabyTorch is a minimalist deep-learning framework with a similar API to PyTorch. This minimalist design encourages learners explore and understand the underlying algorithms and mechanics of deep le...
- OpenAI Whisper - Fine tune to Lithuanian | step-by-step with Python: Fine-tune OpenAI's Whisper to different language is simple using Python and Google Colab with GPU. In this tutorial, I selected the small version of Whisper ...
- A Multi-Agent game where LLMs must trick each other as humans until one gets caught: Five top LLMs - OpenAI's ChatGPT, Google Gemini, Anthropic's Claude, Meta's LLAMA 2, and Mistral AI's Mixtral 8x7B compete in this text-based Turing Test gam...
HuggingFace ▷ #today-im-learning (6 messages):
- Look-alike Machine Learning Modelling Unpacked: Tolulade shares an educational post on look-alike modeling for beginners. The article blends informative content with engineering insights, also promoting a networking platform called Semis for AI and Big Tech.
- Struggling with Step-by-Step LLMs: A member tried to implement a "think step by step" process for their local Large Language Models (LLMs) but found that the models could not adapt from their fine-tuned, regular full answers.
- Innovating Model Response Chain: The same member had better success by creating a language chain sequence of `planner`, `writer`, `analyst`, and `editor` roles working in a loop, using Llama 3 Instruct 7B, which provided more comprehensive results than zero-shot outputs.
- Promoting Eco-friendly AI: The importance of environmentally-conscious AI development was highlighted through a YouTube video discussing Meta AI's open reporting of CO2 emissions and a related tool called codecarbon, which aims to estimate the carbon footprint of ML projects.
- Learning the Ropes of Quantization: A member shares their learning journey into quantization, both symmetric and asymmetric, which is an essential technique for optimizing machine learning models.
- Introduction to Look-alike-Machine Learning Modelling: In Transactional & Marketing Email
- What Makes Environmentally Conscious genAI? An Overview: A quick look at what it means to be environmentally conscious, as seen from @meta AI's example set with their open reporting of CO2 emissions from the LLaMA3...
- Machine Learning CO2 Impact Calculator: Machine Learning has in impact on our climate. Here's how to estimate your GPU's carbon emissions
HuggingFace ▷ #cool-finds (12 messages🔥):
- Unveiling Quantum Virtual Servers: A member shared a link to an intriguing resource, Oqtant, which appears to be a platform related to quantum virtual servers.
- Revolutionizing RAG Pipeline Evaluation: Efforts to evaluate Retrieval Augmented Generation pipelines are now boosted with the introduction of Ragas Framework, designed for assessing the performance of RAG applications and developing metrics-driven strategies.
- Introspective Agents under the Spotlight: A blog post on Medium discusses the potential of introspective AI agents within the LlamaIndex framework, aiming to improve AI’s self-evaluation and refinement capabilities.
- Lilian Weng's AI Safety Blog: Lilian Weng blogs about her notes on AI learning and her work with the AI safety and alignment team at OpenAI, with a hint of humor about ChatGPT's help (or lack thereof) in her writing.
- Innovating Image-Based Virtual Try-On: The IDM-VTON model is presented in a new paper, aiming to enhance naturalness and garment identity preservation in virtual try-on applications using a novel diffusion approach.
- Lil'Log: Document my learning notes.
- Improving Diffusion Models for Virtual Try-on: This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works...
- Oqtant: no description found
- Introspective Agents with LlamaIndex: Ankush k Singal
- Agent AI: Surveying the Horizons of Multimodal Interaction: Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and vir...
- Introduction | Ragas: no description found
HuggingFace ▷ #i-made-this (8 messages🔥):
- Meet everything-ai: Your New AI Multi-Tasking Assistant: everything-ai is an all-in-one local AI-powered assistant that can interact with PDFs, text, images, and more in over 50 languages. The project's GitHub page includes a quick-start guide and a new user interface, distributed as a Docker application.
- Spaghetti Code Achieves Functionality: Sparky 2 is a discord bot with image generation capabilities built on llama-cpp, described as a "spaghetti of python code" by its creator. The bot’s code is available on GitHub.
- AI-Assisted Research Revolution: Adityam Ghosh introduces EurekAI, a new tool aimed at streamlining the research process. Those interested in giving feedback can engage via user interviews by contacting the team on their website, eurekai.tech.
- Seeking Beta Testers for Advanced Research Assistant and Search Engine: Rubik's AI is looking for beta testers to try their premium search engine, which includes access to state-of-the-art models like GPT-4 Turbo and Mistral Large. Interested parties can sign up and receive two months of free premium using the promo code `RUBIX` at rubiks.ai.
- AI Music Generation Goes 'nanners': A new death metal-dubstep track created using AI music generators and samples from Udio AI is shared for feedback. The track, "DJ Stomp - The Arsonist," can be heard on YouTube.
- AI Music Generation Goes 'nanners': A new death metal-dubstep track created using AI music generators and samples from Udio AI is shared for feedback. The track, "DJ Stomp - The Arsonist," can be heard on YouTube.
- Real-Time Video Generation Demonstrated on Twitter: A real-time AI-generated video at 17fps was shared, demonstrating the prompt-based control of the visual output, although without recorded audio. The post can be viewed on Twitter.
- OpenAlgos: no description found
- GitHub - Gapi505/Sparky-2: Contribute to Gapi505/Sparky-2 development by creating an account on GitHub.
- everything-ai: Introducing everything-ai, your fully proficient, AI-powered and local chatbot assistant! 🤖
- GitHub - AstraBert/everything-ai: Introducing everything-ai, your multi-task, AI-powered and local assistant! 🤖: Introducing everything-ai, your multi-task, AI-powered and local assistant! 🤖 - AstraBert/everything-ai
- DJ Stomp - The Arsonist | Deathstep: The Arsonist is a bit darker and more intense than any of my previous work, merging aggressive dubstep with death metal influences. Simply put, this track is...
- Rubik's AI - AI research assistant & Search Engine: no description found
HuggingFace ▷ #reading-group (3 messages):
- Special Tokens Trigger Smarter Retrievals: The discussion introduced the concept of teaching Large Language Models (LLMs) to emit a special token `<RET>` to trigger information retrieval when uncertain (see the sketch after this list). The paper discussed explores this technique to improve LLM performance, which is especially vital for less frequent questions that the LLM's parametric memory can't handle.
- When LLMs Should Look Up Info: A new paper, promoted through a tweet by @omarsar0, details a fine-tuning method that enables LLMs to decide when to retrieve extra context. This approach can lead to more accurate and dependable Retrieve-And-Generate (RAG) systems.
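A rough sketch of the inference-time loop such a fine-tune implies; `generate` and `retrieve` are hypothetical stand-ins for the fine-tuned LLM and an off-the-shelf IR system, and the token handling is illustrative rather than the paper's code:

```python
RET_TOKEN = "<RET>"

def answer_with_retrieval(question, generate, retrieve, max_hops=2):
    """generate(prompt) -> str and retrieve(query) -> str are hypothetical
    stand-ins for a fine-tuned LLM and an information-retrieval system."""
    prompt = question
    for _ in range(max_hops):
        output = generate(prompt)
        if RET_TOKEN not in output:
            return output  # model was confident enough to answer directly
        # Model emitted <RET>: fetch context and re-prompt with it.
        context = retrieve(question)
        prompt = f"Context: {context}\n\nQuestion: {question}"
    return generate(prompt)
```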
- When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively: In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answe...
- Bytez: When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively: In this paper, we demonstrate how Large Language Models (LLMs) can effectively learn to use an off-the-shelf information retrieval (IR) system specifically when additional context is required to answe...
- Tweet from elvis (@omarsar0): When to Retrieve? This new paper presents an approach to train LLMs to effectively utilize information retrieval. It first proposes a training approach to teach an LLM to generate a special token, &...
- Training an LLM to effectively use information retrieval: This new paper presents an approach to train LLMs to effectively utilize information retrieval.It first proposes a training approach to teach an LLM to gener...
HuggingFace ▷ #computer-vision (15 messages🔥):
- Tinkering with Darknet Yolov4: A member shared their intention to experiment with Darknet Yolov4 by attaching a CNN to its tail for processing rescaled images and retraining the entire network, despite the model's age and the challenges in finding documentation for it.
- Searching for the lost UA-DETRAC dataset: The UA-DETRAC dataset, useful for computer vision and object detection from traffic cameras, has disappeared online and its annotations are no longer available through the website or Internet Archive. A community member from BSC facilities has requested assistance from anyone who might have previously downloaded it.
- Freezing ConvNeXt for Efficient Training: A member inquired about using `AutoModelForImageClassification.from_pretrained()` with ConvNeXt-Tiny and whether it defaults to freezing the pretrained portion of the model. Another member advised explicitly setting the `requires_grad` attribute to `False` for the convolutional base parameters (see the sketch after this list).
- Training on Multi-label Image Classification: A discussion on resources for training image classification models on multi-label data took place, with a member seeking advice on managing datasets where images may have multiple color labels ranging from 1 to 10 possible colors. A resource from Hugging Face's computer vision course was found but reported to not work.
- Facenet vs VGG16 for Face Recognition and Keypoints Detection: One member expressed the need for guidance on applying transfer learning to a pretrained facenet model for face recognition, while another was interested in finding models suitable for fine-tuning for cephalometric keypoint detection.
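A minimal sketch of the freezing advice above: `from_pretrained()` does not freeze anything by default, so the base is frozen explicitly. The `model.convnext` attribute path matches transformers' ConvNeXt wrapper as far as I know, but verify it on your version:

```python
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "facebook/convnext-tiny-224",
    num_labels=10,                  # placeholder label count
    ignore_mismatched_sizes=True,   # swap the 1000-class head for a fresh one
)

# Freeze the convolutional base; only the new classifier head stays trainable.
for param in model.convnext.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable:,}")
```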
Link mentioned: Transfer Learning and Fine-tuning Vision Transformers for Image Classification - Hugging Face Community Computer Vision Course: no description found
HuggingFace ▷ #NLP (9 messages🔥):
- Flash Attention 2 in XLM-R Inquiry: A member expressed interest in adding Flash Attention 2 to XLM-R and is seeking guidance since it's not implemented in HuggingFace yet. There was a query about available tutorials or guidelines for implementation.
- Request for Chatbot-Powered PowerPoint Generation: A request was made for a chatbot capable of generating PowerPoint presentations using OpenAI Assistant API, learning from previous ones, and modifying only the slide content. Alternatives with RAG or LLM models were also asked for.
- Adding Models to Transformers: Discussion on the process for deciding which new models to integrate into HuggingFace Transformers. Community contributions are encouraged, with a suggestion to consider Papers with Code and other trending SOTA models.
- Random Findings on Model Behavior: A member shared that classifiers created by Moritz are efficient, and highlighted issues related to probability distributions and model choices, requesting more information for troubleshooting the issue.
- Debugging Script Issues Across Different Cloud Clusters: The community discussed the challenges of running scripts on different cloud environments and debugging peculiar errors such as the None type error in the sentence transformers' encode function. Debugging is emphasized as a valuable learning tool and crucial for resolving code issues.
HuggingFace ▷ #diffusion-discussions (7 messages):
- Customize Image Models with Ease: Custom Diffusion is a personalization technique for image-generation models, requiring as few as 4-5 example images to learn concepts via cross-attention layers. For resource optimization, enabling xFormers with `--enable_xformers_memory_efficient_attention` is recommended for systems with limited VRAM, while the `--set_grads_to_none` flag can further reduce memory usage during training (see the sketch after this list).
- Conquering Device Mismatch in Model Offloading: A user encountered errors when attempting to combine Accelerate's multi-GPU functionality with Diffusers' model CPU offloading, receiving "expect tensors to be on the same device" and "cannot copy from meta device" error messages.
- Estimating AI Model Costs with Token Counting: A user discusses billing considerations based on token counts using a pricing calculator guide, highlighting that token-based billing is a standard practice, with 1,000 tokens approximately equating to 750 words in English.
- BERT Model Training Distress Signal: A user seeks help with BERT pretraining and fine-tuning, noting that while pretraining loss decreases normally, fine-tuning for sentiment analysis leads to overfitting within two epochs. They shared their Colab notebook for community input.
- Stable Diffusion Finetuning Frustrations: A member asked for tips on fine-tuning Stable Diffusion 1.5 with an intimate dataset of about 1300 examples, indicating a struggle to find effective hyperparameters.
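For context on what `--set_grads_to_none` buys, a minimal PyTorch sketch of the underlying mechanism (the training-script flag forwards to this optimizer option); the tiny model is a placeholder:

```python
import torch

model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 512)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()

# set_to_none=True frees the .grad tensors instead of zero-filling them,
# saving memory until backward() reallocates them on the next step.
optimizer.zero_grad(set_to_none=True)
print(next(model.parameters()).grad)  # None, not a zero tensor
```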
- OpenAI & other LLM API Pricing Calculator - DocsBot AI: Calculate and compare the cost of using OpenAI, Azure, Anthropic, Llama 3, Google Gemini, Mistral, and Cohere APIs with our powerful FREE pricing calculator.
- Custom Diffusion: no description found
- Google Colab: no description found
Perplexity AI ▷ #general (168 messages🔥🔥):
- Confusion Over Beta Access: A member clicked an icon expecting a form to appear, but it did not. Another member clarified that the beta is closed.
- Technical Difficulties on Perplexity: Multiple users reported issues with Perplexity's responsiveness, with problems logging in and out, unresponsive buttons, and slow loading times across various devices.
- Questions on Model Limits and Types: Several members inquired about daily limits on different models like Claude 3 Opus and Sonar 32k, discussing how these restrictions might affect their work, with references to official statements and updates on Perplexity's FAQ page.
- Comparison Queries Between AI Models: Users compared the capabilities and limitations of different models such as GPT-4 Turbo, Sonar, and Opus for various tasks, including essay writing, code refactoring, and learning from novel writing styles.
- Seeking Clarification on Source Limits: There was confusion around the source limit in searches, with members debating whether there had been an increase and sharing links to potentially relevant GIFs as responses.
- Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
- China hacked Ministry of Defence, Sky News learns: MPs will be told of a massive data breach involving the Ministry of Defence later today, targeting service personnel.
- TikTok - Make Your Day: no description found
- Thistest GIF - Thistest Test - Discover & Share GIFs: Click to view the GIF
- Scooby Doo Mystery Machine GIF - Scooby Doo Mystery Machine Cartoon - Discover & Share GIFs: Click to view the GIF
- Skeleton Meme GIF - Skeleton meme - Discover & Share GIFs: Click to view the GIF
Perplexity AI ▷ #sharing (19 messages🔥):
- Showcasing Perplexity's Collection: Various users shared links to Perplexity AI exploring topics like the US Air Force, AlphaGo's reasoning, the game of Go, tape-to-tape, image creation, Boeing, Microsoft's 500 billion parameter model, noise-canceling headphones, and more.
- Highlight on Intuitive Image Creation: A user shared a Perplexity AI link about creating an image and expressed a wish for a standard image creation UI.
- Exploration of Tech and Trends: Some users are investigating Microsoft's new model, noise-canceling headphones, and other diverse topics through Perplexity AI's search function.
- Emphasis on Shareable Content: Twice, Perplexity AI reminded users to ensure their threads are marked `Shareable`, emphasizing the importance of shareable content within the community.
- XDream Features: Links were shared to an XDream page, focusing on an interface project and a feature titled Insanity by XDream.
Perplexity AI ▷ #pplx-api (23 messages🔥):
- JSON Outputs Curation Ideas: Members discussed workarounds for producing JSON formatted outputs, with one suggesting the use of simple config style in llama3 by giving examples with Explanation=Blah Blah Blah and Rating=Blah Blah Blah.
- Perplexity API Search Capabilities Scrutinized: Users expressed difficulties with Perplexity's online models, particularly when trying to obtain up-to-date URLs for competitor landing pages, stating that results are often outdated or irrelevant.
- Model Cards Guidance Updated: An update to the Perplexity documentation on model cards was pointed out, specifying that system prompts do not affect the retrieval process in online models, and providing a link to documentation for further details.
- Debating Model Parameter Counts: There is confusion and discussion around the parameter count of the llama-3-sonar-large models, including some debate over whether they actually use Llama as a base, given a reported MoE-like structure that doesn't match Llama.
- Sonar Model Site Limitations: Users inquired about the possibility of limiting Perplexity's sonar model outputs to specific sites, with attempts such as using site:scholar.google.com having inconsistent results.
- Supported Models: no description found
- pplx-api: no description found
CUDA MODE ▷ #triton (13 messages🔥):
- Implementing DoRALinear Modules: A member highlighted the implementation of `BNBDoRALinear` and `HQQDoRALinear` modules with torch-only and fused forward passes, adapted from FSDP-DoRA layers. The code, which needs to be robust for training, can be found in the `Usage` and `Benchmarks` sections of their PR.
- Triton's Advantage in Kernel Design: A member noted how quickly kernels can be designed in Triton compared to high-level libraries like Cutlass, which would take more time. They also made minor tweaks to the autotuner for better debugging.
- Bottlenecks and Fused GEMMs in DoRA Layer: Detailed profiling of the DoRA layer was shared, pinning the most costly kernels to the base-layer matmul and the combined `lora_B / lora_A matmul + 2-norm` operations. Two custom fused GEMMs were designed to optimize these bottlenecks, presented in the `Profiling` section of their PR.
- Autotuner Code Divergence Concerns: There was a brief discussion about the Triton autotuner, with a member encouraging trying out the logging function, although there were concerns about code divergence due to other recent changes to the autotuner.
- Understanding Triton Autotune with NCU Profiling: A member inquired about Triton's autotune functionality, asking whether all configs are compiled and run for each specific input shape, and what the implications are for profiling with tools like ncu.
CUDA MODE ▷ #cuda (20 messages🔥):
- CUTLASS Stream-K Scheduling Clarification: A discussion clarified that Stream-K is a load-balancing scheme for distributing tiles in GEMM implementations, and that it's conceptually independent of the rest of CUTLASS. It was mentioned that Stream-K could fit as a short section of a future talk, but explaining the entire CUTLASS 2.0 API might be extensive.
- Optimizing Element-Wise Operations on CUDA: For optimizing element-wise operations, suggestions included fusing multiple operations, applying tricks from the referenced Lecture 8 such as coarsening and hierarchical splitting, and utilizing `thrust::for_each` or `thrust::transform` from Thrust, which can reach roughly 90% of saturated bandwidth. A link to the Thrust documentation was provided for reference.
- Modifying: CUDA C++ Core Libraries
- Transformations: CUDA C++ Core Libraries
CUDA MODE ▷ #torch (2 messages):
- Troubleshooting Dynamic Shapes with PyTorch Compile: A member suggests running PyTorch with the log option `TORCH_LOGS="+dynamic"` to diagnose errors indicating that `inputs_ids.shape[1] == 7` was expected for dynamic shapes (see the sketch after this list). This log setting can shed light on whether user code or PyTorch framework code necessitated shape specialization during tracing.
- Issue Posted on PyTorch GitHub: A member has created an issue with a minimal example on PyTorch's GitHub, concerning the combination of Compile with DDP (Distributed Data Parallel) & dynamic shapes. The issue can be explored and additional information can be provided through the link: pytorch/pytorch #125641.
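A minimal repro sketch for the dynamic-shape behavior above; run it as `TORCH_LOGS="+dynamic" python repro.py` to see where a guard like `x.shape[1] == 7` gets installed. The toy model and shapes are placeholders:

```python
import torch

model = torch.nn.Linear(64, 64)

# dynamic=True asks the compiler not to specialize on the first-seen shape;
# without it, changing the sequence length triggers guards/recompiles of the
# kind the error message above complains about.
compiled = torch.compile(model, dynamic=True)

for seq_len in (7, 13, 21):
    x = torch.randn(2, seq_len, 64)
    print(seq_len, compiled(x).shape)
```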
Link mentioned: Issues · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - Issues · pytorch/pytorch
CUDA MODE ▷ #algorithms (2 messages):
- Boosting Transformer Efficiency: A post by @p_nawrot introduces Dynamic Memory Compression (DMC), a method to compress the KV cache in Large Language Models (LLMs) without sacrificing their performance, significantly improving throughput by up to 370% on a H100 GPU. They provided a link to the paper and stated that the code and models will be released soon.
- Questioning Quantization: In response to the KV cache compression technique, a member inquired whether the approach involves quantizing the model. No additional information or response was provided regarding this query.
Link mentioned: Tweet from Piotr Nawrot (@p_nawrot): The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC) w...
CUDA MODE ▷ #beginner (7 messages):
- Beginner's GPU Clock Speed Confusion: A member was confused about the clock speed of H100 GPUs, stating it as 1.8 MHz and asking how that squares with a theoretical fp64 peak of 34 TFLOPS. Another member pointed out that the discrepancy likely stems from a common mix-up between MHz and GHz.
- Correcting GPU Clock Speed Assumptions: In the follow-up, the same member acknowledged the likelihood that they meant 1.8 GHz, a significantly higher clock speed than initially stated, yet could not find a source confirming this for H100 GPUs.
- Inquiry on PyTorch Torch Compile for Triton: A member asked how to call `torch.compile` for the Triton backend when using PyTorch, unsure whether `backend="inductor"` is the correct option (see the sketch after this list).
- Improving Model Performance in PyTorch: The discussion touched on using BetterTransformer with `torch.compile` to optimize models in PyTorch. The member shared code snippets illustrating how to implement these optimizations in the Hugging Face ecosystem.
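A minimal sketch covering both bullets, under stated assumptions: `inductor` is the default `torch.compile` backend and is the one that emits Triton kernels on GPU, and BetterTransformer is applied through `to_bettertransformer()` (which requires optimum installed); the model name is a placeholder and a CUDA GPU is assumed:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2").eval().cuda()

# Swap in BetterTransformer's fused attention paths (needs `pip install optimum`).
model = model.to_bettertransformer()

# backend="inductor" is shown explicitly, but it is also the default backend.
compiled = torch.compile(model, backend="inductor")

with torch.no_grad():
    ids = torch.randint(0, 50257, (1, 32), device="cuda")
    print(compiled(input_ids=ids).logits.shape)
```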
CUDA MODE ▷ #pmpp-book (4 messages):
- Confusion Over Tiling in Transpose: A discussion in Chapter 5 exercises questioned the need for tiling in matrix transpose operations. Another member clarified that the purpose is to ensure coalesced memory write and shared a valuable NVIDIA blog post for better understanding.
- Anticipated Knowledge Can Cause Puzzles: Members noted that the book exercises sometimes include questions on topics not yet covered, leading to confusion. It’s indicated the topic of coalescing is expected to be discussed in a subsequent chapter.
Link mentioned: An Efficient Matrix Transpose in CUDA C/C++ | NVIDIA Technical Blog: My last CUDA C++ post covered the mechanics of using shared memory, including static and dynamic allocation. In this post I will show some of the performance gains achievable using shared memory.
CUDA MODE ▷ #youtube-recordings (4 messages):
- Seeking the Next Video Meetup: A member inquired about the schedule for the next video meetup.
- Navigating to Event Information: They were directed to find the video meetup details under the Events section, identifiable by the calendar icon.
CUDA MODE ▷ #jax (1 messages):
- Exploring Multi-Chip Model Training: A member shared a blog post discussing the necessity of training machine learning models over multiple chips. It focuses on efficiently using Google's TPUs, particularly on Google Cloud, with a visual example of layerwise matrix multiplication for workload distribution.
Link mentioned: Multi chip performance in JAX: The larger the models we use get the more it becomes necessary to be able to perform training of machine learning models over multiple chips. In this blog post we will explain how to efficiently use G...
CUDA MODE ▷ #off-topic (8 messages🔥):
- Metal Memory Clarification Sought: A member sought clarification on Metal & Apple Silicon, particularly on how to allocate buffers in Shared/Tile or Global/Unified/System Memory, analogous to CUDA's `__shared__` and `float global_array`. They found their answer in Apple's documentation.
- Metal Memory Access Query: The same member asked whether all GPU threads can access the `.memoryless` and `.private` memory categories in Metal on Apple Silicon, following up on their earlier question about memory allocation.
- Inquiry on Lightning AI Studio Feedback: A user inquired if anyone had experience with Lightning AI Studio and could provide feedback.
- Triton Language Presentation Proposal: Discussion included a proposal for a presentation on OpenAI's Triton language and its application to ML inference, situated alongside clarifications to distinguish it from Nvidia's Triton Inference Server.
- Reference to CUDA Mode YouTube for Triton Talk: A member directed others to CUDA MODE’s YouTube channel for a previous talk on Triton, which may contain content related to the proposed presentation.
Link mentioned: GitHub - openai/triton: Development repository for the Triton language and compiler: Development repository for the Triton language and compiler - openai/triton
CUDA MODE ▷ #irl-meetup (1 messages):
glaxus_: Anyone going to be at MLSys?
CUDA MODE ▷ #llmdotc (133 messages🔥🔥):
- Multi-GPU Training Hangs on Master: An issue has been raised regarding multi-GPU training hanging on the master branch, presumably due to the introduction of `cudastreams`. The problem is documented in GitHub issue #369 and contributors are looking into it, with some suggesting that self-hosted GitHub runners could provide real-GPU testing in CI to catch such issues.
- Nsight Systems for Performance Analysis: A link to NVIDIA Nsight™ Systems was shared for analyzing application algorithms and identifying optimization opportunities for CPUs and GPUs. The tool allows visualization of system workload on a timeline and can be used for both local and remote profiling as indicated by the Nsight Systems download link for macOS.
- Addressing Kernel Synchronization for Performance: Suggestions have been made for synchronizing kernels with the train files, particularly converting everything to `floatX` and standardizing documentation to make onboarding easier for new collaborators. An example pull request, #319, demonstrates what the synchronization could look like.
- Fine-Tuning CUDA Kernels for Better Performance: There is an active discussion on finely optimizing CUDA kernels for specific GPU architectures, particularly for memory-bound operations. For instance, gelu_backward appears to be memory-bound and could benefit from a rework that avoids needless function evaluations, as suggested in PR #307 (see the identity check after the link list below).
- HuggingFace Introduces FineWeb for Pretraining: HuggingFace has released a dataset called FineWeb, boasting over 15 trillion tokens from cleaned and deduplicated English web data, optimized for LLM performance. The dataset's smallest subset is approximately 27.6GB, suitable for pretraining experiments with from-scratch models, and available at HuggingFace datasets.
- Strangely, Matrix Multiplications Run Faster When Given "Predictable" Data! - mm_weird.py
- When should I use CUDA's built-in warpSize, as opposed to my own proper constant?: nvcc device code has access to a built-in value, warpSize, which is set to the warp size of the device executing the kernel (i.e. 32 for the foreseeable future). Usually you can't tell it apart ...
- MultiGPU training hangs · Issue #369 · karpathy/llm.c: mpirun with multiple GPUs is hanging after allocated 474 MiB for master copy of params Most probably due to the introduction of cudastreams. @karpathy @PeterZhizhin
- convert all float to floatX for layernorm_forward by JaneIllario · Pull Request #319 · karpathy/llm.c: change all kernels to use floatX
- HuggingFaceFW/fineweb · Datasets at Hugging Face: no description found
- utilities for mixed-precision tests/benchmarks by ngc92 · Pull Request #352 · karpathy/llm.c: This allows us to compile a single executable that can serve as test/benchmark for f32, f16, and bf16 versions of the kernels. So far, I've updated only those test files which already defined a BF...
- Adding self-hosted runners - GitHub Docs: no description found
- Optimisations for layernorm_backward / matmul_backward_bias / fused_classifier by ademeure · Pull Request #378 · karpathy/llm.c: These are fairly difficult optimisations to describe, hopefully the comments are helpful/enough! I'd focus on the changes in train_gpt2.cu rather than the similar ones in /dev/cuda/ (I didn't ...
- Improve tanh derivative in backward gelu by akbariyeh · Pull Request #307 · karpathy/llm.c: It is cheaper to compute the derivative of tanh as 1 - tanh^2 than computing 1/(cosh^2). This will probably not make a measurable difference.
- cuda::memcpy_async: CUDA C++ Core Libraries
- NVIDIA Nsight Systems: Profile systems, analyze performance, and optimize platforms.
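On the gelu_backward rework in PR #307, a quick numerical check of the identity involved (an illustration, not code from the PR): since d/dx tanh(x) = 1/cosh^2(x) = 1 - tanh^2(x), a backward pass that already has tanh(x) in hand can skip the cosh evaluation entirely.

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 1001)
t = np.tanh(x)
# Both sides of the identity agree to floating-point precision:
assert np.allclose(1.0 / np.cosh(x) ** 2, 1.0 - t * t)
```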
CUDA MODE ▷ #oneapi (4 messages):
- GitHub Pull Request Sighted: A member shared a GitHub pull request for PyTorch.org. The pull adds an accelerators dropdown to the quick start table with options for Huawei Ascend, Intel Extension for PyTorch, and Intel Gaudi.
- PyTorch.org Preview Unveiled: A preview link to PyTorch.org was shared, highlighting the PyTorch Conference 2024 call for proposals, early bird registration, new features in PyTorch 2.3, membership information, and ecosystem details. The preview also emphasizes TorchScript, TorchServe, torch.distributed backend, and cloud platform support for PyTorch.
- PyTorch : no description found
- Add accelerators to quick start table by aradys · Pull Request #1596 · pytorch/pytorch.github.io: Create accelerators dropdown with following options and add it to quick start table: Huawei Ascend Intel Extension for PyTorch Intel Gaudi Add commands to previous versions section RFC: pytorc...
OpenAI ▷ #annnouncements (1 messages):
- OpenAI Shares Its Data Doctrine: OpenAI has outlined its approach to content and data in the age of AI. This crucial document details how they handle the vast amount of data in today's landscape and what ethical principles they adhere to.
OpenAI ▷ #ai-discussions (87 messages🔥🔥):
- Exploring AI's Music Chops: A member mentioned musician Davie504 jamming with AI music, suggesting the quality of AI music has become compelling.
- Perplexity: A Hidden Gem in AI: A member expressed astonishment upon discovering Perplexity, regretting not using it sooner and stating, "why haven't I used this sooner."
- Cosine Similarity Thresholds in Embedding Models: One member inquired about the appropriate cosine similarity threshold for the model text-embedding-3-small, noting that the "old 0.9" threshold might equate to a "new 0.45" in newer models (see the sketch at the end of this section).
- Local LLM Model Recommendations for 8GB VRAM: When a member sought recommendations for an LLM model suitable for an 8GB VRAM card, it was suggested that Llama8B runs well and can be tried using LM Studio.
- AI News and Updates: Members shared different sources for staying updated on AI trends with suggestions including engaging with community members, following relevant Twitter accounts, and browsing sites like OpenAI Community and Ars Technica.
- OpenAI Developer Forum: Ask questions and get help building with the OpenAI platform
- Ars Technica: Serving the Technologist for more than a decade. IT news, reviews, and analysis.
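On the cosine-similarity thresholds question above, a minimal sketch of how such a cutoff is applied; the vectors are stand-ins and the 0.45 value is the member's observation, not an official recommendation (text-embedding-3-small returns 1536-dimensional embeddings by default):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

THRESHOLD = 0.45  # per the discussion, newer models may need a lower cutoff
query, doc = np.random.rand(1536), np.random.rand(1536)  # stand-ins for real embeddings
if cosine_similarity(query, doc) >= THRESHOLD:
    print("treat as a semantic match")
```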
OpenAI ▷ #gpt-4-discussions (15 messages🔥):
- Seeking GPT Knowledge Prioritization: A member expressed difficulty with GPT's preference for general answers rather than accessing specific entries in its knowledge base. It was suggested that telling GPT to refer to its "Analyze utility" might prompt a more accurate response.
- GPT's Knowledge Base Explained: Clarification was provided on how GPT's knowledge base works: knowledge is split into chunks, the chunks are converted into mathematical vectors, and a vector database is searched over them to match prompts (a toy sketch follows this section).
- The Challenge of Vector Visualization: There was a brief discussion on the complexities of visualizing vectors in the context of GPT's knowledge base, noting the challenge due to the vectors being in 256 dimensions.
- Uniform GPT-4 Performance Across Demand: Another member noted that GPT-4's performance does not vary with demand; everyone gets the same "turbo model" regardless of usage.
- No Inferior Models to Handle Demand: A final comment debunked the idea that an inferior model is used to manage demand, suggesting it is more cost-effective to invest in servers than to serve a less capable model.
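A toy sketch of the chunk-embed-search flow described above; the `embed` function here is a stand-in for a real embedding model, sized to the 256 dimensions members mentioned:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: deterministic pseudo-random unit vector.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.random(256)
    return v / np.linalg.norm(v)

chunks = ["refund policy ...", "shipping times ...", "warranty terms ..."]
matrix = np.stack([embed(c) for c in chunks])       # the "vector database"
scores = matrix @ embed("how long does shipping take?")
print(chunks[int(np.argmax(scores))])               # nearest chunk wins
```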
OpenAI ▷ #prompt-engineering (33 messages🔥):
- Struggle with Twitter Data: A member tried using RAG for incorporating Twitter data as knowledge for the LLM but found the model lost generalization. A suggestion was to explore Custom GPT or web-browser based GPT solutions that may support such functionality today.
- Negative Prompting Pitfalls: There was discussion regarding the difficulty of integrating negative examples into prompt engineering, with suggestions to consider it as an "advanced prompt engineering" technique due to its complex nature and possible impact on response quality.
- Prompt Expansion Advice for Product Identification: For complex tasks such as identifying product details in supermarket brochures, it was advised to split the workload into multiple prompts instead of one, using outputs from previous prompts as inputs for subsequent ones for better results.
- Challenges with DALL-E Prompt Specificity: Members discussed DALL-E's struggle with negative prompts, noting that specifying what not to include, e.g., "no avocado pits", can lead to confusion. A channel with experienced DALL-E users was recommended for further advice.
- Scripting for Improved Responses with Logit Bias: To achieve more consistent output without random bullet points, it was proposed to use output templates with open variable names in pseudocode format, and for more control, investigate logit bias which requires following a procedure in the provided link.
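On the logit-bias suggestion above, a hedged sketch using the OpenAI Python SDK; token IDs are tokenizer-specific, so the ID below is a placeholder you would look up with tiktoken for your model:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Name three fruits in one plain sentence."}],
    # Values range from -100 (ban) to 100 (force); "2001" is a placeholder token ID.
    logit_bias={"2001": -100},
)
print(response.choices[0].message.content)
```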
OpenAI ▷ #api-discussions (33 messages🔥):
- GPT Models Struggle with Tailored Responses: A member asks for advice on incorporating Twitter data into ChatGPT's knowledge to address specific prompts. They express difficulty with the LLM model using RAG and its inability to respond to inquiries beyond Twitter data.
- Prompt Engineering Best Practices: Madame_architect recommends a LinkedIn Learning course by Ronnie Sheer, Prompt Engineering with ChatGPT, for learning basics to advanced techniques, including Socratic prompting and Dall-E interactions.
- DALL-E's Difficulty with Negative Prompts: It is discussed that DALL-E often struggles with negative prompts, for instance not including avocados with pits, even when explicitly mentioned in the prompt as unwanted.
- Tackling Unwanted Tokens via Logit Bias: A link to the OpenAI logit bias guide was shared to address the issue of the AI producing inconsistent outputs with unwanted tokens.
- Step-wise Improvement in API Prompt Responses: Madame_architect shares a multi-step approach to improve the response from GPT API for analyzing and formatting product information, suggesting separating vision tasks and format structuring into distinct API calls.
Eleuther ▷ #general (39 messages🔥):
- ICLR Attendance with a Focus on AI Interpretability: A member mentioned they would be attending ICLR, with work focused on Interpretability in generative AI, particularly on the vision side and 3D vision.
- Greeting from Thailand: A simple Hello was shared by a member from Thailand.
- Testing the Impact of System Prompts on Model Performance: A member expressed interest in using lm-evaluation-harness to evaluate how varying system prompts affect a model's performance, using models such as llama3 8b instruct, and inquired on how to specify system prompts for Hugging Face models in the harness.
- Italian Language Leaderboard in lm-eval: A member is evaluating Large Language Models (LLMs) in Italian and maintaining a leaderboard using lm-evaluation-harness.
- Query on Obtaining MMLU Data: A student member asked about acquiring granular MMLU data for any closed-source model, as they exhausted their eval credits on attempting to evaluate GPT-4. They mention finding a CSV file but it lacked model-specific answers.
- Discussion on PEFT with Available VRAM: Members discussed whether Parameter-Efficient Fine-Tuning (PEFT) using LoRA is beneficial when sufficient VRAM is available, with some insight that the implementation could be suboptimal or that specific configurations, such as mixed-precision settings and the LoRA rank r, may affect performance.
Eleuther ▷ #research (77 messages🔥🔥):
- Evaluating Scientific Standards: A member discussed that accepted scientific community standards, such as the .05 p-value threshold, may seem arbitrary but served practical purposes historically. The Nature article Lowering the P-Value Threshold was shared, advocating for a shift from 0.05 to 0.005 to improve the reproducibility of scientific findings.
- The P-Value Debate Continues: Conversations ensued highlighting the arbitrary nature of the p-value threshold, with references to its historical context involving Fisher, Neyman, and the development of statistical tests. Discrepancies between physics and other fields, the influence of data on outcomes, and misunderstandings of uncertainty by the general public were key points of discussion.
- SOPHON Framework Introduced: A new learning paradigm called non-fine-tunable learning was shared, which aims to prevent pre-trained models from being misused for unethical tasks. The SOPHON framework is designed to protect pre-trained models from being fine-tuned in restricted domains.
- QuaRot's Novel Quantization Scheme: A link was provided to QuaRot, a new quantization approach for language models (LLMs) detailed in an arXiv paper, which claims to maintain performance while applying comprehensive 4-bit quantization to weights, activations, and KV cache.
- Mixture-of-Experts Architecture Lory Unveiled: Discussion on a new MoE model called Lory was introduced, with a focus on a differentiable architecture pre-trained on 150B tokens. It includes novel techniques for causal segment routing and similarity-based data batching for expert specialization.
- CameraCtrl: no description found
- SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models: Instead of building deep learning models from scratch, developers are more and more relying on adapting pre-trained models to their customized tasks. However, powerful pre-trained models may be misuse...
- Lady tasting tea - Wikipedia: no description found
- Significant: no description found
- The mechanistic basis of data dependence and abrupt learning in an...: Transformer models exhibit in-context learning: the ability to accurately predict the response to a novel query based on illustrative examples in the input sequence, which contrasts with...
- Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training: Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective. Recently, a fully-di...
- Redefine statistical significance - Nature Human Behaviour: We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way t...
Eleuther ▷ #scaling-laws (1 messages):
nullonesix: https://arxiv.org/abs/2102.01293
Eleuther ▷ #interpretability-general (34 messages🔥):
- Exploring Skip Connections: A member is experimenting with adaptive skip connections in which the residual weights shrink during training and can even become negative, yielding improved loss metrics compared to a standard model. They provided experimental results and asked for related research on this phenomenon (a minimal sketch of the idea appears at the end of this section).
- Investigating Weight Dynamics: Another member shared a related paper on gating the residual path https://arxiv.org/pdf/2003.04887, although it differs from the experiment in not restricting the identity component to positive values.
- Code and Clarifications: Further clarification was sought on what was meant by "identity component" in the experiments. The member shared code that reveals the use of a single weight on the residual connection in a transformer layer.
- Dataset and Model Details Revealed: The models used in the experiments have 607M parameters and were trained on the fineweb dataset with a batch size of 24 and a learning rate of 6e-4 on a context window of 768, all run on a single A100.
- Discussion on Training Speed and Odd Loss Curves: A member discussing the experiment noted that the loss curve seemed odd and was going down too slowly, while another suggested trying the experiment on the OpenWebText dataset to compare results.
- train_loss (24/05/07 01:06:58): Publish your model insights with interactive plots for performance metrics, predictions, and hyperparameters. Made by Nick Ryan using Weights & Biases
- gist:08c059ec3deb3ef2aca881bdc4409631: GitHub Gist: instantly share code, notes, and snippets.
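A minimal PyTorch sketch of the adaptive-skip idea under discussion (an assumption-laden illustration, not the member's actual code): a single learnable scalar on the identity path, initialized to 1 and free to shrink or go negative during training.

```python
import torch
import torch.nn as nn

class AdaptiveSkipBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # weight on the residual/identity path
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # alpha is trained jointly with the rest of the network and may become negative.
        return self.alpha * x + self.ff(self.norm(x))

block = AdaptiveSkipBlock(64)
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```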
Eleuther ▷ #lm-thunderdome (11 messages🔥):
- Logits Support Still Unavailable: Despite interest in logit access for model understanding, API models like OpenAI's still do not support logits or the use of logit biases after recent research indicated the potential to extract a model's "image" or "signature" from them (logit extraction work). This affects the ability to perform evaluations based on log likelihoods and has led to the omission of log probs for input tokens in API responses.
- Evaluation Workaround for Closed Models: Encouraging attempts to evaluate Italian LLMs against models like GPT-3.5 were shared, including a YAML setup for utilizing 'generate_until' instead of logits for tasks like MMLU and ARC, indicating a workaround for external evaluations on closed models.
- OpenAI Docs Suggests Logprob Return: A member noted that OpenAI's documentation implies the return of logprobs, highlighting the complexity in running external evaluations on proprietary models.
- Clarification on Logprob Availability: It was clarified that the limitation is the absence of logprobs for prompt/input tokens, which is crucial for calculating the completeness of a multiple-token response from models in evaluation scenarios.
- Stealing Part of a Production Language Model: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our...
- Logits of API-Protected LLMs Leak Proprietary Information: The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption...
- lm-evaluation-harness/lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_cot_zeroshot_template_yaml at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
Modular (Mojo 🔥) ▷ #general (39 messages🔥):
- Installation Inquiry: A user inquired about how to install mojo on Intel Mac OS via Docker, but no direct solution was provided.
- Mojo on Windows: Users discussed alternatives to using mojo on Windows, with a suggestion to utilize WSL2 for a comparable experience. Patience was expressed humorously through a gif implying a wait for native Windows support.
- Discussion on Mojo's Design Choices: Members debated the rationale behind having both structs and classes in mojo, touching on design patterns, and addressing concerns over the decision to include both constructs.
- Mojo's Compilation Capabilities: There was clarification that Mojo can compile to native machine code, similar to Rust, and can produce executable files such as .exe. Concerns about execution speed for data engineering tasks, compared with the immediacy of interpreted languages like Python, were also addressed.
- Python Integration with Mojo: Users explained how mojo integrates with Python by importing modules and calling functions, referencing official documentation to demonstrate that mojo aims to be a superset of Python that leverages the established Python ecosystem while introducing its own functionalities.
- Its Gonna Take A While Steve Kornacki GIF - Its Gonna Take A While Steve Kornacki Msnbc - Discover & Share GIFs: Click to view the GIF
- Python integration | Modular Docs: Using Python and Mojo together.
Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages):
- Modular Shares Two Important Updates: The Modular team tweeted important updates shared via two separate Twitter posts.
- Catch the Latest From Modular: For more information, follow the provided links to Tweet 1 and Tweet 2 on Modular's official Twitter page.
Modular (Mojo 🔥) ▷ #📺︱youtube (1 messages):
- Modular Community Livestream Alert: A new video titled "Modular Community Livestream - New in MAX 24.3" has been shared, announcing the availability of MAX 24.3. The livestream delves into the latest updates on the MAX Engine and introduces the new Extensibility API for Mojo🔥. Watch now.
Link mentioned: Modular Community Livestream - New in MAX 24.3: MAX 24.3 is now available! Join us on our upcoming livestream as we discuss what’s new in MAX Engine and Mojo🔥 - preview of MAX Engine Extensibility API for...
Modular (Mojo 🔥) ▷ #🔥mojo (53 messages🔥):
- Troubleshooting Tensor Indexing: A user encountered an error when setting the value of a tensor at a specific location using indexing; it was resolved by using the `Index` utility from `utils.index`. Code from a tensor library demonstrates setting tensor values at specified indices correctly without errors.
- Optimizing SIMD Operations on Large Arrays: A member grappled with using SIMD for large arrays and experienced compilation issues with arrays beyond size 2^14. Another user advised utilizing smaller SIMD blocks and provided example code from Mojo's GitHub repository.
- Benchmarking Dilemmas: A user raised concerns about benchmarking functions where computed values are not used, potentially causing optimizations that eliminate the function itself. The issue was addressed by suggesting the use of `benchmark.keep` to retain results and prevent such optimizations.
- Curiosity on Constructors Without Class Inheritance: A member questioned the need for constructors in Mojo, as the language lacks classes and inheritance. The conversation touched upon the differentiation between Mojo and other languages like Go and Rust, emphasizing constructors as a means to ensure instances are valid upon creation.
- Exploration of Advanced Compiler Tools: Participants in the chat discussed the necessity for tools to reveal compiled code details, such as LLVM IR, to eliminate guesswork. Interest was expressed in tools equivalent to compiler explorer for Mojo or the ability to view lower-level representation of compiled Mojo code.
- Proposal for 'where' Clause in Mojo: A discussion took place regarding a GitHub proposal for parameter inference in Mojo functions, debating the implementation and readability of right-to-left inference rules versus `where` clauses similar to those found in mathematics and the Swift programming language. Participants shared differing opinions on the most intuitive approach for specifying parameter constraints.
- Compile-time Metaprogramming in Mojo: A user inquired about the extent of compile-time metaprogramming in Mojo, particularly whether calculations like the Fibonacci sequence could be performed at compile time. Confirmation was given that such computations are indeed possible, with the caveat that there should be no side effects involved.
- transpose | Modular Docs: transpose(input Int, y: Int) -> Symbol
- [Feature Request] Parameter Inference from Other Parameters · Issue #1245 · modularml/mojo: Review Mojo's priorities I have read the roadmap and priorities and I believe this request falls within the priorities. What is your request? Mojo already supports inferring parameters from argume...
- basalt/basalt/utils/tensorutils.mojo at main · basalt-org/basalt: A Machine Learning framework from scratch in Pure Mojo 🔥 - basalt-org/basalt
- mojo/proposals/inferred-parameters.md at main · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
- mojo/examples at main · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
- mo - Overview: mo has 49 repositories available. Follow their code on GitHub.
Modular (Mojo 🔥) ▷ #community-projects (14 messages🔥):
- Mojo-sort Updated and Enhanced: The mojo-sort project has been updated to work with the latest Mojo nightly. It now includes a more efficient radix sort algorithm for strings, boasting faster speeds across all benchmarks.
- Help Needed with Lightbug Migration: The Lightbug project is facing issues migrating to Mojo version 24.3, particularly concerning errors that appear to log EC2 locations. Assistance is requested by the developers, with details documented in this GitHub issue.
- Basalt Navigates Mojo's Limitations: The Basalt project adapts to Mojo's current limitations, like the lack of classes and inheritance, by finding workarounds such as using StaticTuple for compile-time lists, but it generally hasn't limited the overall goals.
- A New Port of Minbpe to Mojo: Minbpe.mojo, a Mojo port of Andrej Karpathy's Python project, has been released. Although currently slower than its Rust counterpart, it runs three times faster than the original Python version, and there is potential for optimization, including possible future SIMD implementations.
- Mojo GUI Library Inquiry: A member expressed interest in finding out if a Mojo GUI library exists, to which there has been no response within the given messages.
- Client tests don't work with changes in Mojo 24.3 · Issue #34 · saviorand/lightbug_http: Since Mojo 24.3 main() functions inside packages are no longer supported. This was used in /tests/run.mojo to run a test suite (which is just one client test for now). The client test worked by run...
- GitHub - dorjeduck/minbpe.mojo: port of Andrjey Karpathy's minbpe to Mojo: port of Andrjey Karpathy's minbpe to Mojo. Contribute to dorjeduck/minbpe.mojo development by creating an account on GitHub.
- GitHub - mzaks/mojo-sort: Contribute to mzaks/mojo-sort development by creating an account on GitHub.
- mojo-pytest/example/tests/util.mojo at main · guidorice/mojo-pytest: Mojo test runner, pytest plugin (aka pytest-mojo). Contribute to guidorice/mojo-pytest development by creating an account on GitHub.
Modular (Mojo 🔥) ▷ #nightly (16 messages🔥):
- Type Handling Woes in Traits and Variants: Discussions around handling return types surfaced, mentioning the need for a `Never` type or a macro trick similar to Rust's to pass return-type checks. Some members see current solutions as band-aids, emphasizing the need for more explicit language constructs like keywords.
- Variant Types Challenged by Trait Inheritance: A member is running into issues with creating variant types with trait inheritance, highlighting a current limitation flagged in the associated GitHub issue. They are also exploring alternative methods like `PythonObject` with `@staticmethod`s to avoid using `UnsafePointer`.
- Nightly Compiler Release & Reference Ergonomics Improvement: The Mojo Discord announced a new nightly release of the Mojo compiler along with enhancements to the ergonomics of `Reference`, which have simplified its understanding and usage.
- Request for Automating Nightly Release Notifications: A question about automating messages for nightly releases led to a response from a member stating that while it is possible, it has not been prioritized enough to implement yet.
- Monitors Stretched by the Latest Update: A user humorously comments that their 2k monitor is just enough to handle the scope of the latest updates, implying the significance of the changes.
- [Feature Request] Allow substitution of child traits for parent traits · Issue #2413 · modularml/mojo: Review Mojo's priorities I have read the roadmap and priorities and I believe this request falls within the priorities. What is your request? If a function takes variadic arguments bound by a trai...
- [stdlib] Update stdlib corresponding to 2024-05-06 nightly/mojo by JoeLoser · Pull Request #2559 · modularml/mojo: This updates the stdlib with the internal commits corresponding to today's nightly release: mojo 2024.5.622.
- mojo/docs/changelog.md at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
- Soliloquy L3 Price Drop: The price for the Soliloquy L3 8B model has been reduced to $0.05/M tokens for both private and logged endpoints for 2023-2024.
Link mentioned: Lynn: Llama 3 Soliloquy 8B v2 by lynn | OpenRouter: Soliloquy-L3 v2 is a fast, highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base,...
OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):
- Beta Testers Wanted for Rubik's AI: Users are invited to become beta testers for Rubik's AI, a new advanced research assistant and search engine. Testers receive 2 months of premium features free, including access to Claude 3 Opus, GPT-4 Turbo, Mistral Large, and other cutting-edge models, by signing up at rubiks.ai with the promo code `RUBIX`.
- Tech World Buzz - Apple and Microsoft Innovations: In the app's trending topics there are updates on Apple's latest 2024 iPad models and news that Microsoft is developing a 500B-parameter model named MAI-1 to compete with OpenAI. Users can access these stories within the app for detailed insights.
Link mentioned: Rubik's AI - AI research assistant & Search Engine: no description found
OpenRouter (Alex Atallah) ▷ #general (119 messages🔥🔥):
- Search for Elusive Verbose Llama: Members discussed challenges in getting llama-3-lumimaid-8b to generate longer responses, comparing its shorter outputs unfavorably with previous experiences from models like Yi and Wizard.
- Meta-Llama-3-120B-Instruct Reveal: Conversation about the launch of Meta-Llama-3-120B-Instruct on Hugging Face, a model inspired by large self-merges such as Goliath-120b and MegaDolphin-120b; member shared link to a tweet regarding its release.
- Amazon Bedrock’s Model Request Regional Restrictions: A user questioned the possibility of regional restrictions when requesting models on Amazon Bedrock from different billing regions, with others suggesting that requesting access to another region seems possible.
- OpenRouter’s Precision Preference: Dialogue on whether OpenRouter runs models at full precision made clear that it varies by provider: mostly fp16, sometimes quantized to int8.
- Model Optimization and Parameter Tuning: Inquiry about whether OpenRouter's default parameters for models are typically suitable, eliciting opinions and experiences on the need for parameter adjustments for better conversational outcomes.
- mlabonne/Meta-Llama-3-120B-Instruct · Hugging Face: no description found
- What are the different AI models in AI Dungeon?: no description found
- Reddit - Dive into anything: no description found
- Reddit - Dive into anything: no description found
- abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B · Hugging Face: no description found
- OpenRouter: Build model-agnostic AI apps
OpenInterpreter ▷ #general (39 messages🔥):
- Troubleshooting Local Interpreter Issues: A user experienced an error when running Interpreter with Mixtral locally. Another sought to understand the capabilities of OpenInterpreter, comparing their personal experience with the project's impressive demo video.
- Phi Performance Evaluations: A discussion took place regarding the use of a Hugging Face model (Phi-3-Mini-128k-Instruct) on OpenInterpreter; Mike mentioned having had "very poor performance" using Phi.
- Interest in Benchmarks for Model Performance: Members are interested in benchmarking various models to see which work best with Open Interpreter, suspecting that models with less synthetic data may perform better.
- Calling for Model and Framework Nicknames: Suggestions for nicknaming model+framework combinations arose after a user shared positive feedback on their experience with Gemma-over-Groq (GoG).
- Request for Custom System Instructions: There was a call for assistance regarding Custom/System Instructions for Ubuntu using GPT-4, indicating a need in the community for such resources.
- Recommendation for Pre-Fine Tuning: In response to a question about training models for a specific task, it was suggested to use OpenPipe.ai, also mentioning the importance of understanding and using system messages effectively prior to fine-tuning a model.
- OpenPipe: Fine-Tuning for Developers: Convert expensive LLM prompts into fast, cheap fine-tuned models.
- microsoft (Microsoft): no description found
- What Hello GIF - What Hello Yes - Discover & Share GIFs: Click to view the GIF
- What is Reka Core?: **Reka Core** is a frontier-class, multimodal language model developed by Reka. It is one of only two commercially available comprehensive multimodal solutions, capable of processing and understandin...
- microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
OpenInterpreter ▷ #O1 (80 messages🔥🔥):
- Open Interpreter Version Specificity: It was highlighted that Open Interpreter (OI) should run under Python 3.10 to avoid compatibility issues. One user encountered slow performance when running Groq and was advised to switch to a smaller model, such as dolphin or mixtral, for better experience.
- Conda for Clean Installations on Mac: Users discussed creating a separate Conda environment with Python 3.10 for a fresh installation of Open Interpreter, after facing multiple version conflicts with Python and errors.
- Local Model Framework Inquiry: A user inquired about supporting Jan as a local model framework for the O1 device, similar to its use with Open Interpreter (text). It was affirmed that there should be no issues so long as the models are served similarly.
- 01 Device International Availability: Concerning the 01 device's service availability, it was noted that while the device works anywhere with internet access, the hosted service is likely US-only at present, and no units have been shipped out yet.
- LLM Skill Persistence and Execution Issues: A user noted the importance of persisting learned skills to avoid re-teaching the same information to language models. They were directed to the Open Interpreter's GitHub for information on how skills persist in storage. Additionally, issues with running code using a smaller language model were mentioned and addressed in a video reference.
- open-interpreter/interpreter/core/computer/skills/skills.py at main · OpenInterpreter/open-interpreter: A natural language interface for computers. Contribute to OpenInterpreter/open-interpreter development by creating an account on GitHub.
- microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
OpenAccess AI Collective (axolotl) ▷ #general (35 messages🔥):
- Sora's Open Source Alternative Launched: An open-source alternative to Sora with an MIT license has been released on Github, providing a tool for creating magic stories. However, it's noted that the weights are not released yet. Discover the tool here.
- Gradient Checkpointing Memory Savings: A member reported successful use of the new unsloth gradient checkpointing, seeing a reduction in VRAM usage from 19,712MB to 17,427MB, indicating significant memory savings.
- Unusual Training Patterns in Model Layers: During a discussion, a puzzling finding was pointed out where only a single slice of a model layer seemed to be getting trained, in contrast with other models where layers were fully trained, leading to some speculation but no concrete explanation.
- Speculation on Lazy Optimizer Behavior: There was speculation that if a dataset is too easy, the model may not optimize all layers and focus only on the first few and the last layers for efficiency, yet anomalies in layer training intensity were not fully explained by this theory.
- Axolotl Features Inquiry: A member inquired about the latest features in Axolotl after having not trained in a while, seeking updates on new capabilities or enhancements.
Link mentioned: GitHub - HVision-NKU/StoryDiffusion: Create Magic Story!: Create Magic Story! Contribute to HVision-NKU/StoryDiffusion development by creating an account on GitHub.
OpenAccess AI Collective (axolotl) ▷ #other-llms (1 messages):
icecream102: Coincidence?
OpenAccess AI Collective (axolotl) ▷ #general-help (5 messages):
- Identifying HuggingFace Dataset Types: When uncertain about the type of a HuggingFace dataset, the simplest method is to download and open it up to inspect the contents. Alternatively, one can check the dataset's preview for this information.
- Building a Language-Specific LLM for Code: Inspired by IBM's granite models, a member shared an interest in creating a language-specific LLM, specifically for Java code assistance, with the intent to operate on a standard laptop without a GPU. They seek guidance on selecting a base model for fine-tuning, determining the right number of epochs, training size, and quantization to maintain accuracy.
OpenAccess AI Collective (axolotl) ▷ #datasets (38 messages🔥):
- Challenges in Fine-tuning for Mathematical Performance: Discussions highlighted a decrease in scores on mathematical topics, especially in MMLU and the Chinese math evaluations (C-Eval and CMMLU). The decline in performance persisted even after fine-tuning on Hugging Face datasets such as `orca-math-word-problems-200k`, `MathInstruct`, and `MetaMathQA`.
- Quantization's Impact on Model Performance: Members broached the topic of quantization effects, specifically referencing how `llama.cpp`'s quantization might significantly degrade model performance.
- Fine-tuning and Evaluation Strategies Discussed: Models were reportedly fine-tuned on datasets like `orca-math-word-problems-200k`, `MathInstruct`, and `MetaMathQA`, and evaluated using lm-evaluation-harness. Some concern was raised about ensuring correct prompt-template usage during both evaluation and fine-tuning.
- Prompt Design Can Influence Model Behavior: A pointed discussion took place about the importance of using correct prompt designs, as changes to templates, including potentially incorrect end-of-text tokens, could impact model performance.
- Key Role of Prompt Formats in Fine-tuning: A participant argued that custom prompt formats like `alpaca` can be used if models are also fine-tuned on examples in that format, while recognizing the need for accurate comparisons and awareness of potential performance issues.
- Meta Llama 3 | Model Cards and Prompt formats: Special Tokens used with Meta Llama 3. A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followe...
- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- Llama3 GGUF conversion with merged LORA Adapter seems to lose training data randomly · Issue #7062 · ggerganov/llama.cpp: I'm running Unsloth to fine tune LORA the Instruct model on llama3-8b . 1: I merge the model with the LORA adapter into safetensors 2: Running inference in python both with the merged model direct...
- axolotl/src/axolotl/prompters.py at 3367fca73253c85e386ef69af3068d42cea09e4f · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
- TIGER-Lab/MathInstruct · Datasets at Hugging Face: no description found
- meta-math/MetaMathQA · Datasets at Hugging Face: no description found
OpenAccess AI Collective (axolotl) ▷ #docs (2 messages):
- Expanded Documentation on Model Merging: A new documentation update has been made on Axolotl, addressing the merging of model weights. The next goal outlined is to tackle guidance on inference.
- Axolotl: A Training Hub for AI Enthusiasts: The Axolotl GitHub repository offers a flexible tool for fine-tuning AI models, covering a wide range of Huggingface models and fine-tuning techniques, and highlights the ability to customize configurations.
Link mentioned: Introduction | Continuum Training Platform | Axolotl Training Platform: no description found
OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):
- Modifying System Prompts in ShareGPT: To change the system prompt for conversational training, one must adjust the conversation template in the `ShareGPTPrompter` class or the initial system message. This involves modifying the `_build_result` method or the corresponding configuration parameters.
Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
LangChain AI ▷ #general (43 messages🔥):
- In Search of OData V4 Support: A member inquired about LangChain's support for Microsoft Graph (OData V4) and discussed the potential need to create a custom tool similar to the SQLDatabase utility.
- API Wishes for Kappa Bot: A user expressed interest in an API for kappa-bot-langchain to use outside of Discord due to workplace restrictions on the platform.
- Entity Memory Under the Microscope: A clarification was sought on the parameter `k` in ConversationEntityMemory, with attention directed to the LangChain documentation for insight (a short usage sketch closes this section).
- Framework Showdown for Python Newcomers: A newcomer to Python queried the community about choosing between Flask or Django for a new application, mentioning concerns about Flask’s scalability.
- Seeking LangChain's Data Heft Limits: A user questioned the upload size limit for datasets within the Langsmith Free plan, specifying a concern with a 300MB CSV file.
- Defining Custom Tools | 🦜️🔗 LangChain: When constructing your own agent, you will need to provide it with a list of Tools that it can use. Besides the actual function that is called, the Tool consists of several components:
- Table with LLMs Token Output Limit: A table with information on Context Window and Maximum Output Tokens of multiple large language models (LLMs)
- Use query parameters to customize responses - Microsoft Graph: Microsoft Graph provides optional query parameters that you can use to specify and control the amount of data returned in a response. Includes common parameters.
- langchain.memory.entity.ConversationEntityMemory — 🦜🔗 LangChain 0.1.18: no description found
- Rubik's AI - AI research assistant & Search Engine: no description found
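On the `k` question above, a short sketch of how ConversationEntityMemory is typically constructed, assuming langchain 0.1-era imports; per the docs, `k` bounds how much recent conversation is kept in the buffer:

```python
from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI  # assumes langchain-openai is installed

llm = ChatOpenAI(model="gpt-3.5-turbo")
memory = ConversationEntityMemory(llm=llm, k=3)  # k: recent exchanges kept in the buffer
memory.save_context({"input": "Alice works at Acme."}, {"output": "Noted!"})
print(memory.load_memory_variables({"input": "Where does Alice work?"}))
```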
LangChain AI ▷ #langserve (13 messages🔥):
- StreamEvents with RemoteRunnable Inquiry: A member questioned the possibility of using `streamEvents` with `RemoteRunnable` in LangChain. They were provided an affirmative answer along with a detailed code example and directed to the LangChain documentation and API reference (a Python sketch of the pattern appears at the end of this section).
- RemoteRunnable Streaming Issues in JavaScript: The member later reported an issue where the JavaScript implementation of `RemoteRunnable` was not streaming via `streamEvents` as expected, despite working in Python. This suggests a potential inconsistency or problem with the JavaScript version.
- Possible Misdirection to Incorrect API Endpoint: The conversation concluded with the member noting that the `streamEvents` method was making HTTP POST requests to `/stream` instead of `/stream_events`. The inconsistency prompted advice to raise an issue on the LangChain GitHub repository for clarification or to correct the potential bug.
- Integrating with LangServe | 🦜️🔗 Langchain: LangServe is a Python framework that helps developers deploy LangChain runnables and chains
- Interface | 🦜️🔗 Langchain: In an effort to make it as easy as possible to create custom chains, we've implemented a "Runnable" protocol that most components implement.
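A hedged Python sketch of the working flow referenced above (the endpoint URL and input are placeholders); in Python the client reaches `/stream_events` as expected, which is exactly what the JavaScript client was observed not to do:

```python
import asyncio
from langserve import RemoteRunnable

async def main() -> None:
    chain = RemoteRunnable("http://localhost:8000/my_chain")  # hypothetical endpoint
    # Streams structured events (on_chain_start, on_llm_stream, ...) from the server.
    async for event in chain.astream_events({"topic": "cats"}, version="v1"):
        print(event["event"])

asyncio.run(main())
```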
LangChain AI ▷ #share-your-work (5 messages):
- Introducing Everything-AI: everything-ai has been rebranded and released as V1.0.0, featuring a multi-task, AI-powered local assistant capable of conversing with PDFs, summarizing texts, generating images, and more. The project can be found on GitHub with a brand new user interface and quick-start documentation provided at https://astrabert.github.io/everything-ai.
- Calling All Beta Testers: Rubiks.ai seeks beta testers for its advanced research assistant and search engine, offering 2 months free of premium access to models like Claude 3 Opus, GPT-4 Turbo, and Mistral Large. Interested parties are welcomed with a promo code and can sign up at https://rubiks.ai/.
- No-Code Tool for AI Applications Unveiled: A new no-code tool designed to streamline the creation of AI applications and facilitate the transition from prototype to production has been introduced, complete with built-in prompt engineering and one-click deployment. The early demo is viewable at Google Drive demo, and feedback can be scheduled via booking link.
- Discovering API Chain through Langchain Series: A tutorial on using APIChain for calling APIs with large language models (LLMs) is available in a video called "API Chain | Chain Types | Learning Langchain Series," found on YouTube.
- Learn to Leverage Router Chain: Another part of the Learning Langchain Series covers Router Chain, a tool for managing multiple APIs and tasks with LLMs, which is explained in detail in the video "ROUTER CHAIN | Learning Langchain Series | Chain Types" on YouTube.
- everything-ai: Introducing everything-ai, your fully proficient, AI-powered and local chatbot assistant! 🤖
- GitHub - AstraBert/everything-ai: Introducing everything-ai, your multi-task, AI-powered and local assistant! 🤖: Introducing everything-ai, your multi-task, AI-powered and local assistant! 🤖 - AstraBert/everything-ai
- Rubik's AI - AI research assistant & Search Engine: no description found
- API Chain | Chain Types | Learning Langchain Series | Become an expert in calling APIs with LLMs!: Learn how to call an API using APIChain from langchain. You will see that with the help of this library, you'll be at the forefront of delivering values when...
- Pixie.ai.mp4: no description found
- Appointments: no description found
- ROUTER CHAIN | Learning Langchain Series | Chain Types | Route between your LLMs in a fashion way!: Router chain is one of the tools you'll definitely need when dealing with multiple tasks! Imagine how to handle multiple APIs or multiple tasks among multipl...
LangChain AI ▷ #tutorials (1 messages):
mhadi91: https://youtu.be/WTfWgYsIspE?si=gEdyMrX4vJm2gC6E
LAION ▷ #general (61 messages🔥🔥):
- Exploring Roleplay Datasets for AI: A member suggested creating a dataset composed of exclusively human-written dialogue, including jokes and human interactions, to improve AI model responses beyond the typical soulless outputs of smart instruct models.
- Synthetic Data Generation for Research: A user shared a GitHub link to Simian, a synthetic data generator for image, video, and 3D models, offering a possible resource for experimentation.
- Seeking Recommendations for Datasets: A new member to machine learning inquired about datasets suitable for a research paper focusing on text/numeric regression or classification tasks, leading to various dataset recommendations like MNIST-1D and sentiment analysis datasets such as Stanford's Large Movie Review Dataset.
- Discussion on Text-to-Video Diffusion Models: A lively discussion unfolded concerning the superiority of diffusion models over generative transformers for SOTA text-to-video tasks, noting that diffusion models are often fine-tuned from existing T2I models, saving on computational costs. Insights into the 3D knowledge of these models and their text-conditioning challenges were shared.
- Video Diffusion Model Insights from an Author: The author of the stable video diffusion paper participated in the discussion, highlighting the challenges of obtaining quality text supervision for video models, the potential of auto-captioning videos with LLMs, and the nuances between autoregressive and diffusion approaches to video generation.
- VideoPoet – Google Research: A Large Language Model for Zero-Shot Video Generation. VideoPoet demonstrates simple modeling method that can convert any autoregressive language model into a high quality video generator.
- GitHub - RaccoonResearch/Simian: Synthetic data generator for image, video and 3D models: Synthetic data generator for image, video and 3D models - RaccoonResearch/Simian
- GitHub - instructlab/community: InstructLab Community wide collaboration space including contributing, security, code of conduct, etc: InstructLab Community wide collaboration space including contributing, security, code of conduct, etc - instructlab/community
LlamaIndex ▷ #announcements (1 messages):
- OpenDevin Webinar Announcement: LlamaIndex is hosting a webinar featuring the authors of OpenDevin, an open-source version of Devin by Cognition, on Thursday at 9am PT. Attendees can learn about building autonomous AI agents and will gain insights from its burgeoning popularity on GitHub. Register here.
Link mentioned: LlamaIndex Webinar: Build Open-Source Coding Assistant with OpenDevin · Zoom · Luma: OpenDevin is a fully open-source version of Devin from Cognition - an autonomous AI engineer able to autonomously execute complex engineering tasks and…
LlamaIndex ▷ #blog (4 messages):
- Hugging Face TGI Unveils New Features: Hugging Face's TGI (Text Generation Inference) toolkit announced support for function calling and batched inference among other features, aimed at optimizing LLM deployments on the platform. The full list of features was shared in a tweet by LlamaIndex.
- Jerry Liu to Speak at AIQCon: Co-founder Jerry Liu will be speaking about building Advanced Question-Answering Agents Over Complex Data at AIQCon in San Francisco. A 15% discount on tickets is available with the code "Community," as per a tweet with the conference details.
- Enhancing RAG with LlamaParse: LlamaParse aims to improve the data quality for building Retrieval-Augmented Generation (RAG) models over complex documents, emphasizing that quality data is critical for good performance. This development was highlighted in a recent LlamaIndex tweet.
- OpenDevin as an Open-Source AI Engineer: @cognition_labs released OpenDevin, an open-source autonomous AI engineer capable of executing complex engineering tasks and collaborating on software projects. The announcement and details were provided in a LlamaIndex update.
Link mentioned: The AI Quality Conference: The world's first AI Quality Conference on June 25, 2024 in San Francisco, CA
LlamaIndex ▷ #general (50 messages🔥):
- The Quest for LlamaIndex Integration: The conversation grappled with issues integrating LlamaIndex with other databases. A user faced challenges with empty responses when querying Supabase Vectorstore and mismatched dimension errors; they eventually resolved the dimension issue by specifying the model and dimensions during query engine creation.
- Package Imports and Documentation Confusion: Discussion unfolded around package imports after a recent update to llama-index. Users shared experiences about locating the correct package paths and imports, and guidance was provided to refer to the updated llama-hub documentation for installs and imports.
- Troubleshooting the Deletion of Document Knowledge: A user encountered problems when attempting to delete document knowledge from their index in llama-index. Conversation with another user suggested re-instantiating the query_engine after persisting changes in order to observe the deletion, although the issue appeared unresolved as the user continued to see the document in the json vector store.
- PDF Parsing Libraries for Local Use: A user inquired about local PDF parsing libraries as an alternative to using LlamaParse. PyMuPDF was recommended, with a usage example given showing how to integrate PyMuPDF with LlamaIndex (a minimal sketch follows the links below).
- Handling Absence of Relevant Information in Responses: Users discussed approaches to prevent a model from responding when no relevant information is found in the context. Prompt engineering and employment of a check similar to Evaluate on each request were suggested as potential solutions.
- OpenAI & other LLM API Pricing Calculator - DocsBot AI: Calculate and compare the cost of using OpenAI, Azure, Anthropic, Llama 3, Google Gemini, Mistral, and Cohere APIs with our powerful FREE pricing calculator.
- Llama Hub: no description found
- llama_index/llama-index-integrations/embeddings/llama-index-embeddings-huggingface at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- Building Evaluation from Scratch - LlamaIndex: no description found
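A minimal local-parsing sketch along the lines suggested above, assuming llama-index 0.10-style imports and a placeholder file path: extract page text with PyMuPDF, wrap each page as a Document, and index as usual.

```python
import fitz  # PyMuPDF
from llama_index.core import Document, VectorStoreIndex

pdf = fitz.open("report.pdf")  # placeholder path
documents = [Document(text=page.get_text()) for page in pdf]
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What is the executive summary?"))
```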
LlamaIndex ▷ #ai-discussion (4 messages):
- Seeking HyDE for Complex NL-SQL: A member is building a NL-SQL chat bot to handle complex SQL queries across hundreds of tables and is looking for a HyDE method that's effective in this scenario. They've only found references to HyDE's use in pdf/text chatbots and are exploring options for enhancing the LLM's accuracy in database querying.
- Introspective Agents with LlamaIndex: A link to an article titled "Introspective Agents with LlamaIndex" was shared, describing introspective agents that use the reflection agent pattern within the LlamaIndex framework. The source is AI Artistry on Medium and includes images created using MidJourney.
- Medium 404 Error for Agent Article: A member shared a link to the same article, "Introspective Agents with LlamaIndex," but this time the link led to a 404 Page Not Found error on Medium. The page suggested navigating to other stories that could potentially be of interest.
- Article Acknowledgement: Another member acknowledged the shared article on Introspective Agents with LlamaIndex as a "nice article," implying a positive reception but providing no further detail or discussion.
- Introspective Agents with LlamaIndex: Ankush k Singal
tinygrad (George Hotz) ▷ #general (35 messages🔥):
- Rethinking UOps Representation: A user suggested making tinygrad's printed operations easier to read by adopting a format similar to LLVM IR, i.e., a more human-readable textual form for UOps.
- Clarifying SSA and UOps: In discussing the change, it was highlighted that tinygrad's operations are meant to be in Static Single Assignment (SSA) form. A user pointed to confusion around the PHI operation being placed at the end of a block rather than at the beginning, where traditional LLVM IR puts it.
- Opinions Divided on Proposed Formatting: While one member argues against changing tinygrad's current formatting, stating it introduces an unnecessary abstraction layer, another member encourages submitting a Pull Request (PR) to implement the proposed changes.
- Discord Betting Escapade: Two users engage in a conversation about betting on the correctness of code through their PayPal balance. It evolves into an acknowledgement of potential bugs and the intricacies of creating a betting challenge bound by time and accuracy on the server.
- Machine Learning Jargon Workaround Discussed: Amidst the technical exchange, a user seeks advice on approaching machine learning without delving deep into mathematical terminology. They are directed towards resources by Andrej Karpathy, while another user reiterates the chat rules around asking beginner questions.
tinygrad (George Hotz) ▷ #learn-tinygrad (20 messages🔥):
- CPU Kernels Single Threaded in tinygrad: In response to a question about whether tinygrad uses multiple threads for operations like matrix multiplication, George Hotz confirmed that tinygrad's CPU backend is single-threaded.
- Understanding Remapping and Strides in Tensors: A user described how remapping a tensor by changing its strides can allow certain reshapes without copying data, and suggested maintaining the original shape to calculate indices after reshaping, a technique that may resemble what tinygrad does under the hood (see the numpy illustration after this list).
- Sharing Knowledge Through Documentation: Users have shared self-created explanatory content such as posts on symbolic mean and Winograd convolutions understanding. One user shared a GitHub post on symbolic mean and another offered a Google Doc link for view merges.
- Quantized Inference Capabilities in tinygrad: A user inquired about tinygrad's ability to perform quantized inference similar to bitsandbytes library, with an acknowledgment that it is somewhat capable of doing so.
- Recommendations for Learning Through Documentation and Examples: Users encouraged the creation and sharing of toy examples and documentation as a method for learning and teaching concepts related to tinygrad, indicating the potential benefit for both the original author and the community.
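As a rough illustration of the stride idea above (using numpy rather than tinygrad internals), the same flat buffer can be viewed as a 2D tensor purely by changing the shape and strides, with no copy:

```python
# View a flat buffer as 3x4 by supplying shape/strides; strides are in bytes.
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(12, dtype=np.int64)                    # flat buffer of 12 int64s
b = as_strided(a, shape=(3, 4), strides=(4 * 8, 8))  # row stride = 4 elements
assert b[2, 1] == a[2 * 4 + 1]                       # row * row_stride + col
```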
- tinygrad-notes/symbolic-mean.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.
- tinygrad/docs-legacy/reshape_without_symbolic.md at master · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad
- View Merges: no description found
Cohere ▷ #general (35 messages🔥):
- Seeking SQL Database Location in Cohere Toolkit: A member inquired about the location of the SQL database storing conversational history in the Cohere toolkit. Another member clarified that it listens on port 5432 (the Postgres default), without specifying an exact location.
- Student Aspires to Create a Google Bard-like Chatbot: A high school student expressed their ambition to build a chatbot similar to Google's Bard, questioning whether it complies with Cohere's user agreement. Cohere's representative shared guidance on trial and production keys, confirming that building and eventually providing paid access to a chatbot is permissible, subject to obtaining a production key.
- Addressing Chroma Retrieval and Embedding Issues: A member reported issues while testing the Cohere toolkit locally with Chroma, specifically an IndexError during document retrieval. The conversation pointed toward checking the full log trace available at a Pastebin link and trying the prebuilt container ghcr.io/cohere-ai/cohere-toolkit:latest.
- Langchain Retriever Always Selected in Cohere Toolkit: Despite selecting File Reader - LlamaIndex, a user reported that the Langchain retriever was used instead, as evidenced by a shared screenshot that was, however, not accessible.
- Registration of Production Key Acting as Trial Key: A user experienced an issue where a newly registered production key still functioned as a trial key. Cohere's representative clarified that the trial key is only used in the Playground / Chat UI and assured that when used in the API, it should reflect production key usage and does not require pre-funding.
- 📏 Telemetry | Chroma: Chroma contains a telemetry feature that collects anonymous usage information.
- Pasteboard - Uploaded Image: no description found
- Going Live - Cohere Docs: no description found
- Python Decorators In 1 MINUTE!: Discover the power of Python decorators in just 1 minute! This quick tutorial introduces you to the basics of decorators, allowing you to enhance your Python...
- PDF | 🦜️🔗 LangChain: Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of applicatio...
Cohere ▷ #project-sharing (2 messages):
- Cohere Coral Combines Chatbot and ReRank: A member introduced an app called Coral Chatbot which integrates text generation, summarization, and ReRank into one tool. You can check out the app and provide feedback through their Streamlit page.
- Python Decorators Demystified in 60 Seconds: A quick tutorial entitled "Python Decorators In 1 MINUTE" was shared, promising a brief introduction to Python decorators. Interested members can watch the explainer video on YouTube (a minimal code illustration follows below).
- Python Decorators In 1 MINUTE!: Discover the power of Python decorators in just 1 minute! This quick tutorial introduces you to the basics of decorators, allowing you to enhance your Python...
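For readers who want the one-minute version in code, a minimal example of the pattern the video covers; the timing decorator here is invented for illustration. A decorator is just a function that wraps another function to add behavior around it.

```python
# A decorator that reports how long the wrapped function took to run.
import functools
import time

def timed(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

slow_add(2, 3)  # prints something like: slow_add took 0.1001s
```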
Latent Space ▷ #ai-general-chat (35 messages🔥):
- Centaur Coders to Shrink Team Sizes?: A linked post from v01.io discusses the potential for Centaur Programmers to reduce product team sizes, leveraging human-AI collaboration to increase efficiency. Discussions around this hypothesis speculate whether smaller teams will emerge, or if teams will instead focus on improving product output.
- DeepSeek-V2 Tops Performance Charts: Announced via Twitter, DeepSeek-V2 is a notable open-source MoE model excelling in benchmarks, with top-tier performance in several areas, including code and reasoning capabilities. Community response included excitement and analysis, with linked discussions examining the implications of the new model.
- Unveiling DeepSeek's Achievements: Additional conversation centered on DeepSeek-V2's benchmark achievements was shared, along with a personal take provided in an AI News newsletter, illustrating the model's impact on the AI landscape.
- Exploration of Unified Search Options: The search for a feasible unified search solution for small organizations led to the mention of Glean and a potential OSS alternative discussed in a shared Hacker News post. A suggestion was made for a bot that can preemptively search for relevant posts across platforms like Discord.
- Inquiry into AI Orchestration Practices: Queries about AI (data) orchestration practices were raised, seeking community input on preferred orchestration tools, data transfer methods, and architectural advice for handling complex data pipelines involving text and embeddings.
- Centaur Programmers shrink product team sizes — at least, they should. – Klaus Breyer: Pedro Alves, Bruno Pereira Cipriano, 2023: The idea of the Centaur Programmer is based on the premise that a collaborative approach between humans and AI will be more effective than AI alone, as demon...
- Tweet from DeepSeek (@deepseek_ai): 🚀 Launching DeepSeek-V2: The Cutting-Edge Open-Source MoE Model! 🌟 Highlights: > Places top 3 in AlignBench, surpassing GPT-4 and close to GPT-4-Turbo. > Ranks top-tier in MT-Bench, rivaling ...
- [AINews] DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost: AI News for 5/3/2024-5/6/2024. We checked 7 subreddits and 373 Twitters and 28 Discords (419 channels, and 10335 messages) for you. Estimated reading time...
AI Stack Devs (Yoko Li) ▷ #ai-companion (6 messages):
- Local Faraday Use is Free: Members clarified that using Faraday locally doesn't require a cloud subscription and there are no costs involved. One member shared their personal experience, stating it works fine with 6 GB VRAM and includes free voice output.
- Forever Access to Downloads: Users highlighted that once characters and models are downloaded from the Faraday platform, they can be used indefinitely without further charges.
- A Nod to Sufficient GPUs: It was pointed out that a powerful enough GPU eliminates the need for a cloud account, unless users wish to contribute to the developers through a subscription.
AI Stack Devs (Yoko Li) ▷ #team-up (5 messages):
- Simulation Collaboration Kickoff: An individual, @abhavkedia, proposed working together to create a fun simulation in relation to the Kendrick and Drake situation, seeking collaborators.
- Project Progress & Team-Up: @abhavkedia shared their current progress on the simulation, and @jakekies expressed interest in joining the project, indicating a collaborative effort is underway.
AI Stack Devs (Yoko Li) ▷ #ai-town-dev (11 messages🔥):
- Code Updates Need Manual Table Wiping: Updating character definitions in code won't automatically refresh the systems; members were reminded to run wipe-all-tables if they make character function or map changes.
- Automation Suggestion Using inotifywait: A method was shared to automate updates by using inotifywait to detect changes in data/character.ts and react accordingly (see the sketch after this list).
- Invitation to Try Llama Farm: Members were encouraged to try a new simulation, Llama Farm, with interest already being shown in the shared link.
- Llama Farm Integration with AI-Town: A unique integration concept between Llama Farm and AI-Town was proposed, requiring an instance of Ollama to participate in AI-Town, with local message processing tagged by character.
- Toward a More Generic Llama Farm: Plans to generalize Llama Farm's ability to hook into any system using the OpenAI API were shared, including an approach for streaming using query/mutation for multiplexing.
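A rough sketch of that inotifywait automation, assuming a Linux machine with inotify-tools installed; the wipe command is a placeholder for whatever your AI-Town setup actually uses to run wipe-all-tables.

```python
# Block until the character definitions change, then re-run the table wipe.
import subprocess

while True:
    subprocess.run(
        ["inotifywait", "-e", "modify", "data/character.ts"],
        check=True,  # raises if inotifywait is missing or fails
    )
    subprocess.run(["npm", "run", "wipe-all-tables"], check=False)  # hypothetical script name
```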
Link mentioned: llama farm: no description found
AI Stack Devs (Yoko Li) ▷ #paper-spam (1 messages):
Deforum Daily Papers: Papers will now be sent to <#1227492197541220394>
Mozilla AI ▷ #llamafile (18 messages🔥):
- Slower Model Performance on Devices: A member reported that running the model on their device was slow, yielding 8 seconds per token.
- Rocket-3B as a Speedier Alternative: After facing performance issues with a different model, a member was advised to try Rocket-3B, which significantly improved the speed.
- Efficient Use of Ollama Cache with Llamafile: There was an inquiry about whether llamafile can reuse models stored in the ollama cache to avoid downloading them twice; the response clarified that it's possible by pointing llamafile at the cached weights with -m model_name.gguf.
- Challenges with AutoGPT and Llamafile Integration: A member encountered a problem with AutoGPT not starting the AP server correctly, causing the llamafile agent to be killed on startup; a manual restart worked around the issue for port 8080 but failed for port 8000.
- Draft PR for Llamafile Support in AutoGPT: Instructions for setting up autoGPT + llamafile were provided, indicating that feedback from AutoGPT maintainers is pending before further development. The conversation implied there's an ongoing effort to integrate llamafile with AutoGPT via a draft PR. Draft llamafile support instructions.
- AutoGPT/autogpts/autogpt/llamafile-integration at draft-llamafile-support · Mozilla-Ocho/AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters. - Mozilla-Ocho/AutoGPT
- Draft llamafile support by k8si · Pull Request #7091 · Significant-Gravitas/AutoGPT: Background This draft PR is a step toward enabling the use of local models in AutoGPT by adding llamafile as an LLM provider. Related issues: #6336 #6947 Changes 🏗️ For full documentation of th.....
Interconnects (Nathan Lambert) ▷ #news (5 messages):
- Debate Over AI Evaluations: A link was shared from Dr. Jim Fan's tweet regarding AI evaluations. The message found the tweet interesting but possibly mistaken, arguing that specific benchmarks and open public voting are overvalued as evaluations and favoring A/B testing instead.
- Benchmarking: A Database Perspective: A member resonated with the need for standard benchmarks in AI, drawing on their experience in the database field, and suggested that the three sets of benchmarking mentioned in the tweet could be an appropriate approach.
- What's TPC? A Quick Intro: Following a question about TPC, a member explained that TPC is the Transaction Processing Performance Council, a neutral body that sets and audits standards for the database industry via benchmarks like TPC-C and TPC-H, created in response to overhyped claims by database vendors.
Interconnects (Nathan Lambert) ▷ #random (11 messages🔥):
- GPT-2 Chatbots Stir Up LMsys Arena: A tweet shared by xeophon. mentioned the "good gpt2-chatbot" models, referencing a humorous comment by Sam Altman and indicating the gpt2-chatbots' return to the LMsys arena. The tweet included a conversation snapshot.
- Skepticism Over OpenAI's Use of LMsys for Evaluations: Nathan Lambert expressed a personal disapproval of OpenAI seemingly using LMsys for their model evaluations.
- LMsys Tightrope Walk: Nathan Lambert observed that LMsys is in a difficult position due to a lack of resources, which prevents them from refusing collaborations.
- Concerns About LMsys Credibility: Nathan mentioned that the recent wave of 'gpt2-chatbot' engagement could negatively impact LMsys's reputation and credibility.
- Podcast Interview on LMsys Considered: Nathan considered doing an Interconnects audio-interview with the LMsys team but remains undecided, citing past interactions that lacked synergy.
Link mentioned: Tweet from ハードはんぺん (@U8JDq51Thjo1IHM): I’m-also-a-good-gpt2-chatbot I’m-a-good-gpt2-chatbot ?? Quoting Jimmy Apples 🍎/acc (@apples_jimmy) @sama funny guy arnt you. Gpt2 back on lmsys arena.
DiscoResearch ▷ #mixtral_implementation (2 messages):
- PR Closure Confirmed: The discussion concluded with the information that a Pull Request (PR) was closed/rejected. No further details were provided.
DiscoResearch ▷ #general (3 messages):
- AIDEV Conference Gathering Excitement: Members are expressing excitement about the upcoming AIDEV event and are coordinating to meet up. Those attending are encouraged to connect if they haven’t already.
- Inquiries on Conference Amenities: A question was raised regarding whether food will be available at the AIDEV event, or if attendees should bring their own.
DiscoResearch ▷ #discolm_german (10 messages🔥):
- Exploring Mistral's Capabilities: A member confirmed using the 8x22b Mistral model for their current project, discussing its deployment and performance aspects.
- Decoding Techniques to Lower Latency: A query was raised about achieving low-latency decoding without waiting for the end of a sentence, touching on strategies for efficient language-model output generation.
- Potential for a German DPO Dataset: The idea of creating a German dataset for inclusive language was proposed, sparking a discussion on its utility and whether it should also focus on grammar and wording alongside inclusiveness.
- Seeking Input for German Pretraining Dataset: A member sought feedback on building a German-exclusive pretraining dataset from Common Crawl and inquired about specific domains that might warrant greater emphasis due to high-quality content.
- Resource Sharing for Inclusive Language: Resources were shared for gender and diversity-sensitive language, including the INCLUSIFY prototype (https://davids.garden/gender/) and a related GitLab repository (https://gitlab.com/davidpomerenke/gender-inclusive-german), which might be relevant to implementing inclusive language modes in AI models.
- David’s Garden - Gender-inclusive German: A benchmark and a model: Gender-inclusive language is important for achieving gender equality in languages with gender inflections, such as, for the purpose of this report, German.
- David Pomerenke / Gender-inclusive German - a benchmark and a pipeline · GitLab: GitLab.com
LLM Perf Enthusiasts AI ▷ #prompting (7 messages):
- Prompt Generator Tool Revealed: A member discussed the discovery of a new prompt generator tool in the Anthropic console.
- Polite Rephrasing Function Tested: The tool was tested with a prompt to rephrase a sentence more politely, yielding a satisfactory result.
- Exploring the Underlying Mechanics: A member is in the process of extracting the system prompt used by the new tool, commenting on its extensive use of k-shot examples, including an intriguing one about a Socratic math tutor (the pattern is illustrated after this list).
- Challenge in Extracting Long Prompts: Attempts to extract the full prompt are ongoing, with difficulties arising due to its length, particularly a lengthy math tutor example.
- Promise to Share Information: The member confirmed that once the full prompt is successfully extracted, it will be shared in the chat.
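For context, the k-shot pattern being described looks roughly like the template below; the examples are invented for illustration and are not the actual extracted Anthropic prompt.

```python
# A k-shot prompt: several worked examples precede the real input.
K_SHOT_PROMPT = """You are a Socratic math tutor. Never give the answer outright.

<example>
Student: What is 3/4 + 1/8?
Tutor: What denominator could both fractions share?
</example>

<example>
Student: Solve 2x + 6 = 10.
Tutor: What happens if you subtract 6 from both sides?
</example>

Student: {question}
Tutor:"""

print(K_SHOT_PROMPT.format(question="How do I factor x^2 - 9?"))
```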
Alignment Lab AI ▷ #general-chat (2 messages):
- Greetings Only: The only messages in the channel were greetings, so there was no substantive content to summarize.
Datasette - LLM (@SimonW) ▷ #llm (2 messages):
- GitHub Issue Collaboration: Discussing parameterized testing for a plugin, a link to a GitHub issue (Design and implement parameterization mechanism · Issue #4) was shared, indicating ongoing development and contributions.
- Inquiry About llm with OpenAI Assistant API: A member asked whether llm can be used with the OpenAI Assistant API, hoping they hadn't missed that information somewhere.
Link mentioned: Design and implement parameterization mechanism · Issue #4 · simonw/llm-evals-plugin: Initial thoughts here: #1 (comment) I want a parameterization mechanism, so you can run the same eval against multiple examples at once. Those examples can be stored directly in the YAML or can be ...