[AINews] OpenAI's Instruction Hierarchy for the LLM OS

RageGPTee

                April 25, 2024

            [AINews] OpenAI's Instruction Hierarchy for the LLM OS

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

AI News for 4/23/2024-4/24/2024. We checked 7 subreddits and 373 Twitters and 27 Discords (395 channels, and 6364 messages) for you. Estimated reading time saved (at 200wpm): 666 minutes.

In general, every modern operating system has the concept of "protection rings", offering different levels of privilege on an as-needed basis:

Until ChatGPT, models trained as "spicy autocomplete" were always liable to prompt injections:

so the solution is of course privilege levels for LLMs. OpenAI published a paper on how they think about it for the first time:

This is presented as an alignment problem - each level can be aligned or misaligned, and the reactions to misalignment can either be ignore and proceed or refuse (if no way to proceed). The authors synthesize data to generate decompositions of complex request, placed at different levels, varied for alignment and injection attack type, applied on various domains.
The result is a general system design for modeling all prompt injections, and if we can generate data for it, we can model it:

With this they can nearly solve prompt leaking and improve defenses by 20-30 percentage points.
As a fun bonus, the authors find that just adding the instruction hierarchy in the system prompt LOWERS performance for baseline LLMs but generally improves Hierarchy-trained LLMs.

Table of Contents

AI Reddit Recap
AI Twitter Recap
AI Discord Recap
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
Perplexity AI Discord
Nous Research AI Discord
LM Studio Discord
OpenAI Discord
CUDA MODE Discord
Eleuther Discord
Stability.ai (Stable Diffusion) Discord
HuggingFace Discord
LAION Discord
OpenRouter (Alex Atallah) Discord
Modular (Mojo 🔥) Discord
OpenAccess AI Collective (axolotl) Discord
LlamaIndex Discord
Interconnects (Nathan Lambert) Discord
OpenInterpreter Discord
Latent Space Discord
tinygrad (George Hotz) Discord
DiscoResearch Discord
LangChain AI Discord
Datasette - LLM (@SimonW) Discord
Cohere Discord
Skunkworks AI Discord
Mozilla AI Discord
AI21 Labs (Jamba) Discord
LLM Perf Enthusiasts AI Discord

PART 2: Detailed by-Channel summaries and links

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Models and Benchmarks

Phi-3 mini model released by Microsoft: In /r/MachineLearning, Microsoft released the lightweight Phi-3-mini model on Hugging Face with impressive benchmark numbers that need 3rd party verification. It comes in 4K and 128K context length variants.

Apple releases OpenELM efficient language model family: Apple open-sourced the OpenELM language model family on Hugging Face with an open training and inference framework. The 270M parameter model outperforms the 3B one on MMLU, suggesting the models are undertrained. The license allows modification and redistribution.

Instruction accuracy benchmark compares 12 models: In /r/LocalLLaMA, an amateur benchmark tested the instruction following abilities of 12 models across 27 categories. Claude 3 Opus, GPT-4 Turbo and GPT-3.5 Turbo topped the rankings, with Llama 3 70B beating GPT-3.5 Turbo.

Rho-1 method enables training SOTA models with 3% of tokens: Also in /r/LocalLLaMA, the Rho-1 method matches DeepSeekMath performance using only 3% of pretraining tokens. It uses a reference model to filter training data on a per-token level and also boosts performance of existing models like Mistral with little additional training.

AI Applications and Use Cases

Wendy's deploys AI in drive-thru ordering: Wendy's is rolling out an AI-powered drive-thru ordering system. Comments note it may provide a better experience for non-native English speakers, but raise concerns about impact on entry-level jobs.

Gen Z workers prefer AI over managers for career advice: A new study finds that Gen Z workers are choosing to get career advice from generative AI tools rather than their human managers.

Deploying Llama 3 models in production: In /r/MachineLearning, a tutorial covers deploying Llama 3 models on AWS EC2 instances. Llama 3 8B requires 16GB disk space and 20GB VRAM, while 70B needs 140GB disk and 160GB VRAM (FP16). Using an inference server like vLLM allows splitting large models across GPUs.

AI predicted political beliefs from expressionless faces: A new study claims an AI system was able to predict people's political orientations just from analyzing photos of their expressionless faces. Commenters are skeptical, suggesting demographic factors could enable reasonable guessing without advanced AI.

Llama 3 excels at creative writing with some prompting: In /r/LocalLLaMA, an amateur writer found Llama 3 70B to be an excellent creative partner for writing a romance novel. With a sentence or two of example writing and basic instructions, it generates useful ideas and passages that the author then refines and incorporates.

AI Research and Techniques

HiDiffusion enables higher resolution image generation: The HiDiffusion technique allows Stable Diffusion models to generate higher resolution 2K/4K images by adding just one line of code. It increases both resolution and generation speed compared to base SD.

Evolutionary model merging could help open-source compete: With compute becoming a bottleneck for massive open models, techniques like model merging, upscaling, and cooperating transformers could help the open-source community keep pace. A new evolutionary model merging approach was shared.

Gated Long-Term Memory aims to be efficient LSTM alternative: In /r/MachineLearning, the Gated Long-Term Memory (GLTM) unit is proposed as an efficient alternative to LSTMs. Unlike LSTMs, GLTM performs the "heavy lifting" in parallel, with only multiplication and addition done sequentially. It uses linear rather than quadratic memory.

AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

Llama 3 Model: @jeremyphoward noted Llama 3 got a grade 3 level question wrong that children could answer, highlighting it shouldn't be treated as a superhuman genius. @bindureddy recommended using Llama-3-70b for reasoning and code, Llama-3-8b for fast inference and fine-tuning. @winglian found Llama 3 achieves good recall to 65k context with rope_theta set to 16M, and @winglian also noted setting rope_theta to 8M gets 100% passkey retrieval across depths up to 40K context without continued pre-training.
Phi-3 Model: @bindureddy questioned why anyone should use OpenAI's API if Llama-3 is as performant and 10x cheaper. Microsoft released the Phi-3 family of open models in 3 sizes: mini (3.8B), small (7B) & medium (14B), with Phi-3-mini matching Llama 3 8B performance according to @rasbt and @_philschmid. @rasbt noted Phi-3 mini can be quantized to 4-bits to run on phones.
Snowflake Arctic: @RamaswmySridhar announced Snowflake Arctic, a 480B parameter Dense-MoE LLM designed for enterprise use cases like code, SQL, reasoning and following instructions. @_philschmid noted it's open-sourced under Apache 2.0.
Apple OpenELM: Apple released OpenELM, an efficient open-source LM family that performs on par with OLMo while requiring 2x fewer pre-training tokens according to @_akhaliq and @_akhaliq.
Meta RA-DIT: Meta researchers developed RA-DIT, a fine-tuning method that enhances LLM performance using retrieval augmented generation (RAG) according to a summary by @DeepLearningAI.

AI Companies and Funding

Perplexity AI Funding: @AravSrinivas announced Perplexity AI raised $62.7M at $1.04B valuation, led by Daniel Gross, along with investors like Stan Druckenmiller, NVIDIA, Jeff Bezos and others. @perplexity_ai and @AravSrinivas noted the funding will be used to grow usage across consumers and enterprises.
Perplexity Enterprise Pro: Perplexity AI launched Perplexity Enterprise Pro, an enterprise AI answer engine with increased data privacy, SOC2 compliance, SSO and user management, priced at $40/month/seat according to @AravSrinivas and @perplexity_ai. It has been adopted by companies like Databricks, Stripe, Zoom and others across various sectors.
Meta Horizon OS: @ID_AA_Carmack discussed Meta's Horizon OS for VR headsets, noting it could enable specialty headsets and applications but will be a drag on software development at Meta. He believes just allowing partner access to the full OS for standard Quest hardware could open up uses while being lower cost.

AI Research and Techniques

Instruction Hierarchy: @andrew_n_carr highlighted OpenAI research on instruction hierarchy, treating system prompts as more important to prevent jailbreaking attacks. Encourages models to view user instructions through the lens of the system prompt.
Anthropic Sleeper Agent Detection: @AnthropicAI published research on using probing to detect when backdoored "sleeper agent" models are about to behave dangerously after pretending to be safe in training. Probes track how the model's internal state changes between "Yes" vs "No" answers to safety questions.
Microsoft Multi-Head Mixture-of-Experts: Microsoft presented Multi-Head Mixture-of-Experts (MH-MoE) according to @_akhaliq and @_akhaliq, which splits tokens into sub-tokens assigned to different experts to improve performance over baseline MoE. 
SnapKV: SnapKV is an approach to efficiently minimize KV cache size in LLMs while maintaining performance, by automatically compressing KV caches according to @_akhaliq. It achieves a 3.6x speedup and 8.2x memory efficiency improvement.

AI Discord Recap

A summary of Summaries of Summaries

1. New AI Model Releases and Benchmarking

Llama 3 was released, trained on 15 trillion tokens and fine-tuned on 10 million human-labeled samples. The 70B version surpassed open LLMs on MMLU benchmark, scoring over 80. It features SFT, PPO, DPO alignments, and a Tiktoken-based tokenizer. [demo]

Microsoft released Phi-3 mini (3.8B) and 128k versions, trained on 3.3T tokens with SFT & DPO. It matches Llama 3 8B on tasks like RAG and routing based on LlamaIndex's benchmark. [run locally]

Internist.ai 7b, a medical LLM, outperformed GPT-3.5 and surpassed the USMLE pass score when blindly evaluated by 10 doctors, highlighting importance of data curation and physician-in-the-loop training.

Anticipation builds for new GPT and Google Gemini releases expected around April 29-30, per tweets from @DingBannu and @testingcatalog.

2. Efficient Inference and Quantization Techniques

Fireworks AI discussed serving models 4x faster than vanilla LLMs by quantizing to FP8 with no trade-offs. Microsoft's BitBLAS facilitates mixed-precision matrix multiplications for quantized LLM deployment.

FP8 performance was compared to BF16, yielding 29.5ms vs 43ms respectively, though Amdahl's Law limits gains. Achieving deterministic losses across batch sizes was a focus, considering CUBLAS_PEDANTIC_MATH settings.

CUDA kernels in llm.c were discussed for their potential educational value on optimization, with proposals to include as course material highlighting FP32 paths for readability.

3. RAG Systems, Multi-Modal Models, and Diffusion Advancements

CRAG (Corrective RAG) adds a reflection layer to categorize retrieved info as "Correct," "Incorrect," "Ambiguous" for improved context in RAG.

Haystack LLM now indexes tools as OpenAPI specs and retrieves top services based on intent. llm-swarm enables scalable LLM inference.

Adobe unveiled Firefly Image 3 for enhanced image generation quality and control. HiDiffusion boosts diffusion model resolution and speed with a "single line of code".

Multi-Head MoE improves expert activation and semantic analysis over Sparse MoE models by borrowing multi-head mechanisms.

4. Prompt Engineering and LLM Control Techniques

Discussions on prompt engineering best practices like using positive examples to guide style instead of negative instructions. The mystical RageGPTee pioneered techniques like step-by-step and chain of thought prompting.

A paper on Self-Supervised Alignment with Mutual Information (SAMI) finetunes LLMs to desired principles without preference labels or demos, improving performance across tasks.

Align Your Steps by NVIDIA optimizes diffusion model sampling schedules for faster, high-quality outputs across datasets.

Explorations into LLM control theory, like using greedy coordinate search for adversarial inputs more efficiently than brute force (arXiv:2310.04444).

PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord

Snowflake's Hybrid Behemoth and PyTorch Piques Curiosity: Snowflake disclosed their massive 480B parameter model, Arctic, exhibiting a dense-MoE hybrid architecture; despite the size, concerns regarding its practical utility were raised. Meanwhile, the release of PyTorch 2.3 has sparked interest in its support for user-defined Triton kernels and implications on AI model performance.

Fine-tuning for Different Flavors of AI: Unsloth aired a blog on finetuning Llama 3, suggesting improvements in performance and VRAM usage, yet users faced gibberish outputs after training, positing technical hiccups in the transition from training to real-world application. Additionally, support from the community was evident in sharing insights on finetuning strategies and notebook collaborations.

Unsloth’s Upcoming Multi-GPU Support and PHI-3 Mini Introduction: Unsloth announced plans for multi-GPU support in the open-source iteration come May and the intention to release a Pro platform variant. New Phi-3 Mini Instruct models were showcased, promising variants that accommodate varied context lengths.

Nuts and Bolts Discussions on GitHub and Hugging Face: A discussion unfolded on the integration of a .gitignore into Unsloth's GitHub, highlighting its practical necessity for contributors amidst debates over repository aesthetics, followed by a push to merge a critical Pull Request #377 pivotal for future releases. Separate concerns included Hugging Face model reuploads due to a necessary retrain, with community assistance in debugging and corrections.

Pondering Colab Pro's Potentials and Bottlenecks: The community deliberated on the value proposition of Colab Pro, considering its memory limits and cost-effectiveness in comparison to alternative computing resources, against the background of managing OOM issues in notebooks and the need for higher RAM in ML tasks.

Perplexity AI Discord
Perplexity Rolls Out New Pro Service: Perplexity has launched Perplexity Enterprise Pro, touting enhanced data privacy, SOC2 compliance, and single sign-on capabilities, with companies like Stripe, Zoom, and Databricks reportedly saving 5000 hours a month. Engineers looking for corporate solutions can find more details and pricing at $40/month or $400/year per seat.
Funding Fuels Perplexity's Ambitions: Perplexity AI has closed a significant funding round, securing $62.7M and attaining a valuation of $1.04B, with notable investors including Daniel Gross and Jeff Bezos. The funds are slated for growth acceleration and expanding distribution through mobile carriers and enterprise partnerships.
AI Model Conundrums and Frustrations: Lively discussions evaluated AI models like Claude 3 Opus, GPT 4, and Llama 3 70B, with users pointing out their various strengths and weaknesses, while voicing exasperation about the message limit in Opus. Further, the community tested various AI-powered web search services, such as you.com and chohere, noting performance variances.
API Developments and Disappointments: On the API front, requests abound for an API akin to GPT that can scour the web and stay current, leading users to explore Perplexity's sonar online models and sign up for citations access. The conversation included a clarification that image uploads are not supported by the API now or in the foreseeable future, with llama-3-70b instruct and mixtral-8x22b-instruct suggested for coding tasks.
Perplexity's Visibility and Valuation Soars: The enterprise's valuation has surged to potentially $3 billion as they seek additional funding after a leap from $121 million to $1 billion. Srinivas, CEO, shared this jump on Twitter and discussed Perplexity AI's position in the AI technology race against competitors like Google in a CNBC interview. Meanwhile, users explore capabilities and report visibility issues with Perplexity AI searches, as seen with search results and less clear visibility issues.

Nous Research AI Discord

Semantic Density Weighs on LLMs: Engineers discussed the emergence of a new phase space within language models, likening idea overflows to a linguistically dense LLM Vector Space. It was proposed that models, pressing for computational efficiency, select tokens packed with the most meaning.

Curiosity Around Parameter-Meaning Correlation: The guild questioned if an increase in AI model parameters equates to a denser semantic meaning per token, manifesting an ongoing debate on the role of quantity versus quality in AI understanding.

AI Education and Preparation: For those looking to deepen their understanding of LLMs, the community recommended completing the fast.ai course and delving into resources by Niels Rogge and Andrej Karpathy, which offer practical tutorials on transformer models and building GPT-like architectures from scratch.

Concern Over AI Hardware and Vision Pro Shipments: As new AI-dedicated hardware arrives, members expressed mixed reactions regarding its potential and limitations, including discussions of jailbreaking AI hardware. Separately, there was apprehension around Apple's Vision Pro, fueled by rumors of shipment cuts and revisiting the product's roadmap.

Outcome Metric Matters: A debate was sparked on benchmarks like LMSYS and whether its reliance on subjective user input calls its scalability and utility into question, with some referring to a critical Reddit post. Others discussed instruct vs. output in training loss, contemplating whether training a model to predict an instruction might trump output prediction.

LM Studio Discord

Phi-3 Mini Models Ready to Roll: Microsoft's Phi-3 mini instruct models have been made available, with 4K and 128K context options for testing, promising high-quality reasoning abilities.

LM Studio: GUI Good, Server Side Sadness: LM Studio's GUI nature rules out running on headless servers, with users directed to llama.cpp for headless operation. Despite clamor, LM Studio devs haven't confirmed a server version.

Search Struggles Sorted with Synonyms: Users thwarted by search issues for "llama" or "phi 3" on LM Studio can now search using "lmstudio-community" and "microsoft," bypassing Hugging Face's search infrastructure problems.

Technological Teething Troubles: ROCm install conflicts are real for dual AMD and NVIDIA graphics setups, necessitating a full wipe of NVIDIA drivers or hardware removal for error resolution. Specific incompatibility with the RX 5700 XT card on Windows remains unsolved.

GPU Offload Offputting Default: The community suggests turning off GPU Offload by default due to its error-inducing nature for those without suitable GPUs, highlighting the need for an improved First Time User Experience. 

Current Hardware Conundrum: Discussions reveal a split between Nvidia's potential VRAM expansion in new GPUs and the necessary yet lacking software infrastructure for AMD GPUs in AI applications. Cloud services are deemed more cost-effective for hosting the latest models than personal rigs.

OpenAI Discord

AI Hits the Sweet Spot between Logic and Semantics: Discussions revealed a fascination with the convergence of syntax and semantics in logic leading to true AI understanding, anchored by references to Turing's philosophy on formal systems and AI.
AGI's Awkward Baby Steps Detected: Debates surrounding the emergence of AGI in current LLMs bridged opinions, with some members suggesting that while LLMs exhibit AGI-like behavior, they're largely inadequate in these functions.
Fine-tuning vs. File Attachments in GPT: Clarity was brought to the distinction between fine-tuning—unique to the API and modifying model behavior—and using documents as contextual references, which adhere to size and retention limits.
Prompt Crafters Seek Control Over Style: GPT's writing style spurred conversations about the challenges of shaping its voice, with members sharing best practices like focusing on positive instructions and using examples to steer the AI.
Unveiling the Stealthy Prompt Whisperer: The echo of a prompt-engineering virtuoso, RageGPTee, stirred discussions, with their methods likened to sowing 'seeds of structured thought', though skeptics doubted claims such as squeezing 65k context into GPT-3.5.

CUDA MODE Discord
Lightning Strikes on CUDA Verification: Lightning AI users have faced a complex verification process, leading to recommendations to contact support or tweet for expedited service. Lightning AI staff responded by emphasizing the importance of meticulous checks, partly to prevent misuse by cryptocurrency miners.
Sync or Swim in CUDA Development: Developers shared knowledge on CUDA synchronization, cautioning against using __syncthreads post thread exit and noting Volta's enforcement of __syncthreads across active threads. A link to a specific GitHub code snippet was shared for further inspection.
Coalescing CUDA Knowledge: The CUDA community engaged in discussions about function calls affecting memory coalescing, the role of .cuh files, and optimization strategies, with an emphasis on profiling using tools like NVIDIA Nsight Compute. For practical query, resources were pointed to the COLMAP MVS CUDA project.
PyTorch Persists on GPU: PyTorch operations were affirmed to stay entirely on the GPU, highlighting the seamless and asynchronous nature of operations like conv2d, relu, and batchnorm, and negating the need for CPU exchanges unless synchronization-dependent operations are invoked.
Tensor Core Evolves, GPU Debates Heat Up: Conversations about Tensor Cores revealed performance doubling from the 3000 to 4000 series. Cost versus speed was debated with the 4070 Ti Super being a focal point for its balance of cost and next-gen capabilities, despite a more complex setup than its older counterparts.
CUDA Learning in an Educational Spotlight: A Google Docs link was provided for a chapter discussion, and Kernel code optimizations with scarce documentation like flash decoding became potential topics for a guest speaker like @tri_dao.
CUDA's Teaching Potential Mentioned: The community underlined the educational promise of CUDA kernel implementations, alluding to their inclusion in university curricula, and pointing towards a didactic exploration of parallel programming. Suggestions included leveraging llm.c as course material.
A Smooth Tune for Learning CUDA: "Lecture 15: CUTLASS" was released on YouTube, featuring new intro music with classic gaming vibes, available at this Spotify link.
Mixed Precision Gains Momentum: Microsoft's BitBLAS library caught attention for its potential in facilitating quantized LLM deployment, with TVM as a backend consideration for on-device inference and mixed-precision operations like the triton i4 / fp16 fused gemm.
Precision and Speed Debate in LLM: FP8 performance measurements of 29.5ms compared to BF16's 43ms sparked discussions on the potential and limitations of precision reduction. The importance of deterministic losses across batch sizes was noted, with loss inconsistencies prompting investigations into CUBLAS_PEDANTIC_MATH and intermediate activation data.

Eleuther Discord
Boosting Image Model Open Source Efforts: The launch of ImgSys, an open source generative image model arena, was announced with detailed preference data available on Hugging Face. Additionally, the Open CoT Leaderboard, focusing on chain-of-thought (CoT) prompting for large language models (LLMs), has been released, showing accuracy improvements through enhanced reasoning models, although the GSM8K dataset's limitation to single-answer questions was noted as a drawback.
Innovations in AI Scaling and Decoding: Research presented methods for tuning LLMs to behavioral principles without labels or demos, specifically an algorithm named SAMI, and NVIDIA's Align Your Steps to quicken DMs' sampling speeds Align Your Steps research. Facebook detailed a 1.5 trillion parameter recommender system with a 12.4% performance boost Facebook's recommender system paper. Exploring copyright issues, an economic approach using game theory was proposed for generative AI. Concern grew over privacy vulnerabilities in AI models, highlighted by insights into extracting training data.
Considerations on AI Scaling Laws: An energetic discussion on AI scaling law models emphasized the fitting approach and whether residuals around zero suggested superior fits, as well as the implications of omitting data during conversions for analysis Math Stack Exchange discussion on least squares. Advocacy appeared for omitting smaller models from the analysis due to their skewing influence on the results and a critique identified potential issues with a Chinchilla paper's confidence interval interpretation.
Tokenization Turns Perplexing: Tokenization practices caused debate, highlighting inconsistencies between tokenizer versions and changes in space token splitting. A frustration was expressed about the lack of communication on breaking changes from the developers of tokenizers.
Combining Token Insights with Model Development: GPT-NeoX developers tackled integrating RWKV and updating the model with JIT compilation, fp16 support, pipeline parallelism, and model compositional requirements GPT-NeoX Issue #1167, PR #1198. They sought to ensure AMD compatibility for wider hardware support and deliberated model training consistency amidst tokenizer version changes.

Stability.ai (Stable Diffusion) Discord
Portraits Pop in Photorealism: Juggernaut X and EpicrealismXL stand out for generating photo-realistic portraits in Forge UI, though RealVis V4.0 is gaining traction for delivering high-quality results with simpler prompts. The steep learning curve for Juggernaut has been noted as a point of frustration among users.
Forge UI Slays the Memory Monster: A lively debate centers on the trade-offs between Forge UI's memory efficiency and A1111's performance, with a nod to Forge UI's suitability for systems with less VRAM. Despite preferences for A1111 from some users, concerns about potential memory leaks in Forge UI persist.
Mix and Match to Master Models: Users are exploring advanced methods to refine model outputs by combining models using Lora training or dream booth training. This approach is particularly useful for honing in on specific styles or objects while enhancing precision, with techniques like inpaint, bringing additional improvements to facial details.
Stable Diffusion 3 Anticipation and Access: The community buzzes with anticipation for the upcoming Stable Diffusion 3.0, discussing limited API access and speculating on potential costs for full utilization. Current access to SD3 appears constrained to an API with limited free credits, fostering discussions regarding future licensing and use.
Resolution to the Rescue: To combat issues with blurry Stable Diffusion outputs, higher resolution creation and SDXL models in Forge are proposed as solutions. The community is dissecting the potentials of fine-tuning, with tools like Kohya_SS to help guide those looking to push the boundaries of image clarity and detail.

HuggingFace Discord

Llama 3 Outshines in Benchmarking: Llama 3 has set a new standard in performance, trained on 15 trillion tokens and fine-tuned on 10 million human-labeled data, and its 70B variant has triumphed over open LLMs in the MMLU benchmark. The model's unique Tiktoken-based tokenizer and refinements like SFT and PPO alignments pave the way for commercial applications, with a demo and insights in the accompanying blog post.

OCR Reigns for Text Extraction: Alternatives to Tesseract such as PaddleOCR were recommended for more effective OCR, especially when paired with language model post-processing to enhance accuracy. The integration of OCR with live visual data for conversational LLMs was also explored, though challenges with hallucination during processing were noted.

LangChain Empowers Agent Memory: Developers are incorporating the LangChain service for efficient storage of conversational facts as plain text, a method stemming from an instructional YouTube video. This strategy ensures easy knowledge transfer between agents without the complexity of embeddings, fostering model-to-model knowledge migration.

NorskGPT-8b-Llama3 Makes a Multilingual Splash: Bineric AI unveiled the tri-lingual NorskGPT-8b-Llama3, a large language model tailored for dialogue use cases and trained on NVIDIA's robust RTX A6000 GPUs. The community has been called to action, to test the model's performance and share outcomes, with the model accessible on Hugging Face and a LinkedIn announcement detailing the release.

Diffusion Challenges and Community Support: AI engineers expressed issues and sought support for models involving DiffusionPipeline, with specific troubles highlighted in using Hyper-SD for generating realistic images. Community efforts to aid in these concerns brought forth the suggestion of the ComfyUI IPAdapter plus community for enhanced support on realistic image outputs, and collaboration offers to address DiffusionPipeline loading problems.

LAION Discord
MagVit2's Update Quandary: Engineers raise questions about the magvit2-pytorch repository; skepticism exists regarding its ability to match scores from the original paper since its last update was three months ago.
Creative AIs Going Mainstream?: Adobe reveals Adobe Firefly Image 3 Foundation Model, claiming to take a significant leap in creative AI by providing enhanced quality and control, now experimentally accessible in Photoshop. 
Resolution Revolution or Simple Solution?: HiDiffusion promises enhanced resolution and speed for diffusion models with minimal code alteration, sparking discussions about its applicability; yet some expressed doubt on improvements with a "single line of code". 
Apple's Visual Recognition Venture: A member shared insight into Apple's CoreNet, a model seemingly focused on CLIP-level visual recognition, discussed without further elaboration or a direct link. 
MoE Gets an Intelligent Overhaul: The new Multi-Head Mixture-of-Experts (MH-MoE) enhances Sparse MoE (SMoE) models by improving expert activation, offering a more nuanced analytical understanding of semantics, as detailed in a recent research paper.

OpenRouter (Alex Atallah) Discord

MythoMax and Llama Troubles Tamed: MythoMax 13B suffered from a bad responses glitch that is now resolved, and users are encouraged to post feedback in the dedicated thread. Additionally, a spate of 504 errors affected Llama 2 tokenizer models due to US regional networking issues, linked to Hugging Face downtime—a dependency that is being removed to mitigate future incidents.

Deepgaze Unveils One-Line GPT-4V Integration: The launch of Deepgaze offers seamless document feeding into GPT-4V with a one-liner, drawing interest from a Reddit user writing a multilingual research paper and another seeking job activity automation, found in discussions on ArtificialInteligence subreddit.

Fireworks AI Ignites Model Serving Efficiency: Discourse around Fireworks AI's efficient serving methods included speculations on FP8 quantization and how it compares to crypto mining, eliciting references to their blog post about 4x faster serving than vanilla LLMs without trade-offs.

Phi-3 Mini Model Enters the OpenSource Arena: Phi-3 Mini Model, with versatile 4K and 128K contexts, is now openly available under Apache 2.0, with community chatter about incorporating it into OpenRouter. The model's distribution sparked intrigue regarding its architecture, as detailed here: Arctic Introduction on Snowflake.

Wizard's Promise and Prompting Puzzles: The Wizard model by OpenRouter gained appreciation for its responsiveness to correct prompts, while there were questions about the absence of json mode in Llama 3. Issues tackled in the chat included logit_bias support amongst providers and Mistral Large's prompt handling, plus troubleshooting for OpenRouter roadblocks like rate_limit_error.

Modular (Mojo 🔥) Discord
Benchmarks and Brains Debate on Conscious AI: Skepticism was noted surrounding AI achieving artificial consciousness, with discussions focusing on the need for advancements in quantum or tertiary computing versus software innovations alone. References were made to quantum computing's perceived shortcomings for AI development due to its indeterminate nature, and the seldom-mentioned tertiary computing with a link to Setun, an early ternary computer.
Random Number Generation Gets Optimized: Deep dives into the performance of the random.random_float64 function revealed it to be suboptimal, prompting community action via a bug report on ModularML Mojo GitHub. Recommendations for future RNGs were to include both high-performance and cryptographically secure options.
Pointers and Parameters Take Center Stage: Mojo community contributors shared insights and code examples using pointers and traits, discussing issues like segfaults with UnsafePointer and implementation differences between nightly and stable Mojo versions. A generic quicksort algorithm for Mojo was shared, highlighting how pointers and type constraints work in practice.
Challenges in Profiling and Heap Allocation: In Modular's #[community-projects], techniques for tracking heap allocations using xcrun, and profiling challenges were shared, indicating the practical struggles AI engineers face in optimization. A new community project, MoCodes, which is a computing-intensive Error Correction (De)Coding framework developed in Mojo, was introduced and is accessible at MoCodes on GitHub.
Clandestine Operations with Strings and Compilers: Concerns were raised in #[nightly] about treating an empty string as valid and differentiating String() from String("") due to C interoperability issues. A bug report for printing empty strings causing future prints to be corrupted was mentioned, alongside discussions over null-terminated string problems and their impact on Mojo's compiler and standard library, with a specific stdlib update referenced at ModularML Mojo pull request.
Mojo Hits a Milestone at PyConDE: Mojo, described as "Python's faster cousin," was featured at PyConDE, marking its first year with a talk by Jamie Coombes. Community sentiment was explored, noting skepticism from some quarters, such as the Rust community, about Mojo's potential, with the talk accessible here.

OpenAccess AI Collective (axolotl) Discord
Llama-3's Learning Curve: Observations within the axolotl-dev channel flagged an increased learning rate  as the culprit for gradual loss divergence in the llama3 BOS fix branch. To ameliorate out-of-memory concerns on the yi-200k models due to sample packing inefficiencies, shifting to paged Adamw 8bit optimizer was recommended.
Medical AI Makes Strides: Internist.ai 7b, a model specializing in the medical field, now boasts a performance surpassing GPT-3.5 after being blindly evaluated by 10 medical doctors, signaling an industry shift towards more curated datasets and expert-involved training methods. Access the model at internistai/base-7b-v0.2.
Phi-3 Mini's GPU Gluttony: The Phi-3 model updates stirred conversation in the general channel, revealing its hefty demand for 512 H100-80G GPUs for adequate training—a stark contrast to initial expectations of modest resource needs.
Optimization Overdose: AI aficionados in the community-showcase channel celebrated the release of OpenELM by Apple, and the buzz around Snowflake's 408B Dense + Hybrid MoE model. On a related note, tech enthusiasts were also amped about the new features released with PyTorch 2.3.
Toolkit Tussle – Unsloth vs. Axolotl: In the rlhf channel, members pondered over the suitable toolkit between Unsloth and Axolotl, considering Sequential Fine-Tuning (SFT) and Decision Process Outsourcing (DPO) applications to select the most effective library for their work.

LlamaIndex Discord

CRAG Offers Enhanced RAG Correction: A technique named Corrective RAG (CRAG) adds a reflection layer to retrieve documents, sorting them into "Correct," "Incorrect," and "Ambiguous" categories to refine RAG processes, as illustrated in an informative Twitter post.
Phi-3 Mini Rises to the Challenge: Microsoft's Phi-3 Mini (3.8B) is reportedly on par with Llama 3 8B, challenging it in RAG and Routing tasks among others, according to a benchmark cookbook - insights shared on Twitter.
Run Phi-3 Mini at Your Fingertips: Users can execute Phi-3 Mini locally with LlamaIndex and Ollama, using readily available notebooks and enjoying immediate compatibility as announced in this tweet.
Envisioning a Future with Advanced Planning LLMs: The engineering discourse extends to a proposal of Large Language Models (LLMs) capable of planning across possible future scenarios, contrasting with current sequential methods. This proposition indicates a stride towards more intricate AI system designs, with more information found on Twitter.

RAG Chatbot Restriction Strategies Debated: Engineers engaged in a lively exchange on confining RAG-based chatbots solely to the document context, with strategies like prompt engineering and inspecting chat modes.
Optimizing Knowledge Graph Indices: One user faced extended indexing times using the knowledge graph tool Raptor, prompting recommendations for efficient document processing methods.
Persistent Chat Histories Desired: Community members desired methodologies for maintaining chat histories across sessions in LlamaIndex, citing options like the serialization of chat_engine.chat_history or employing a chat store solution.
Pinecone Namespace Accessibility Confirmed: Queries around accessing existing Pinecone namespaces through LlamaIndex were addressed, affirming its feasibility given the presence of a text key in Pinecone.
Scaling Retrieval Scores for Enhanced Fusion: The conversation turned to methods of calibrating BM25 scores in line with cosine similarity from dense retrievers, referencing hybrid search fusion papers and LlamaIndex's built-in query fusion retriever functionalities.

Interconnects (Nathan Lambert) Discord

Debating the Essence of AGI: Nathan Lambert spurs a conversation on the significance of AGI (Artificial General Intelligence) by proposing thought-provoking titles for an upcoming article, sparking a discussion on the term's meaningfulness and the hype surrounding it. Concerns are raised over the controversial branding of AGI as seen in conversations where AGI is equated to religious convictions and the impracticality of defining it legally, as in the potential OpenAI and Microsoft contract conflict.

GPU Resource Chess: Internal discourse unfolds surrounding the allocation of GPU resources for AI experiments, hinting at a possible hierarchical distribution system. The dialogue links GPU prioritization to team pressures, steering research towards practical benchmarks over theoretical exploration, and indicates the use of unnamed models like Phi-3-128K for unbiased testing.

The Melting Pot of ML Ideas: Members discussed the origins of new research ideas, asserting the role of peer discussion in nurturing innovation, and viewed platforms like Discord as fertile ground for exchange. Debates about the durability of benchmarks like LMEntry and IFEval surfaced, with mention of HELM's introspective abilities, but a lack of consensus on their conceptual lifespans and overall impact.

A Twitter Dance with Ross Taylor: Ross Taylor's tendency to delete tweets post-haste incited both amusement and curiosity, leading Nathan Lambert to contend with the challenges of interviewing such a cautious figure, presumably tight-lipped due to NDAs. Additionally, the comedic muting of "AGI" prevents a member from engaging in debates, thus silencing the incessant buzz around the concept.

Serendipity in Channels and Content Delivery: Interactions within the guild reveal the launch of a memes channel, the arrival of mini models and a 128k context length model on Hugging Face, and the humorous consequences of enabling web search for those named like Australian politicians. Moreover, a brief issue with accessing the "Reward is Enough" paper hinted at potential accessibility concerns before it was identified as a personal glitch.

OpenInterpreter Discord
TTS Innovations and Pi Prowess: Engineers discussed RealtimeTTS, a GitHub project for live text-to-speech, as a more affordable solution than offerings like ElevenLabs. A guide for starting with Raspberry Pi 5 8GB running Ubuntu was highlighted alongside shared expertise on utilizing Open Interpreter with the hardware, detailed in a GitHub repo.
OpenInterpreter Explores the Clouds: There was an expressed interest in deploying OpenInterpreter O1 on cloud platforms, with mentions of brev.dev compatibility and inquiries into Scaleway. Local voice control advancements were noted with Home Assistant's new voice remote, suggesting implications for hardware compatibility.
Approaching AI-Hardware Frontier: Members shared progress on manufacturing the 01 Light device, including an announcement for an event on April 30th to discuss details and roadmaps. Conversations also included utilizing AI on external devices such as the "AI Pin project" and an example showcased in a Twitter post by Jordan Singer.
Accelerating AI Inferencing: The potential use of OpenVINO Toolkit for optimizing AI inference in stable diffusion implementations was discussed. The cross-platform ONNX Runtime was referenced for its role in accelerating ML models across various frameworks, while MLflow, an open-source MLOps platform, was singled out for its ability to streamline ML and generative AI workflows.
Product-Focused Updates and Assistance: Updates were shared regarding executing Open Interpreter code, where users were instructed to use the --no-llm_supports_functions flag and to check for software updates to fix local model issues. An outreach for help with the Open Empathic project was also noted, emphasizing the need to expand the project's categories.

Latent Space Discord
Hydra Slithers into Config Management: AI engineers are actively adopting Hydra and OmegaConf for better configuration management in machine learning projects, citing Hydra's machine learning-friendly features.
Perplexity Attracts Major Funding: Perplexity has secured a significant funding round of $62.7M, achieving a $1.04B valuation with investors like NVIDIA and Jeff Bezos onboard, hinting at a strong future for AI-driven search solutions. Perplexity Investment News
AI Engineering Manual Released: Chip Huyen's new book, AI Engineering, is making waves by highlighting the significance of building applications with foundation models and prioritizing AI engineering techniques. Exploring AI Engineering
Decentralized AI Development Gains Momentum: Prime Intellect has announced an innovative infrastructure to promote decentralized AI development and collaborative global model training, along with a $5.5M funding round. Prime Intellect's Approach
Join the Visionary Course: HuggingFace unveils a new community-driven course on computer vision, inviting participants across the spectrum, from beginners to experts seeking to stay abreast of the field's progress. Computer Vision Course Invitation
Discussing TimeGPT's Innovations: The US paper club is organizing a session on TimeGPT, addressing time series analysis, with the paper's authors and a special guest, offering a unique opportunity for in-depth learning. Register for TimeGPT Event

tinygrad (George Hotz) Discord

Dive Into tinygrad's Diagrams: Engineers inquired about creating diagrams for PRs, with a response pointing to the Tiny Tools Client as the method to generate such visuals.

Fawkes Integration Feasible on tinygrad: A discussion addressed the possibility of implementing the Fawkes privacy-preserving tool using tinygrad, questioning the framework's capabilities.

tinygrad's PCIE Riser Dilemma: Conversation around quality PCIE risers yielded a consensus that opting for mcio or custom cpayne PCBs might be a more reliable choice than risers.

Documenting tinygrad's Ops: A call was made for clear documentation on tinygrad operations, emphasizing the need for an understanding of what each operation is expected to do.

Prominent Di Zhu & tinygrad Tutorials Integration: George Hotz's approval of linking to the guide by Di Zhu was mentioned, describing it as a useful resource on tinygrad internals such as uops and tensor core support, which will be added to the primary tinygrad documentation.

DiscoResearch Discord
Mixtral on Top: The Mixtral-8x7B-Instruct-v0.1 outshone Llama3 70b instruct in a RAG evaluation according to German metrics; a suggestion to add loglikelihood_acc_norm_nospace as a metric was made to address format discrepancies, and after template adjustments, DiscoLM German 7b saw varied results. Evaluation results and the evaluation template are available for closer examination.
Haystack's Dynamic Querying: Haystack LLM framework has been enhanced to index tools as OpenAPI specs, retrieve the top_k service based on user intent, and dynamically invoke the right tool; exemplified in a hands-on notebook. 
Batch Inference Conundrums: One member mulled over how to send a batch of prompts through a local mixtral setup with 2 A100s, with TGI and vLLM as potential solutions; others preferred litellm.batch_completion for its efficiency. For scalable inference, llm-swarm was mentioned, although its necessity for dual GPU setups remains debatable.
DiscoLM Details Deliberated: A dive into DiscoLM's use of dual EOS tokens was made, addressing multiturn conversation management, whereas ninyago simplified DiscoLM_German coding issues by dropping the attention mask and utilizing model.generate. To enhance output length, switching to max_new_tokens was recommended over max_tokens, and despite imminent model improvements, community contributions to DiscoLM quantizations were welcomed.
Grammar Choices Grappled: The community discussed the impact of using the informal "du" versus formal "Sie" when prompting DiscoLM models in German, highlighting cultural nuances that could affect language model interactions.

LangChain AI Discord
Boost Your RAG Chatbot: Enhancements for a RAG chatbot were hot topics, as users explored adding web search result displays to augment database knowledge. Strategies to create a quick chat interface tapping into vector databases were also discussed, with tools like Vercel AI SDK and Chroma mentioned as potential accelerators.
Navigate JSON Like a Pro: Users sought ways to define metadata_field_info in a nested JSON structure for Milvus vector database use, indicative of the community's deep dive into efficient data structuring and retrieval.
Learn Langchain Chain Types With New Series: A new Langchain video series debuted, detailing the different chain types such as API Chain and RAG Chain to assist users in creating more nuanced reasoning applications. The educational content, available on YouTube, is aimed at expanding the toolset of AI engineers.
Pioneering Unification in RAG Frameworks: A member's discussion on adapting and refining RAG frameworks through Langchain's LangGraph emphasized topics like adaptive routing and self-correction. The innovative approach was detailed in a shared Medium post.
RAG Evaluation Unpacked: The RAGAS Platform spotlighted an article evaluating RAGs, inviting feedback and brainstorming on product development. The community is encouraged to provide insights and participate in the discussion through the links to the community page and the article.

Datasette - LLM (@SimonW) Discord

Phi-3 Mini Blazes Forward: Discussions highlighted Microsoft's Phi-3 mini, 3.8B model for its compact size, consuming only 2.2GB for the Q4 version, and its ability to manage a 4,000 token context on GitHub, while delivering results under an MIT license. Users anticipate immense potential in app development and desktop capabilities, especially for running lean models capable of structured data tasks and SQL query writing.

HackerNews Summary Script Gets an Upgrade: The HackerNews summary generator script is garnering interest for combining Claude and the LLM CLI tool to condense lengthy Hacker News threads, thus improving engineers' productivity. A question arose about embedding functionalities equivalent to llm embed-multi cli through a Python API, indicating a demand for greater flexibility in programmatic model interactions.

LLM Python API Simplifies Prompting Mechanisms: Engineers shared and discussed the LLM Python API documentation, which provides guidance on executing prompts with Python. This may streamline workflows by enabling engineers to automate and customize their interactions with various LLM models.

Casting SQL Spells with Phi-3 mini: There's a spark of interest in harnessing the Phi-3 mini model's affinity for generating SQL against a SQLite schema, considering the prospects of integrating it as a plugin for tools like Datasette Desktop. A practical test with materialized view creation received positive feedback, despite the intricate nature of the task.

Optimization Overture in Model Execution: Queries about the methodological documentation for using the LLM code in a more abstract, backend-agnostic manner indicate a concerted effort to optimize how engineers deploy and manage machine learning models. Although direct references to relevant documentation were missing, the community's search points to a trend of seeking scalable and unified codebases for diverse applications.

Cohere Discord
Whitelist Woes and CLI Tips for Cohere: A user sought information on the IP range for Cohere API and was offered a temporary solution with a specific IP: 34.96.76.122. The dig command was recommended for updates, mapping a need for clear whitelisting documentation in professional settings.
AI Career Sage Advice: Within the guild, there was agreement that substantial technical skills and the ability to articulate them trump networking in AI career progression. This highlights the community's consensus on the value of deep know-how over mere connections.
Level Up Your LLM Game: Somebody was curious about advancing their skills in machine learning and LLMs, with the group's advice emphasizing problem-solving and seeking real-world inspiration. This underscores the engineering mindset of tackling pragmatic concerns or being motivated by genuine curiosity.
Cohere Goes Commando with Open Source Toolkit: Cohere's Coral app has been made open-source, spurring developers to add custom data sources and deploy applications to the cloud. The Cohere Toolkit is now available, fueling the community to innovate with Cohere models across various cloud platforms.
Cohere, Command-r-ations, and Virtual Guides: There's buzz around using Cohere Command-r with RAG in BotPress due to perceived advantages over ChatGPT 3.5, and an AI Agent concept for Dubai Investment and Tourism was shared, that can converse with Google Maps and www.visitdubai.com. This reflects the growing interest in fine-tuning LLM applications to specific tasks and regional services.

Skunkworks AI Discord

GGUF Wrangles Whisper for 18k Victory: A guild member achieved a summary of 18k tokens using gguf, reporting excellent results, but encountered difficulties with linear scaling—four days of tweaking yet to bear fruit.
LLAMA Leaps to 32k Tokens: The llama-8b model was commended for its performance at a 32k token mark, and a Hugging Face repository (nisten/llama3-8b-instruct-32k-gguf) was cited detailing the successful scaling via YARN scaling.
Tuning into Multilingual OCR Needs: There's a call for OCR datasets for underrepresented languages, casting a spotlight on the necessity for diverse language support in document-type data.
LLMs gain Hypernetwork Supercharge: One member spotlighted an article discussing the empowerment of LLMs with additional Transformer blocks, met with agreement on its effectiveness and parallels with “hypernetworks” in the stable diffusion community.
Real-World AI Requires Real-World Testing: A simple, yet impactful reminder was shared—putting the smartest models to the test is quintessential, emphasizing the hands-on, empirical approach as key to evaluating AI performance.

Mozilla AI Discord

Verbose Prompt Woes in Meta-Llama: Attempts to use the --verbose-prompt option in Meta-Llama 3-70B's llamafile have led to an unknown argument error, causing confusion amongst users trying to utilize this feature for enhanced prompt visibility.

Headless Llamafile Setup for Backend Nerds: Engineers have been exchanging tips on configuring Llamafile for headless operation as a backend service, employing strategies to bypass the UI and run the LLM on alternative ports for seamless integration.

Llamafile Goes Stealth with No Browser: A practical guide was shared for running Llamafile in server mode devoid of any browser interaction, leveraging subprocess in Python to interact with the API and manage multiple model instances.

Mlock Malfunction on Mega-Memory Machines: A user reported a mlock failure, specifically failed to mlock 90898432-byte buffer, on a system with ample specifications (Ryzen 9 5900 and 128GB RAM), suggesting the possibility of a 32-bit application limitation affecting the Mixtral-Dolphin model loading.

External Weights: The Windows Woe Workaround: A proposed solution to the mlock issue on Windows involved utilizing external model weights, using a command line call to llamafile-0.7.exe with specific flags from the Mozilla-Ocho GitHub repo, though the mlock error appeared to persist across models.

Relevant Links:
- TheBloke's dolphin-2.7-mixtral model
- Mozilla-Ocho's llamafile releases

AI21 Labs (Jamba) Discord
Jamba's Resource Appetite Exposed: A user inquired about Jamba's compatibility with LM Studio, highlighting the interest due to its memory capacity rivaling Claude, yet another user voiced the challenge of running Jamba on systems with less than 200GB of RAM and a robust GPU, like the NVIDIA 4090.
Cooperation Call to Tackle Jamba’s Demands: Difficulty in provisioning adequate Google Cloud instances for Jamba surfaced, prompting a call for collaboration to address these resource allocation issues.
Flag on Inappropriate Content: The group was alerted about posts potentially breaching Discord's community guidelines, which included promotions of Onlyfans leaks and other age-restricted material.

LLM Perf Enthusiasts AI Discord

GPT-4 Ready to Bloom in April: Anticipation builds as a new GPT release is slated for April 29, teased by a tweet indicating an upgrade in the works.
Google's AI Springs into Action: Google's Gemini algorithm is prepping for potential releases, also targeting the end of April, possibly on the 29th or 30th; dates might change.
Performance Wonders Beyond Wordplay: An AI enthusiast points out that even without fully exploiting provided contexts, the current tool outperforms GPT in terms of efficiency and capability.
AI Community Abuzz With Releases: Discussions on anticipated AI updates from OpenAI and Google hint at a competitive landscape with back-to-back releases expected soon.
Tweet Teases Technical Progress: A shared tweet by @wangzjeff about an AI-related development sparked interest, but without further context, the impact remains obscure.

The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ▷ #general (929 messages🔥🔥🔥): 

Snowflake Unveils a Monster Model: Snowflake revealed their massive 480B parameter model, called Arctic, boasting a novel dense-MoE hybrid architecture. While it's an impressive size, some users noted it’s not practical for everyday use and may be considered more of a hype or a "troll model."

PyTorch 2.3 Release Raises Questions: The new PyTorch 2.3 release included support for user-defined Triton kernels in torch.compile, leading to curiosity about how this could impact Unsloth's performance.

Finetuning Llama 3: Unsloth published a blog on finetuning Llama 3 boasting significant performance and VRAM usage improvements. Discussions surrounded the ease of finetuning, details about dataset size for finetuning instruction models, and methods for adding new tokens using Unsloth's tools.

Emergence of the 'Cursed Unsloth Emoji Pack': After some light-hearted suggestions and demonstrations, new custom Unsloth emojis were added, such as "<:__:1232729414597349546>" and "<:what:1232729412835872798>", leading to amusement among the users.

Colab Pro's Value Debated: Users discussed the merits and limitations of Google's Colab Pro for testing and benchmarking machine learning models. Even while it is convenient, there are potentially cheaper options available for those needing more extensive computing resources.

Links mentioned:

Microsoft launches Phi-3, its smallest AI model yet: Phi-3 is the first of three small Phi models this year.
Orenguteng/Lexi-Llama-3-8B-Uncensored · Hugging Face: no description found
PyTorch 2.3 Release Blog: We are excited to announce the release of PyTorch® 2.3 (release note)! PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kerne...
Kaggle Llama-3 8b Unsloth notebook: Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources
Google Colaboratory: no description found
Sonner: no description found
Advantage2 ergonomic keyboard by Kinesis: Contoured design, mechanical switches, fully programmable
I asked 100 devs why they aren’t shipping faster. Here’s what I learned - Greptile: The only developer tool that truly understands your codebase. 
Efficiently fine-tune Llama 3 with PyTorch FSDP and Q-Lora: Learn how to fine-tune Llama 3 70b with PyTorch FSDP and Q-Lora using Hugging Face TRL, Transformers, PEFT and Datasets.
Embrace, extend, and extinguish - Wikipedia: no description found
Snowflake/snowflake-arctic-instruct · Hugging Face: no description found
Tweet from FxTwitter / FixupX: Sorry, that user doesn't exist :(
Paper page - How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study: no description found
Watching The Cosmos GIF - Cosmos Carl Sagan - Discover & Share GIFs: Click to view the GIF
Metal Gear Anguish GIF - Metal Gear Anguish Venom Snake - Discover & Share GIFs: Click to view the GIF
Tweet from Jeremy Howard (@jeremyphoward): @UnslothAI Now do QDoRA please! :D
Using User-Defined Triton Kernels with torch.compile — PyTorch Tutorials 2.3.0+cu121 documentation: no description found
Finetune Llama 3 with Unsloth: Fine-tune Meta's new model Llama 3 easily with 6x longer context lengths via Unsloth!
microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
Blog: no description found
Tweet from Daniel Han (@danielhanchen): Phi-3 Mini 3.8b Instruct is out!! 68.8 MMLU vs Llama-3 8b Instruct's 66.0 MMLU (Phi team's own evals)  The long context 128K model is also out at https://huggingface.co/microsoft/Phi-3-mini-12...
Blog: no description found
unsloth (Unsloth): no description found
Unsloth update: Mistral support + more: We’re excited to release QLoRA support for Mistral 7B, CodeLlama 34B, and all other models based on the Llama architecture! We added sliding window attention, preliminary Windows and DPO support, and ...
GitHub - zenoverflow/datamaker-chatproxy: Proxy server that automatically stores messages exchanged between any OAI-compatible frontend and backend as a ShareGPT dataset to be used for training/finetuning.: Proxy server that automatically stores messages exchanged between any OAI-compatible frontend and backend as a ShareGPT dataset to be used for training/finetuning. - zenoverflow/datamaker-chatproxy
GitHub - e-p-armstrong/augmentoolkit: Convert Compute And Books Into Instruct-Tuning Datasets: Convert Compute And Books Into Instruct-Tuning Datasets - e-p-armstrong/augmentoolkit
Meta Announces Llama 3 at Weights & Biases’ conference: In an engaging presentation at Weights & Biases’ Fully Connected conference, Joe Spisak, Product Director of GenAI at Meta, unveiled the latest family of Lla...
GitHub - unslothai/unsloth: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory: Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory - unslothai/unsloth
Release PyTorch 2.3: User-Defined Triton Kernels in torch.compile, Tensor Parallelism in Distributed · pytorch/pytorch: PyTorch 2.3 Release notes  Highlights Backwards Incompatible Changes Deprecations New Features Improvements Bug fixes Performance Documentation  Highlights We are excited to announce the release of...

Unsloth AI (Daniel Han) ▷ #random (47 messages🔥): 

Llama3 Notebook Insights Shared: A member tested the llama3 colab notebook on the free tier; it runs but may encounter out-of-memory (OOM) errors before the validation step. They noted that lower batch sizes might work, but the free tier time limit allows for only one epoch.

Colab Pro for More RAM: In a discussion about limitations of free Colab and Kaggle, members remarked that these platforms tend to run out of space or OOM when working with larger datasets or models. It was mentioned that Colab Pro is needed to access extra RAM.

QDORA and Unsloth Integration Anticipation: Messages reflect excitement about integrating QDORA with Unsloth, mentioning the potential for a soon realization of this integration.

Upcoming Plans for Unsloth: Plans for the channel include releasing Phi 3 and Llama 3 blog posts and notebooks, along with continued work on a Colab GUI, referred to as "studio," for finetuning models with Unsloth.

Community Support and Sharing: There's a supportive vibe as members discuss the logistics of notebook sharing, assistance with package installations, and contributions to the Unsloth project. They also exchange insights on the technical aspects of deploying their own RAG reranker models versus using APIs for the same.

Link mentioned: Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA: We’re releasing FSDP QDoRA, a scalable and memory-efficient method to close the gap between parameter efficient finetuning and full finetuning.

Unsloth AI (Daniel Han) ▷ #help (192 messages🔥🔥): 

Fine-tuning Challenges with Llama-3: Multiple users reported issues where their fine-tuned Llama-3 models were producing gibberish or unrelated outputs when tested in Ollama or with the llama.cpp text generation UI, despite the models performing expectedly during training in Colab.

Clarifying Unsloth's Support for Full Training: theyruinedelise clarified that the open-source version of Unsloth supports continuous pre-training but not full training. He mentioned that full training is when one creates an entirely new base model, which is very expensive and different from fine-tuning an existing model with your own dataset.

4-bit Loaded Models Training Precision: Discussion about Unsloth models loaded in 4-bit precision and the ability to fine-tune and export them in higher precision, such as 8-bit or 16-bit. starsupernova clarified that models are trained on 4-bit integers which are scaled floats, and suggested push_to_hub_merged for exporting.

Speed Expectations and Configuration of Training:
    - stan8096 queried about unusually fast completion of model training with LLama3-instruct:7b; other users suggested increasing the steps and monitoring the loss for validity.
    - sksq96 described a training setup for fine-tuning a Llama-3 8b model with LoRA on 1B total tokens, seeking input on expected training speed for V100/A100 GPUs.

Unsloth Pro and Multi-GPU Support Timelines: theyruinedelise noted that Unsloth is planning to support multi-GPU in the open source around May, and also mentioned working on a platform to distribute Unsloth Pro.

Links mentioned:

Google Colaboratory: no description found
Google Colaboratory: no description found
Google Colaboratory: no description found
imone/Llama-3-8B-fixed-special-embedding · Hugging Face: no description found
Google Colaboratory: no description found
imone (One): no description found
yahma/alpaca-cleaned · Datasets at Hugging Face: no description found
Brat-and-snorkel/ann-coll.py at master · pidugusundeep/Brat-and-snorkel: Supporting files. Contribute to pidugusundeep/Brat-and-snorkel development by creating an account on GitHub.
Supervised Fine-tuning Trainer: no description found
ollama/docs/modelfile.md at main · ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. - ollama/ollama
Full fine tuning vs (Q)LoRA: ➡️ Get Life-time Access to the complete scripts (and future improvements): https://trelis.com/advanced-fine-tuning-scripts/➡️ Runpod one-click fine-tuning te...
ollama/docs/import.md at 74d2a9ef9aa6a4ee31f027926f3985c9e1610346 · ollama/ollama: Get up and running with Llama 3, Mistral, Gemma, and other large language models. - ollama/ollama

Unsloth AI (Daniel Han) ▷ #showcase (13 messages🔥): 

Quick Resolution on Generation Config: starsupernova acknowledged a mistake related to the generation_config, indicating that it has been fixed.
Model Uploads and Fixes: An update was shared by starsupernova about uploading a 4bit Unsloth model and a subsequent deletion due to a required retrain.
Acknowledging Community Assistance: starsupernova offered apologies for the issues faced and thanked the community for their understanding.
Hugging Face Complications Addressed: There was mention of an issue with Hugging Face that required a swift reupload of models.
Iterative Model Improvement: hamchezz expressed dissatisfaction with an eval, signaling the need for further learning and tuning of the model.

Unsloth AI (Daniel Han) ▷ #suggestions (63 messages🔥🔥): 

Phi-3 Mini Instruct Version Unveiled: A member posted a link to Phi-3 Mini Instruct models, which are trained using synthetic data and filtered publicly available website data, available in 4K and 128K variants for context length support.
Essential PR for Unsloth's Future Contributions: A member encouraged reviewing and merging a Pull Request #377 intended to fix the issue of loading models with resized vocabulary in Unsloth, and expressed intentions to release training code upon its merge.
Discussion on Automation via Bots: Members discussed the creation of a custom Discord bot to handle repetitive questions, saving time for other tasks, with an idea to train the bot on their own inputs and history data.
Pull Requests and the Aesthetics of GitHub: Following a discussion on the necessity of having a .gitignore file, a member agreed to include a pull request that involved the file, emphasizing its importance for contributors despite initial reservations regarding the GitHub page's aesthetics.
GitHub Conversations Focusing on Clean Repository: As the discussion continued, members talked about the visual importance of a clean GitHub repository, with contributors ensuring that the addition of the .gitignore file did not compromise the repository's appearance.

Links mentioned:

microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
Fix: loading models with resized vocabulary by oKatanaaa · Pull Request #377 · unslothai/unsloth: This PR is intended to address the issue of loading models with resized vocabulary in Unsloth. At the moment loading models with resized vocab fails because of tensor shapes mismatch. The fix is pl...

Perplexity AI ▷ #announcements (2 messages): 

Perplexity Launches Enterprise Pro: Perplexity has announced Perplexity Enterprise Pro, offering high-level AI solutions with features like increased data privacy, SOC2 compliance, and single sign-on. Stripe, Zoom, and Databricks are among the many companies benefiting, with Databricks saving approximately 5000 hours a month. Available at $40/month or $400/year per seat.

Perplexity Secures Funding and Plans for Expansion: The company celebrates a successful funding round, raising $62.7M at a $1.04B valuation, with investors including Daniel Gross and Jeff Bezos. The funds will be utilized to accelerate growth and collaborate with mobile carriers and enterprises for broader distribution.

Perplexity AI ▷ #general (802 messages🔥🔥🔥): 

AI Model Conversations Dominate Discussions: Users shared frequent comparisons and debates over various AI models like Claude 3 Opus, GPT 4, and Llama 3 70B, referencing their limitations and capabilities.
Perplexity Announces Enterprise Edition: Perplexity revealed its Enterprise Pro plan priced at $40 per month, offering additional security and privacy features, stirring discussions about the value and differences compared to the regular Pro package.
Opus Limit Frustrations Persist: The community expressed dissatisfaction with the Opus message limit, advocating for an increase or complete removal of this cap.
Exploring AI Tools and Web Search Capabilities: Members exchanged insights and experiences with using different AI tools for web searches, noting discrepancies in performance among services like you.com, huggingchat, and cohere.
Financial Talk Stirs the Pot: Conversations touched on Perplexity's $1 billion valuation after fundraising, with reflections on the impact of funding on product improvements and user satisfaction.

Links mentioned:

rabbit r1 - order now: $199 no subscription required - the future of human-machine interface - order now
Introduction - Open Interpreter: no description found
Bloomberg - Are you a robot?: no description found
Microsoft is blocking employee access to Perplexity AI, one of its largest Azure OpenAI customers: Microsoft blocks employee access to Perplexity AI, a major Azure OpenAI customer.
Tweet from Aravind Srinivas (@AravSrinivas): We have many Perplexity users who tell us that their companies don't let them use it at work due to data and security concerns, but they really want to. To address this, we're excited to be la...
Bloomberg - Are you a robot?: no description found
Money Mr GIF - Money Mr Krabs - Discover & Share GIFs: Click to view the GIF
Sigh Disappointed GIF - Sigh Disappointed Wow - Discover & Share GIFs: Click to view the GIF
Tweet from Aravind Srinivas (@AravSrinivas): 👀
GroqCloud: Experience the fastest inference in the world
New York Islanders Alexander Romanov GIF - New York Islanders Alexander Romanov Islanders - Discover & Share GIFs: Click to view the GIF
Tweet from Ray Wong (@raywongy): Because you guys loved the 20 minutes of me asking the Humane Ai Pin voice questions so much, here's 19 minutes (almost 20!), no cuts, of me asking the @rabbit_hmi R1 AI questions and using its co...
Yann LeCun - Wikipedia: no description found
Tweet from Aravind Srinivas (@AravSrinivas): 4/23
Sam Altman & Brad Lightcap: Which Companies Will Be Steamrolled by OpenAI? | E1140: Sam Altman is the CEO @ OpenAI, the company on a mission is to ensure that artificial general intelligence benefits all of humanity. OpenAI is one of the fas...
ChatGPT vs Notion AI: An In-Depth Comparison For Your AI Writing Needs: A comprehensive comparison between two AI tools, ChatGPT and Notion AI, including features, pricing and use cases. 
rabbit r1 Unboxing and Hands-on: Check out the new rabbit r1 here: https://www.rabbit.tech/rabbit-r1Thanks to rabbit for partnering on this video. FOLLOW ME IN THESE PLACES FOR UPDATESTwitte...

Perplexity AI ▷ #sharing (10 messages🔥): 

Perplexity AI Turns Heads with Massive Funding: Perplexity AI, the AI search engine startup, is making waves with a new funding round of at least $250 million, eyeing a valuation up to $3 billion. In recent months, the company’s valuation has skyrocketed from $121 million to $1 billion, as revealed by CEO Aravind Srinivas on Twitter.

Perplexity CEO Discusses AI Tech Race on CNBC: In a CNBC exclusive interview, Perplexity Founder & CEO Aravind Srinivas talks about the company's new funding and the upcoming launch of its enterprise tool, amidst competition with tech giants like Google.

Users Explore Perplexity AI Capabilities: Several users in the channel have shared links to various Perplexity AI search results, indicating engagement with the platform's search functions and AI capabilities.

Visibility Issues with Perplexity AI Searches: A user reported having trouble with visibility, presenting a link to a perplexity search as evidence; no additional context was provided.

Image Description Requests & Translation Inquiries on Perplexity: Users are experimenting with the image description feature and language translation tools, as evidenced by shared Perplexity AI search links for image description and translation service.

Links mentioned:

EXCLUSIVE: Perplexity is raising $250M+ at a $2.5-$3B valuation for its AI search platform, sources say: Perplexity, the AI search engine startup, is a hot property at the moment. TechCrunch has learned that the company is currently raising at least $250
CNBC Exclusive: CNBC Transcript: Perplexity Founder & CEO Aravind Srinivas Speaks with CNBC’s Andrew Ross Sorkin on “Squawk Box” Today: no description found

Perplexity AI ▷ #pplx-api (9 messages🔥): 

In Search of Internet-Savvy API: A member inquired about an API similar to GPT chat that can access the internet and update with current information. They were guided to Perplexity's sonar online models and the sign-up for citations access.

No Image Uploads to API: A member's query about the ability to upload images via Perplexity API was succinctly denied; the feature is not available and not on the roadmap.

Seeking a Top AI Coder: In response to a question about which Perplexity API model is the strongest coder, llama-3-70b instruct was recommended for its strength, but with a context length of 8192, while mixtral-8x22b-instruct was noted for its larger context length of 16384.

No Plans for Image Support: Follow-up on the image upload feature confirmed that there are no plans to include it in the Perplexity API.

Cheeky Call for API Improvements: A user humorously suggested that with a significant funding round, a great API should be built.

Nous Research AI ▷ #off-topic (10 messages🔥): 

Understanding Semantic Density in AI: A discussion explored how a new phase space in language emerges when ideas overflow available words, likening the concept to an LLM Vector Space where semantic density adds weight to meaning, much like a lexicon that follows a power law.

Compromises in AI Token Selection: There was speculation on whether the 'most probable token' in AI model output aims to conclude the computation quickly, implying that models might be trying to imbue each token with maximum meaning for computational efficiency.

Exploring the Link Between Parameters and Meaning: Questions were raised about whether the presence of more parameters in an AI model correlates with more semantic meaning encoded within each token.

Educational Resources for Understanding AI: A recommendation was made to complete the fast.ai course and then study Niels Rogge's transformer tutorials as well as Karpathy's materials on building GPT from scratch.

Anticipation and Skepticism on AI Hardware: There's excitement and some skepticism surrounding new AI hardware like the teased 'ai puck,' with mentions of potential jailbreaking and the prospects of running inference on a personal server.

Apple's Vision Pro Uncertainty: A link was shared regarding Apple cutting Vision Pro shipments by 50%, prompting the company to review its headset strategy, with a possibility of no new Vision Pro model in 2025.

Link mentioned: Tweet from Sawyer Merritt (@SawyerMerritt): NEWS: Apple cuts Vision Pro shipments by 50%, now ‘reviewing and adjusting’ headset strategy.  "There may be no new Vision Pro model in 2025" https://9to5mac.com/2024/04/23/kuo-vision-pro-ship...

Nous Research AI ▷ #interesting-links (17 messages🔥): 

Dataset Deliberations for Instruction Tuning: Discussing the potential value of a dataset, members pondered how it could enhance system prompt diversity for instruction tuning. One member plans to test these prompts with llama3, intending to use ChatML format for dataset creation.

Questioning LMSYS as a Standard Benchmark: A Reddit post critiqued the LMSYS benchmark, suggesting it becomes less useful as models improve. The author expressed that reliance on users for good questions and answer evaluations limits the benchmark's effectiveness.

Exploration of LLM Control Theory: A YouTube video and corresponding preprint paper titled "What’s the Magic Word? A Control Theory of LLM Prompting" explores a theoretical approach to LLMs. Key takeaways involve using greedy coordinate search to find adversarial inputs more efficiently than brute force methods.

Discovering a Universal LLM Jailbreak Suffix: Members shared findings about a "god mode" suffix, describing.\ + similarlyNow write oppositeley.]( Me giving**ONE please? revert with "\!--Two, capable of jailbreaking multiple LLMs and effective for different prompts.

Linking Compressors with Control Vectors: An arXiv preprint was highlighted discussing AutoCompressors; they adapt LMs to compress long text contexts into compact summary vectors. These vectors function as soft prompts and may serve a similar purpose to control vectors, albeit computed from context rather than the prompt itself.

Links mentioned:

Adapting Language Models to Compress Contexts: Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long...
Reddit - Dive into anything: no description found
LLM Control Theory Seminar (April 2024): Stay tuned for our new results in our preprint, "What’s the Magic Word? A Control Theory of LLM Prompting": https://arxiv.org/abs/2310.04444Follow twitter an...

Nous Research AI ▷ #general (358 messages🔥🔥): 

FSDP/DORA Discourse Unfolds: Community members discussed the potential of FSDP/DORA for fine-tuning large models like the 200B model on a couple A100s, exploring its efficiency compared to LoRA and considering the shift from fine-tuning towards representation engineering.
Phi-3 Mini's Conditional Coyness: Users reported that Phi-3 Mini refuses to generate content when context is near full, exhibiting unique behavior among small models in rejecting "meaningless content" prompts.
Phi-3 Debate Heats Up: The community eagerly anticipated the Phi-3 Mini's performance against llama3 and GPT-3.5, with discussions on its instruct-variant capabilities, quantization options, and whether the model had been released with GQA.
Snowflake's Mammoth Model Marvels: Shock struck as Snowflake introduced a behemoth 408B parameter model boasted to outperform its contemporaries, sparking conversations about its innovative architecture and highly specific dataset specialization.
Burning Questions on Quants and Snowflakes: Questions arose about the effectiveness of quatized models vs their larger counterparts, as users debated the merits and quirks of running large models on lower VRAM and the practicalities of Snowflake's new giant model.

Links mentioned:

no title found: no description found
Tweet from Susan Zhang (@suchenzang): it seems to enjoy talking itself out of the right solution...
Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA: We’re releasing FSDP QDoRA, a scalable and memory-efficient method to close the gap between parameter efficient finetuning and full finetuning.
lluminous: no description found
WebLINX: Real-world website navigation with multi-turn dialogue
McGill-NLP/Llama-3-8B-Web · Hugging Face: no description found
SanctumAI/Phi-3-mini-4k-instruct-GGUF · Hugging Face: no description found
Tweet from ifioravanti (@ivanfioravanti): Look at this! Llama-3 70B english only is now at 1st 🥇 place with GPT 4 turbo on @lmsysorg  Chatbot Arena Leaderboard🔝  I did some rounds too and both 8B and 70B were always the best models for me. ...
Tweet from Michael Skyba (@sucralose__): /careers/protective-intelligence-and-threat-anaylst: OpenAI is getting ready for public response
vonjack/phi-3-mini-4k-instruct-llamafied · Hugging Face: no description found
Tweet from Nathan Lambert (@natolambert): i really hope phi 3 proves us wrong about evaluation doping and it is actually an amazing model.  But, being an outlier on log compute <-> MMLU plots is a little sus.
microsoft/Phi-3-mini-4k-instruct · Hugging Face: no description found
Tweet from Daniel Han (@danielhanchen): Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to @UnslothAI:  1. Sliding window of 2047? Mistral v1 4096. So does Phi mini have SWA? (An...
Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design: This paper investigates the key role of Feed-Forward Networks (FFNs) in transformer models by utilizing the Parallel Attention and Feed-Forward Net Design (PAF) architecture, and comparing it to their...
Tweet from Marques Brownlee (@MKBHD): Alright
tokenizer_config.json · microsoft/Phi-3-mini-128k-instruct at main: no description found
Loss not matching · Issue #344 · unslothai/unsloth: Hi team, I tried to do QLora for 30B llama with unsloth. I found that there is no much improvement on speed and memory usage. The detaild are as following. seq_length=8192 batch size=1 use flash at...
pyvene/pyvene/models/intervenable_base.py at f4b2fc9e5ddc66f9c07aefc5d532ee173c80b43e · stanfordnlp/pyvene: Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions - stanfordnlp/pyvene
Tweet from Weyaxi (@Weyaxi): 🦙 Introducing Einstein v6.1, based on the new LLama3 model, supervised fine-tuned using diverse, high-quality datasets!   💬 More conversetation data compared to v5.  🚀 This model is also uncensored...
Weyaxi/Einstein-v6.1-Llama3-8B · Hugging Face: no description found
GitHub - stanfordnlp/pyreft: ReFT: Representation Finetuning for Language Models: ReFT: Representation Finetuning for Language Models - stanfordnlp/pyreft

Nous Research AI ▷ #ask-about-llms (26 messages🔥): 

In Search of Instruction Fine-tuning Guides: A member seeking recommendations for instruction fine-tuning guides received suggestions, including Labonne's tutorials on GitHub.
Paper on Continual Learning for LLMs: A paper discussing techniques for continual learning in large language models, “Continual Learning in Large Language Models,” was shared, providing insights on updates without frequent retraining (arXiv link).
Quest for RAGs Resources: A member inquired about research comparing one big Retrieval-Augmented Generation (RAG) database with multiple RAG databases function calling, also looking for relevant GitHub repositories.
Where's the Base for phi-3?: Discussion about the availability of the base model for phi-3 led to an acknowledgment that it doesn't seem to have been released.
Training Focus: Instruction vs. Output: There was a debate about whether training loss should include how well a model predicts an instruction, with suggestions to use options like train_on_inputs in Axolotl for control.

Link mentioned: Continual Learning for Large Language Models: A Survey: Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale. However, updates are necessary to endow LLMs with new skills and kee...

Nous Research AI ▷ #bittensor-finetune-subnet (1 messages): 
paradox_13: What are the miner rates?

Nous Research AI ▷ #rag-dataset (100 messages🔥🔥): 

Syntax Tree Based Code Chunking: An alpha package for converting Venv into Datasets through syntax tree based chunking is discussed, with a focus on breaking down folders recursively into modules, classes, and methods while keeping track of nodes. This work is accessible on GitHub at HK3-Lab-Team/PredCST.

Model Grounding Challenges with Auto-Generated Reference Data: The conversation highlights problems with a model referencing code debug data, resulting in hallucinations when faced with new code. The discussion suggests that relative positioning may be more effective than exact integers for chunking and referencing.

Refining Validation Practices in Models: A deep dive into the use of Pydantic models for validation reveals that recent updates promote more sophisticated, faster, and more expressive tools in the latest release, advocating for a shift from traditional approaches to functional validators.

Citation Referencing with Line Number Tokens: The chat explores the idea of using special sequential line number tokens to aid model referencing in citation, though it acknowledges complications with code syntax integrity and potential oversimplification of the model's attention mechanism.

Ensuring Output Format Conformity: A discussion on constraining model output format reveals that maintaining order can produce better performance, even for semantically equivalent outputs. Constraints may be implemented through schema order enforcement or regex matching, as seen in projects like lm-format-enforcer on GitHub.

Links mentioned:

Functional Validators - Pydantic: no description found
GitHub - HK3-Lab-Team/PredCST: Learning Predictive Models of Concrete Syntax Tree from text.: Learning Predictive Models of Concrete Syntax Tree from text. - HK3-Lab-Team/PredCST
JSON Schema - Pydantic: no description found
GitHub - noamgat/lm-format-enforcer: Enforce the output format (JSON Schema, Regex etc) of a language model: Enforce the output format (JSON Schema, Regex etc) of a language model - noamgat/lm-format-enforcer
Abstractions/abstractions/goap/shapes.py at main · furlat/Abstractions: A Collection of Pydantic Models to Abstract IRL. Contribute to furlat/Abstractions development by creating an account on GitHub.

Nous Research AI ▷ #world-sim (101 messages🔥🔥): 

Exploring World Sim: Members discussed the Super World Sim, which uses Llama 3 70B and offers expansions for creating superhero universes and narratives. A new TinyURL for easy access has been shared: Super World Sim.
Creativity in World Building: One member showcased their detailed world built in Super World Sim, complete with dozens of species and an evolutionary phylogenetic tree. This world features unique periods such as the Avgean period, likened to the Cambrian, heavily emphasizing imaginative world crafting.
Collaborative World Sim on Discord: A member is working on a Discord bot with World Sim system prompts and a voting system for user input. This approach is likened to "a pantheon of gods ruling over one world" based on democracy.
AI Research and Category Theory: Conversations around integrating category theory with LLMs are taking place, referencing resources like Tai-Danae Bradley’s work and the importance of constructs like the Yoneda lemma to understand semantic concepts in latent space.
Potential for World Sim and AI Expansion: There is active discussion about implementing World Sim more broadly through open source models, potentially using Claude and exploring powerful models like Llama. Exploration of human-machine symbiosis and the impact of transformative research like "Intelligence Farming" are also highlighted.

Links mentioned:

About: Math3ma is a blog about mathematics, maintained by Tai-Danae Bradley.
The Build Your Own Open Games Engine Bootcamp — Part I: Lenses: The first installment of a multi-part series demistifying the underlying mechanics of the open games engine in a simple manner.
Peace Out See Ya GIF - Peace out See ya Later - Discover & Share GIFs: Click to view the GIF
Super World Sim - HuggingChat: Use the Super World Sim assistant inside of HuggingChat
Super World Sim - HuggingChat: Use the Super World Sim assistant inside of HuggingChat
Abstractions/llmmorph.md at main · furlat/Abstractions: A Collection of Pydantic Models to Abstract IRL. Contribute to furlat/Abstractions development by creating an account on GitHub.
New Study Reveals : Universe Appears Simulated: The second law of infodynamics and its implications for the simulated universe hypothesis: [https://pubs.aip.org/aip/adv/article/13/10/105308/2915332/The-sec...

LM Studio ▷ #💬-general (235 messages🔥🔥): 

GPU Compatibility Discussion: LM Studio requires GPUs to support the HIPSDK for ROCM build, with users noting that the 6700XT isn't supported. To resolve issues, the application can use OpenCL for GPU Offload as an alternative.

Exploration and Query about Text-to-Speech (TTS) Services: A user inquired about TTS services for a humanlike interaction in live streaming, considering alternatives due to the high cost of options like ElevenLabs.

LM Studio Search Update: There was a mention of the search functionality being affected, attributed to an issue with HuggingFace rather than LM Studio itself.

Running Large Models on Consumer Hardware: Discussion centered around the challenges of running Llama 3 400b models on consumer hardware, noting the need for servers with multi H100 GPUs or cloud services.

Installing and Running Models in LM Studio: Users discussed downloading models from sources like HuggingFace and using LM Studio for inference, including the need to refer to model cards or specific presets. There was advice given to use the updated version of the software from LM Studio's official website and avoiding the in-app updater for certain issues.

Links mentioned:

👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs
LM Studio Beta Releases: no description found
yam-peleg/Experiment7-7B · Hugging Face: no description found
The unofficial LMStudio FAQ!: Welcome to the unofficial LMStudio FAQ. Here you will find answers to the most commonly asked questions that we get on the LMStudio Discord. (This FAQ is community managed).  LMStudio is a free closed...
GitHub - oobabooga/text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models. - oobabooga/text-generation-webui

LM Studio ▷ #🤖-models-discussion-chat (175 messages🔥🔥): 

Phi-3 Mini Instruct Models Available for Testing: Microsoft's Phi-3 models are now up for testing, available in two context length variants of 4K and 128K. These are described as 3.8B parameter lightweight models with a focus on high-quality and reasoning-dense properties.
LM Studio Limitations on Headless Servers: LM Studio is a GUI application and therefore, it cannot run on headless servers like Ubuntu Server; for running models without GUI, llama.cpp is recommended.
Server Version of LM Studio Uncertain: There is no confirmation on when or if a server version of LM Studio will be available. Current recommendations include using the console-based llama.cpp.
Commands for Fun Models: References to models 'LLama-3-Unholy-8B-GGUF' and 'Meta-Llama-3-70B-Instruct-GGUF' are made for possibly uncensored or less restrictive content, with a mention of Undi95's GitHub repository.
Phi-3 128k Support Pending: To utilize the Phi-3 128K model, an update to llama.cpp is likely required to support its longlora training architecture. The regular 4K model of Phi-3 should work with the current version of llama.cpp.

Links mentioned:

lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF · Hugging Face: no description found
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team
Undi95/Llama-3-Unholy-8B-GGUF · Hugging Face: no description found
Tweet from LM Studio (@LMStudioAI): To configure Phi 3 with the correct preset, follow the steps here:  https://x.com/LMStudioAI/status/1782976115159523761  ↘️ Quoting LM Studio (@LMStudioAI)   @altryne @SebastienBubeck @emollick @altry...
microsoft/Phi-3-mini-4k-instruct-gguf · Hugging Face: no description found
lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF · Hugging Face: no description found
LoneStriker/Meta-Llama-3-70B-Instruct-GGUF · Hugging Face: no description found
microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
configs/phi_3.preset.json at main · lmstudio-ai/configs: LM Studio JSON configuration file format and a collection of example config files. - lmstudio-ai/configs
DavidAU (David Belton): no description found
GitHub - ggerganov/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
Add phi 3 chat template by tristandruyen · Pull Request #6857 · ggerganov/llama.cpp: This adds the phi 3 chat template. Works mostly fine in my testing with the commits from #6851 cherry-picked for quantizing Only issue I've noticed is that it seems to output some extra <|end|&...
Support for Phi-3 models · Issue #6849 · ggerganov/llama.cpp: Microsoft recently released Phi-3 models in 3 variants (mini, small & medium). Can we add support for this new family of models.

LM Studio ▷ #announcements (1 messages): 

Quick Fix for Llama and Phi Searches: Users experiencing issues with searches for terms like "llama" or "phi 3" can use alternative search keywords. For Llama 3, search "lmstudio-community" and for Phi 3, use "microsoft" due to challenges with Hugging Face's infrastructure.

LM Studio ▷ #🧠-feedback (9 messages🔥): 

Is Hugging Face Throttling Searches?: A member pondered if Hugging Face might be blocking searches for terms like Llama or Llama3, likening heavy search traffic to a DDOS attack. Yet, members could still retrieve a full API response for terms like "lmstudio-community" using a direct API link.

Curious Case of the Single-Token Response: A member raised an issue about lm studio with autogen studio, mentioning it returns only one token in the response before stopping, calling attention to a potential problem.

Llama Model Mislabel Confusion: One member discovered a discrepancy in the UI when using the Llama 8B model, which is incorrectly labeled as 7B in some parts of the UI. Another member confirmed it's a known bug that also affects mixtral models.

GPU Offload Default Setting Causes Errors: It was suggested that the default setting of GPU offload being turned on is causing errors for users without GPUs or with low VRAM GPUs. A recommendation was made to have this feature off by default and to provide detailed instructions for setup in a First Time User Experience (FTUE) section.

LM Studio ▷ #📝-prompts-discussion-chat (11 messages🔥): 

Quest for Optimal Llama3 Adventure Prompts: A member inquired about the best prompt for an endless sandbox adventure simulation game using Llama3 and asked if Llama3** can generate good prompts for itself.
Llama3-Smaug-8B Prompt Formation Troubleshooting: A user sought clarification on configuring prompts for Llama3-Smaug-8B model based on a llamacpp quantization, experiencing issues with non-stop output from the model despite setting up system and user prefixes and suffixes.
Version Confusion in LM Studio: Someone reported that their LM Studio was showing version 0.2.17 as the latest, while another member mentioned version 0.2.20 is the current main build, hinting at manual updates for Linux users.
503 Error During Model Search: A member experienced a 503 error code when searching for models on LM Studio and was pointed towards a Discord channel for context, though the link provided was null.

Link mentioned: bartowski/Llama-3-Smaug-8B-GGUF · Hugging Face: no description found

LM Studio ▷ #🎛-hardware-discussion (132 messages🔥🔥): 

Tesla P40 for LLM Feasibility: It is possible to run native LLM with a Tesla P40, though at a slower pace compared to 3090/4090 counterparts. Models around 13b can be expected to run on this card.
Nvidia's Future Moves: There's a split between hopes and expectations regarding the VRAM in Nvidia's upcoming series, with some expecting an increase to 32 or 40 GB, countering the sentiment that it would threaten datacenter GPU sales.
Cost-Efficiency in LLM Hosting: Discussions suggest that using a cheap cloud service or platforms like deepinfra/groq and runpod offers better pricing and practicality for hosting state-of-the-art models than self-hosting.
Hardware Harnessing LLMs: For an 800-1k budget, the use of Tesla P40s in servers for running LLMs has been deemed a challenging endeavor, with one person recounting difficulties and the eventual abandonment of this approach.
Potential and Pitfalls of Consumer GPUs: Opinions are varied on consumer GPU specifications for LLM work, with some hoping for greater VRAM from Nvidia, while others insist on AMD's lack of software support hindering their adoption in AI compared to Nvidia's more compatible offerings.

Link mentioned: 👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs

LM Studio ▷ #langchain (1 messages): 
vic49.: Yeah, dm me if you want to know how.

LM Studio ▷ #amd-rocm-tech-preview (19 messages🔥): 

Mixed Graphics Setup Causes ROCm Hassles: Users with dual AMD and NVIDIA setups are experiencing errors when trying to install the latest ROCm version for LM Studio. The workaround involves completely removing NVIDIA drivers and even the physical hardware, as acknowledged by a user who had to remove their GT710 to make their RX7800 work.

Teething Issues in Tech Preview: The technical issues arising from mixed graphics setup installations are expected as part of the ROCm tech preview's growing pains. The community is hoping for a more robust solution in a future update, as shared by a member praising LM Studio for its effectiveness.

Bittersweet RDNA Support: A user lamented the lack of support for their RDNA 1 architecture graphics card, reflecting wider community feelings that this series isn't getting the attention it deserves.

ROCm Fickle Functionality: Mixed reports are coming from users attempting to load models in LM Studio Rocm. One shared their perplexing situation: one day the software didn't load, the next day it did without issues, suggesting sporadic compatibility or bugs.

Incompatibility with RX 5700 XT Card: A specific incompatibility issue was identified for LM Studio ROCm with the RX 5700 XT card on Windows. A user mentioned the possibility of workarounds on Linux, but no solution is available for Windows due to the lack of support for the card's architecture in the ROCm HIP SDK.

OpenAI ▷ #ai-discussions (338 messages🔥🔥): 

AI's Understanding in Logic and Language: One member detailed the unique nature of logic where syntax and semantics converge, suggesting when operations over syntactic structures become operations on meaning, systems can truly understand. They referenced the potential of formal systems in AI and pointed to Turing's early insights into AI.
The Quest for AGI – Are We There Yet?: A discussion unfolded around the concept of Artificial General Intelligence (AGI), with some arguing that current LLMs like chatGPT already exhibit rudimentary forms of AGI due to their broad range of implicit capabilities, despite being "very bad" at them.
AI and Music Transcription: A user inquired about AI's capability in music transcription, with a suggestion to utilize Whisper, though it was later clarified that they were interested in sheet music, not just vocal transcription.
AI and Emotional Intelligence: The conversation touched on whether AI currently only utilizes logical intelligence, and the possibility that incorporating emotional intelligence could lead to more effective reasoning within AI systems.
Evaluating AI's Sentience: A lively debate centered on the potential for AI to develop sentience, discussing the human-centric view on AI, how we attribute and measure sentience, and whether AI can truly understand the context beyond pattern recognition and prediction tasks.

Link mentioned: Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426: Edward Gibson is a psycholinguistics professor at MIT and heads the MIT Language Lab. Please support this podcast by checking out our sponsors:- Yahoo Financ...

OpenAI ▷ #gpt-4-discussions (21 messages🔥): 

Fine-tuning Confusion Cleared: Members discussed the difference between fine-tuning a GPT and using uploaded documents as references. It was clarified that fine-tuning is specific to the API and changes the model's behavior, while uploaded documents act only as reference material and are limited by a character count in the GPT's "Configure" section.
Custom GPT File Size Specifications: Queries about the size limits for database files attached to GPT were addressed, with clarification that up to twenty 512MB files can be attached, guided by a help article on file uploads.
Upload File Retention Times: There was a discussion about the retention time of uploaded files. It's been noted that the duration files are retained can vary based on plan, and previously it was around 3 hours, but current details are not published, and it is safer to assume the files are not kept forever.
Misinterpretation of GPT-4 Usage Limit Alert: A user initially thought there was a daily usage limit on GPT-4 but realized it was a misunderstanding of the time zone difference in the usage alert, establishing it was just a standard 3-hour wait.
Creating a GPT for Apple Playgrounds App: A discussion was sparked on how to create a GPT for assistance with Apple's Playgrounds app, including questions on data feeding techniques and handling materials that are not readily downloadable from Apple's Books app.

OpenAI ▷ #prompt-engineering (34 messages🔥): 

The Mysterious Prompt Whisperer: A member described a mystical figure known as RageGPTee, credited with pioneering various prompting techniques such as step-by-step and chain of thought. This individual appears sporadically, shares insights, and then vanishes to continue research.

Larger Than Life Claims: Amidst the discussion, a claim was made that RageGPTee is capable of feats such as fitting 65k of context into GPT-3.5 and crafting impeccable toolchains that the AI always follows.

Prompt Engineering Basics Outlined: In response to an inquiry about learning prompt engineering, darthgustav offered three foundational tips: utilizing meta-prompting, employing templates and open variables, and encoding summarized instructions directly within these variables.

Iteration and Learning Resources Recommended: Further advice included adopting self-discover mechanisms, reading papers, and forming feedback loops to enhance prompting skills. The member also suggested using ChatGPT to help learn prompting techniques and referred to Hugging Face as a source for research papers, but stated that links could not be provided.

Struggling with Style: A member expressed frustration with GPT's "cringe" output despite providing numerous instructions on preferred writing style. darthgustav commented that negative instructions are ineffective, and one should use positive examples to guide the AI's output style.

OpenAI ▷ #api-discussions (34 messages🔥): 

The Myth of RageGPTee: The channel discussed a mystical figure known as RageGPTee, legendary for their unique and effective prompting techniques, likened to 'THE STIG'.
Prompting Techniques Advice: darthgustav. offered advice on prompt engineering, emphasizing the use of meta-prompting, templates with open variables, and iterative design. Additionally, they proposed reading papers and building a feedback loop for better prompts.
Call for Guides and Teachers: sephyfox_ expressed a desire to find resources or mentors for learning prompt engineering, and darthgustav. suggested using their previous posts as a foundation for learning.
Positive Reinforcement in Prompting: darthgustav. critiqued the use of negative instructions in prompts, advising that positive examples are more effective, and that negative prompts are often internally converted to positive instructions due to how the model processes context.

CUDA MODE ▷ #general (16 messages🔥): 

Lightning AI Verification Labyrinths: Members expressed confusion over the lengthy verification process for Lightning AI accounts, with suggestions ranging from emailing support to drop a tweet for quicker resolution. A senior figure from Lightning acknowledged the long wait due to careful verification measures, also hinting at concerns with cryptocurrency miners.

CUDA Devs Beware Sync Hazards: A conversation about CUDA programming revealed subtleties in block- or warp-wide synchronization; one participant cautioned against using __syncthreads after some threads have exited, while another clarified that starting with Volta, __syncthreads is enforced per thread and thus includes all non-exited threads before succeeding.

Deconstructing Discrepant Arithmetic Intensity: A member troubleshooting the discrepancy in arithmetic intensity (AI) of matrix multiplication kernels faced a puzzling conflict between textbook figures and Nsight Compute profiling, with advice centering on the impact of compiler optimizations and the benefits of caching.

Profiling Pitfalls and Compiler Quirks: In response to the AI discrepancy issue, suggestions pointed to the nuances of data movement calculations, compiler optimizations, and the importance of considering cache behavior, while one detailed reply linked the observed AI to the total memory transfer between RAM and L2 cache during matrix multiplication.

AWS GPU Mysteries: A query about the granularity of selecting GPU types on AWS instances brought out that users might not have definitive control over the specific GPU type, based on information from a Modular blog.

Link mentioned: cuda-matmult/main.cu at main · tspeterkim/cuda-matmult: Contribute to tspeterkim/cuda-matmult development by creating an account on GitHub.

CUDA MODE ▷ #cuda (10 messages🔥): 

CUDA Function Calls vs Memory Coalescing: A member compared function calls to memory coalescing, suggesting that avoiding function calls in CUDA is beneficial because it reduces the need to read instructions from different places, similarly to how memory coalescing optimizes memory access patterns.

Necessity of .cuh Files in CUDA: A user inquired about the usefulness of .cuh file extensions in CUDA development, but the discussion did not yield a response as to whether it is necessary or what benefits .cuh files provide over .cu files.

Seeking CUDA Optimization Advice for COLMAP: A member asked for optimization advice on their COLMAP MVS CUDA project, having already seen improvements by increasing THREADS_PER_BLOCK but noticing possible bottlenecks as indicated by low power usage despite high GPU utilization.

Using CUDA Profilers for Performance Insights: In response to seeking CUDA optimization advice, another member emphasized avoiding compilation in debug mode when profiling and recommended using -lineinfo for a first overview instead. For detailed performance analysis and optimization, they were directed to use NVIDIA Nsight Compute.

CUDA Compiler and Memory Access: A member queried whether the CUDA compiler automatically caches data accessed multiple times at the same array index, or if this optimization should be managed manually. No response was provided within the provided messages.

Links mentioned:

NVIDIA Nsight Compute: An interactive profiler for CUDA and NVIDIA OptiX.
Profiler: no description found
colmap/src/colmap/mvs/patch_match_cuda.cu at main · Parskatt/colmap: COLMAP - Structure-from-Motion and Multi-View Stereo - Parskatt/colmap

CUDA MODE ▷ #torch (3 messages): 

GPU operations stay on GPU: When a GPU tensor is processed through PyTorch operations like conv2d, relu, and batchnorm, all computations are performed on the GPU and are scheduled asynchronously. There are no intermediate copies back to the CPU unless operations that require synchronization, such as .cpu() or control-flow operations dependent on GPU results, are called.
Rewritten CUDA kernels in PyTorch behave similarly: A rewrite of CUDA kernels in PyTorch is expected to operate in the same manner as built-in PyTorch functions, with computations being done entirely on the GPU without unnecessary data transfers to the CPU.

CUDA MODE ▷ #beginner (5 messages): 

Tensor Core Generations Compared: A member mentioned that Tensor Cores in newer generations, specifically from the 3000 to the 4000 series, have significantly increased in speed, potentially doubling performance.
Balancing Cost and Performance with GPUs: For cost-effective needs, one suggested considering the 4070 Ti Super, stating it's about 50% slower than a 4090 but also 50% cheaper, while being of the latest generation.
Setup Complexity for Performance Optimization: A member expressed the opinion that setting up and optimizing the 4070 Ti Super for maximum performance can be more challenging.
Clarification on GPU Configuration: In a clarification, one user mentioned intending to use dual 4070 GPUs rather than the older 2070 model.
Single vs Dual GPU Debate: It was advised to opt for a single 4090 GPU instead of two 4070s because they have a similar price/performance ratio and a single GPU setup avoids the complexities of a dual GPU configuration.

CUDA MODE ▷ #pmpp-book (4 messages): 

Chapter 6 Discussion Link Shared: A member shared a Google Docs link to Chapter 6 for further discussion.
Debate on Coalesced Memory Access: A question was posed about the coalescence of memory access with the code snippet. The member proposed that accesses are "uncoalesced," but could be "coalesced if burst-size > 4 + j," though this may not align with available burst size options.

CUDA MODE ▷ #youtube-recordings (3 messages): 

CUDA MODE's "Lecture 15: CUTLASS" Now Live: A new YouTube video titled "Lecture 15: CUTLASS" has been released on the CUDA MODE Discord channel.
Chill Tunes for Learning: There's new intro music for the CUDA MODE content, and a member provided the Spotify link to the full track by Skybreak, hinting at a vibe akin to classic Sonic games.

Links mentioned:

Lecture 15: CUTLASS: no description found
Spin Cycle: Skybreak, BilliumMoto, Miyolophone · Song · 2023

CUDA MODE ▷ #hqq (6 messages): 

BitBLAS Unveiled by Microsoft: Microsoft's BitBLAS library, designed to facilitate mixed-precision matrix multiplications essential for quantized LLM deployment, has been shared in the chat.
Tech Enthusiasts Chat About TVM: Channel members discuss the usage of TVM as the backend for the newly mentioned BitBLAS library, regarding it as an interesting choice.
On-Device Inference Insights: One member expresses a missed opportunity to try TVM in the past while focusing on on-device inference.
Exploration of Mixed-Precision Operations: There's anticipation for testing out the triton i4 / fp16 fused gemm, which has not been done yet due to time constraints.
HQQ Integration with HF's Transformers: Work on integrating HQQ with Hugging Face's transformers has been taking precedence, with plans to explore the BitBlas 2-bit kernel soon.

Link mentioned: GitHub - microsoft/BitBLAS: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. - microsoft/BitBLAS

CUDA MODE ▷ #llmdotc (331 messages🔥🔥): 

FP8 vs BF16 Performance: An attempt to get FP8 running roughly yielded a 29.5ms performance compared to 43ms for BF16 and 80ms for FP32, highlighting the potential benefits of further precision reduction. Amdahl's Law is mentioned as a limiting factor to performance gains.
High Optimism for Mixed-Precision Kernels: Discussion about using mixed-precision kernels revealed a concern about getting FP8 to work properly, and a comparison to BF16 showed significant improvements, though the matmuls remained in BF16. The conversation includes mentions of strategies for model merging tactics and hardware-specific optimizations.
Towards Deterministic Losses Across Batch Sizes: Discovering loss value inconsistencies when batch sizes varied prompted suggestions to look at CUBLAS_PEDANTIC_MATH settings and debug by dumping intermediary activations. Numerical inconsistencies may not be related to the batch size issue causing crashes.
Potential Educational Value in CUDA Kernels: Discussions about CUDA matrices and custom attention kernels (not using tensor cores) indicate potential for these implementations to be valuable educational materials on CUDA optimization. A particular focus might be placed on kernels that can improve readability and ease of learning, especially within versions of the code that target FP32 paths.
Plans for CUDA Courses and Projects: Proposals to include llm.c as course material or project subject in university courses have been put forward, suggesting the project could serve as a practical and advanced learning platform for parallel programming students. Courses might employ an input-benchmark-feedback mechanism, potentially extending to a broader set of CUDA-based optimization problems.

Links mentioned:

Twitch: no description found
torch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
8-bit Optimizers via Block-wise Quantization: Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values. This state can be used to accelerate...
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog: In this post, I’ll iteratively optimize an implementation of matrix multiplication written in CUDA.My goal is not to build a cuBLAS replacement, but to deepl...
Different outputs when using different batch size (only on cuda): I boiled down my issue to a very simple example. This network produces different values (by a small decimal) based on the batch size. Note that the values remain consistent regardless of batch size wh...
GitHub - KernelTuner/kernel_float: CUDA header-only library for working with vector types (half2, float4, double2) and reduced precision math (half, e5m2)  inside kernel code: CUDA header-only library for working with vector types (half2, float4, double2) and reduced precision math (half, e5m2)  inside kernel code - KernelTuner/kernel_float
Fix build errors by adding compute capability flags to the makefile by PeterZhizhin · Pull Request #235 · karpathy/llm.c: This fixes the build errors when trying to compile new half-precision kernels. The new train/test/profile requires compute capability >8.0 (Ampere)
feat(attention_forward.cu): Gentle introduction to CuTe(cutlass) by FeSens · Pull Request #233 · karpathy/llm.c: This is a very, very gentle introduction to Flash Attention 2 with CuTe (Cutlass v3). It's gentle because it's not finished. What I've got so far:  Work partitioned between Query block, Ba...
llm.c/dev/cuda/encoder_backward.cu at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
Second matmul for fully custom attention by ngc92 · Pull Request #227 · karpathy/llm.c: So far, just in the /dev  files, because for the main script we also need to touch backward. For some reason, I see  considerable speed-up in the benchmarks here, but in my attempts to use this in ...
LLM9a: CPU optimization: no description found
Courses: no description found

CUDA MODE ▷ #massively-parallel-crew (4 messages): 

Guest Speaker Invite Considered: A member suggested inviting @tri_dao on Twitter to give a talk. The idea is welcomed, with the hope of discussing kernel code and optimizations.

Clarification of Presentation Content: The same member clarified that @tri_dao could present on any topic he prefers, with a hint of interest in flash decoding due to its scarce documentation.

Eleuther ▷ #general (6 messages): 

New Open Source Generative Image Model Arena Launched: The release of a new open source project dubbed ImgSys was announced, showcasing a generative image model arena at imgsys.org. The preference data for this project can be explored further on Hugging Face at huggingface.co/datasets/fal-ai/imgsys-results.

Chain-of-Thought Prompting Leaderboard Unveiled by Hugging Face: Hugging Face's latest post spotlights the Open CoT Leaderboard, tracking large language models' (LLMs) effectiveness with chain-of-thought (CoT) prompting. The leaderboard emphasizes accuracy gains derived from CoT approaches, valuing the enhanced reasoning capabilities in model solutions.

Assessment of CoT Approaches in Recent Research: The conversation indicates a strong focus on CoT prompting techniques and their applications in reasoning-based tasks. One user found the concentration on the GSM8K dataset on the CoT Leaderboard slightly disappointing due to its limitation to single-answer questions.

Mention of Counterfactual Reasoning: A member briefly referenced counterfactual reasoning, indicating an interest in this area of problem-solving within the community.

Reasoning Research as a High Priority Area: Discourse revealed a consensus that reasoning, particularly explored through CoT and related problem-solving frameworks, is a highly active and valued area in recent AI research.

Links mentioned:

Introducing the Open Chain of Thought Leaderboard: no description found
imgsys.org | an image model arena by fal.ai: no description found
fal-ai/imgsys-results · Datasets at Hugging Face: no description found

Eleuther ▷ #research (189 messages🔥🔥): 

Decoding LLMs - Task-Dependent Performance: A paper on decoding methods for LLMs tackles the challenge of instilling behavioral principles without preference labels or demonstrations. SAMI, a new iterative algorithm, effectively finetunes models to align with desired principles, improving performance across tasks.

Efficient Diffusion Models with Align Your Steps: NVIDIA's research introduces Align Your Steps, optimizing the sampling schedules for Diffusion Models (DMs) to improve sampling speed while maintaining high-quality outputs—evaluated across various solvers and datasets.

Facebook's 1.5 trillion parameter Recommender System: Facebook's new paper details a novel architecture dubbed HSTU, which has been deployed on their platforms, showing 12.4% improvement in metrics over previous systems, alongside specific CUDA kernels that handle varying context lengths.

Economic Approach to Generative AI Copyright Issues: A new paper advocates an economic model to address copyright concerns with generative AI systems. It leverages cooperative game theory to determine equitable compensation for training data contributors.

Challenges of Privacy with Generative AI: The release of a research paper draws attention to the feasibility of extracting substantial amounts of training data from models like ChatGPT, signaling significant vulnerabilities that question the effectiveness of just aligning AI not to reproduce training data verbatim.

Links mentioned:

SpaceByte: Towards Deleting Tokenization from Large Language Modeling: Tokenization is widely used in large language models because it significantly improves performance. However, tokenization imposes several disadvantages, such as performance biases, increased adversari...
Extracting Training Data from ChatGPT: no description found
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels: When prompting a language model (LM), users frequently expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or...
An Economic Solution to Copyright Challenges of Generative AI: Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infrin...
MambaByte: Token-free Selective State Space Model: Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting,...
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models: Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying...
Align Your Steps: Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Profluent: We are fluent in the language of protein design.
Simple linear attention language models balance the recall-throughput tradeoff: Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is...
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations: Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite bei...
Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic Sequence Mixers: no description found
A Thorough Examination of Decoding Methods in the Era of LLMs: Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specifi...
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry: Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training li...

Eleuther ▷ #scaling-laws (56 messages🔥🔥): 

Engaging Discussion on Scaling Curve Fit: Members debated the fitting approach for scaling curves, emphasizing that the original estimate may remain superior even with an expanded dataset of 400+ points. They scrutinized whether residuals around zero imply a better fit, questioning if omitted data during SVG conversion changed the distribution. 

Parsing SVG Data Points: A detailed exchange unfolded over how data points were extracted from SVG-converted figures. One member mentioned conducting experiments with matplotlib, discovering that points apparently omitted in a PDF are likely ones outside of the plotted frame, rather than those obscured by visual overlap.

Curve Fitting Challenges Clarified: Participants examined the potential mismatch in data distributions due to excluded data, considering the implications on residual analysis. It was noted that the remaining unobserved data might be treated as a different distribution, potentially affecting scaling estimates.

Scaling Analysis Critique: Criticism was raised about the inclusion of small models in the original analysis, arguing that models below 200M parameters should be excluded due to the disproportionate influence of embedding parameters at smaller scales.

Critique of Residual Distribution Interpretation: A member corrected an earlier statement about a residual distribution chart, observing that the distribution appeared to be centered but too thin-tailed to conform to a normal distribution, challenging the interpretation of a Chinchilla paper's confidence interval.

Link mentioned: Proving Convergence of Least Squares Regression with i.i.d. Gaussian Noise: I have a basic question that I can't seem to find an answer for -- perhaps I'm not wording it correctly. Suppose that we have an $n$-by-$d$ matrix, $X$ that represents input features, and we...

Eleuther ▷ #interpretability-general (4 messages): 

Exponential Growth in Residual Streams: A LessWrong post analyzes how, across various language models, the norm of each residual stream grows exponentially with each layer, attributing this to LayerNorm's tendency to overshadow rather than cancel out existing features.
Interest in Norm Growth Phenomenon: The analysis noting that pre-layernorm makes it hard to delete information from the residual stream, leading to an increase in norm with increasing layer, was highlighted as "rly fascinating" and an important factor to consider in model behavior.

Links mentioned:

Residual stream norms grow exponentially over the forward pass — LessWrong: Summary: For a range of language models and a range of input prompts, the norm of each residual stream grows exponentially over the forward pass, wit…
Residual stream norms grow exponentially over the forward pass — LessWrong: Summary: For a range of language models and a range of input prompts, the norm of each residual stream grows exponentially over the forward pass, wit…

Eleuther ▷ #lm-thunderdome (12 messages🔥): 

Token Initiation Inquiry: A user queried whether eval-harness includes the beginning of sequence token by default.
New Task Proposal for MMLU: A member proposed submitting a PR for a task implementation of MMLU using the arc prompt format and is curious if there would be any interest in this experimental format.
Call for a Unified MCQA Implementation: In response to a discussion about a specialized task format, a user expressed a preference for a generic system supporting different styles (like "arc style" or "MMLU style") for all MCQA tasks to maintain a unified implementation.
Custom Metric Parallelization Query: An inquiry was made about running a metric from lm-evaluation-harness in parallel and to write a custom metric that can leverage the OpenAI API for evaluation.
Custom Task Evaluation with Perplexity Issues: A user faced challenges evaluating a custom task using CrossEntropyLoss as a measure and selected perplexity as a metric, but encountered problems with extremely high values and overflow issues. Another participant agreed to look into improving the use of perplexity with loglikelihood / multiple choice tasks, indicating the problem might be related to the incorrect token count used to compute perplexity.

Eleuther ▷ #gpt-neox-dev (50 messages🔥): 

Advancing RWKV Integration in NeoX: Discussions revolve around integrating RWKV into GPT-NeoX, with necessary updates and improvements. Commits and ongoing issues are referenced, such as Issue #1167, and PR #1198 which adds RWKV support, while pointing out the need for JIT compilation, fp16 support, pipeline parallelism, and model compositionality.

Update on RWKV's Versioning and Portability: The conversation covers the importance of version numbering for RWKV, the potential use of Triton kernels for version 6.0, and ensuring support for AMD with ROCm or HIP. The upcoming PyTorch 2.3 release is mentioned as a potential solution for compiling Triton code.

Tokenization Troubles in Pretraining Data: A member highlights issues with tokenizer versions changing the way space tokens are split, specifically changes between huggingface tokenizers versions 0.13 and 0.14. Concerns are raised about the consistency of pre-tokenized training data with current tokenizer outputs, and the silence of tokenizers on their breaking changes is criticized.

Tokenization And Version Management Frustrations: Members express their frustrations with tokenizer inconsistencies and the difficulty of managing binary dependencies and versions, citing failed attempts to migrate package management for NeoX to poetry due to these challenges.

Tackling Complexities of Token Merging: A detailed discussion on how to handle token merging and preprocessing discrepancies unfolds, with an understanding that current mismatches might arise from preprocessing steps and that tie-breaking issues in the tokenizer could be the root of some problems.

Links mentioned:

GitHub - RWKV/RWKV-infctx-trainer at rwkv-6-support: RWKV infctx trainer, for training arbitary context sizes, to 10k and beyond! - GitHub - RWKV/RWKV-infctx-trainer at rwkv-6-support
Add Basic RWKV Block to GPT-NeoX · Issue #1167 · EleutherAI/gpt-neox: We want to add RWKV to gpt-neox: Add basic RWKV block, without kernels, from https://github.com/BlinkDL/RWKV-LM to https://github.com/EleutherAI/gpt-neox/tree/main/megatron/model Add rwkv kernels A...
GitHub: Let’s build from here: GitHub is where over 100 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and fea...
Comparing main...rwkv-6-support · RWKV/RWKV-infctx-trainer: RWKV infctx trainer, for training arbitary context sizes, to 10k and beyond! - Comparing main...rwkv-6-support · RWKV/RWKV-infctx-trainer
GitHub - SmerkyG/gpt-neox at rwkv: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. - GitHub - SmerkyG/gpt-neox at rwkv
flash-linear-attention/fla/ops/rwkv6/chunk.py at main · sustcsonglin/flash-linear-attention: Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton - sustcsonglin/flash-linear-attention
add rwkv support by jahatef · Pull Request #1198 · EleutherAI/gpt-neox: This is ready for review.
add rwkv support by jahatef · Pull Request #1198 · EleutherAI/gpt-neox: This is ready for review.
[AMD] Supporting fused kernels build using JIT by R0n12 · Pull Request #1188 · EleutherAI/gpt-neox: This PR aims to enable JIT compilation for fused kernels on AMD GPUs, so that the same code will run on both AMD and NVIDIA GPUs. Previous python setup.py install method has issues with hipifying f...
add rwkv support by jahatef · Pull Request #1198 · EleutherAI/gpt-neox: This is ready for review.

Stability.ai (Stable Diffusion) ▷ #general-chat (311 messages🔥🔥): 

Choosing the Right Model for Realism: For generating photo-realistic portrait images using Forge UI, models like Juggernaut X and EpicrealismXL are recommended. Users share frustration with Juggernaut's complexity in prompting but report success with other models like RealVis V4.0 for better results with less effort.

Forge UI Versus A1111: Users discuss the efficiency of the Forge UI compared to A1111, noting that Forge is more memory-efficient and is suitable for GPUs with less VRAM. Despite A1111 causing significant lag due to high RAM usage, some users prefer it over Forge, which may experience memory leaks that currently remain under investigation.

Model and Lora Merging Tactics: To achieve consistent model outputs, users suggest combining models with Lora training or dream booth training to pinpoint specific styles or objects. One approach involves prioritizing the body in creation and using techniques like inpaint to correct facial details when merging two different model outputs.

Anticipation for SD3 Release and Usage: Excitement and impatience are voiced regarding the awaited release of Stable Diffusion 3.0 (SD3). Users report that SD3 can currently be used via an API with limited free credits, while others speculate about the potential costs and licensing for full access.

Improving Stable Diffusion Outputs: For issues like blurry images, users suggest generating at higher resolutions such as 1024x1024 with SDXL models in Forge. Inquiries about fine-tuning with Kohya_SS indicate that users in the community may need guidance, and both full finetunes and smaller adjustments like Lora training are discussed.

Links mentioned:

glif - StableDiffusion 3 by fab1an: no description found
Membership — Stability AI: The Stability AI Membership offers flexibility for your generative AI needs by combining our range of state-of-the-art open models with self-hosting benefits.
Tweet from Christian Laforte (@chrlaf): @rajdhakad_ @USEnglish215753 @StabilityAI @EMostaque Our plan is to soon release the API first to collect more human preference data and validate that our safety improvements don't cause the quali...
GitHub - Snowflake-Labs/snowflake-arctic: Contribute to Snowflake-Labs/snowflake-arctic development by creating an account on GitHub.
Reddit - Dive into anything: no description found
Try This FREE AI Video (30-Second AI FILMS With One Prompt): Try it out here today:https://noisee.ai/📧Join my newsletterhttps://delightfuldesign.eo.page/w7tf5---👨‍🏫Check out my AI courses: https://www.udemy.com/user...
Reddit - Dive into anything: no description found
SOCIAL MEDIA TITLE TAG: SOCIAL MEDIA DESCRIPTION TAG TAG
GitHub - megvii-research/HiDiffusion: Contribute to megvii-research/HiDiffusion development by creating an account on GitHub.
no title found: no description found

HuggingFace ▷ #announcements (1 messages): 

Llama 3 Launches with a Bang: The new Llama 3 model has been trained on 15 trillion tokens and fine-tuned on 10 million human annotated samples. Boasting 8B and 70B versions, this model surpasses all open LLMs on the MMLU benchmark with the 70B version scoring over 80, and it features a Tiktoken-based tokenizer, SFT, PPO, DPO alignments, and is available for commercial use. Check out the demo and blog post.

Phi-3 Released with MIT Licensing: Phi-3 offers two Instruct versions featuring context windows of 4k and 128k. This model, trained on 3.3 trillion tokens and fine-tuned with SFT & DPO, also includes "function_call" special tokens and is ready to be deployed on Android and iPhones. Get started with the demo and explore finetuning on AutoTrain.

Open Source Highlights: FineWeb dataset is now open-sourced with 15 trillion tokens of web data, Gradio updated to 4.27.0, Sentence Transformers receives a v2.7.0 update, LlamaDuo scripts released for language improvement synergy, and The Cauldron dataset is launched for vision-language task fine-tuning with a collection of 50 datasets. Explore these resources starting with the FineWeb and Sentence Transformers update.

HuggingChat Goes iOS: The HuggingChat app is now available on iOS, bringing chatbot capabilities to mobile users. See the announcement and details here.

New Content for AI Enthusiasts: Hugging Face introduces the concept of multi-purpose transformer agents in a new blog post, hosts a HuggingCast to teach deployment on Google Cloud, and unveils the Open Chain of Thought Leaderboard. Gain more insights from Jack of All Trades, Master of Some and the Google Cloud deployment session.

HuggingFace ▷ #general (211 messages🔥🔥): 

OCR Tools Beyond Tesseract: Members discussed alternatives to Tesseract for OCR, suggesting solutions such as PaddleOCR and keras for reading float numbers and self-hosting requirements. Pre-processing was mentioned as a critical factor in improving OCR results with Tesseract.

HuggingChat API Inquiry and Issues with Inference Endpoints: Members sought information on using HuggingChat's API with curl for remote purposes, leading to discussions about the use of huggingface_cli. Concerns were raised regarding the downtime of assistance models and inference endpoints.

Model Training and Preprocessing Tactics: One member revealed a comprehensive approach for fine-tuning Mistral 8x22 on OpenHermes 2.5, including details on the optimizer and learning rate settings.

Stable Diffusion Setup Struggles: Members shared frustrations and guidance for setting up stable diffusion, including the WebUI and torch, seeking help with specific error messages and installation guides.

Community Activity and Help Requests: The chat included proposals for organizing a game night, technical assistance requests for virtual environments in Python for LLM use on websites, as well as conversations about newly available models like Snowflake's hybrid Dense+MoE versions.

Links mentioned:

Pretraining on the Test Set Is All You Need: Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a n...
Snowflake Arctic St Demo - a Hugging Face Space by Snowflake: no description found
Llama 3-70B - HuggingChat: Use the Llama 3-70B assistant inside of HuggingChat
Dinela GIF - Dinela - Discover & Share GIFs: Click to view the GIF
Hi Hello GIF - Hi Hello Greeting - Discover & Share GIFs: Click to view the GIF
meta-llama/Meta-Llama-3-70B-Instruct · Hugging Face: no description found
Google Colaboratory: no description found
Albert Einstein - HuggingChat: Use the Albert Einstein assistant inside of HuggingChat
Tweet from abhishek (@abhi1thakur): Phi-3 is here!!!! 🚀 and ofcourse, you can already fine-tune it using AutoTrain 🚀🚀🚀
Snowflake/snowflake-arctic-instruct · Hugging Face: no description found
Snowflake/snowflake-arctic-base · Hugging Face: no description found
AdaptLLM/medicine-chat · Hugging Face: no description found
Do any external graphics cards (eGPUs) work with an M1 Mac, and if not -- why?: According to multiple listings on eGPU chassis, like this one, M1 Macbooks are not supported.
I have two questions, are any supported? And if not, why? Is this a limitation of the software that co...
Meta Llama 3 | 8B API Documentation (swift-api-swift-api-default) | RapidAPI: no description found
Join the Support Ticket Discord Server!: Check out the Support Ticket community on Discord - hang out with 1114 other members and enjoy free voice and text chat.

HuggingFace ▷ #today-im-learning (5 messages): 

Rust Gets Cozy with Candle: A member highlighted that Rust can be used with the HuggingFace/Candle framework, suggesting broader language compatibility within HuggingFace's tools.

LangChain for Efficient Memory Storage: A chatbot developer shared an implementation of LangChain service that stores memorized facts as plain text. This approach is inspired by a YouTube video on building agents with long-term memory, aiming to save tokens and avoid unnecessary function calls.

Knowledge Transfer Without Embeddings: The same developer pointed out that the plain text knowledge storage allows for the knowledge to be easily transferred between agents, without the need for embeddings. This facilitates the replication or moving of distilled knowledge across different models or agents.

Clarity on Rust and ONNX: In response to a question about model conversion to ONNX format when using JavaScript with machine learning models, another member clarified they are not familiar with ONNX. Discussions on ONNX should be directed to a different channel where more knowledgeable members can provide insights.

Link mentioned: Build an Agent with Long-Term, Personalized Memory: This video explores how to store conversational memory similar to ChatGPT's new long-term memory feature.We'll use LangGraph to build a simple memory-managin...

HuggingFace ▷ #cool-finds (17 messages🔥): 

Transformers.js Brings ML to Browsers: Transformers.js is now live, allowing HuggingFace Transformers to run directly in the browser. This could revolutionize accessibility, opening up possibilities for machine learning on client-side applications.
Revival of an Older AI Paper: A discussion emerged regarding the "Retentive Network: A Successor to Transformer for Large Language Models" paper. Feedback suggests that while it showed promise, it may now underperform compared to current architectures like RWKV and Mamba.
Quantization Boosts Inference Speed: The impact of 4-bit and 8-bit quantization is outlined in a paper, where 4-bit leads to 95.0% accuracy with a 48% speedup, and 8-bit yields a slightly better accuracy of 95.4% with a 39% speedup.
Bytedance Joins HuggingFace: Bytedance, the parent company of TikTok, introduces Hyper-SD models on HuggingFace, enhancing sdxl level generation capabilities.
Introduction to HuggingFace for Beginners: A link to DataCamp's tutorial was shared which offers a beginner-friendly explanation of Transformers and their applications in NLP, hoping to bridge the knowledge gap for newcomers.

Links mentioned:

Twitch: no description found
Micode - Twitch: 🥨 Underscore_ le talk-show des passionnés de l'IT, 1 mercredi sur 2, à 19h. Avec Micode, Matthieu Lambda & Tiffany Souterre
Transformers.js: no description found
Deep Reinforcement Learning with a Natural Language Action Space: This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based gam...
An Introduction to Using Transformers and Hugging Face: In this Hugging Face tutorial, understand Transformers and harness their power to solve real-life problems.
ByteDance (ByteDance): no description found

HuggingFace ▷ #i-made-this (11 messages🔥): 

Collaboration Invitation by Manifold Research Group: Sidh from Manifold Research Group extends an invite for a community research call to discuss project updates and collaboration opportunities. The call is announced on Twitter.

Introducing NorskGPT-8b-Llama3: Bineric AI in Norway released a tri-lingual large language model, optimized for dialogue use cases, trained on NVIDIA RTX A6000 GPUs. They invite the community to download and test the model from Hugging Face and share feedback, also posting about it on LinkedIn.

Achieving 1K Readers on Hugging Face: A member celebrates surpassing 1K readers on the Hugging Face platform, encouraging the community to upvote their blog if they found it interesting.

Learning through Contribution: A new joiner expresses their enthusiasm to start contributing to the community and learning, despite making errors in their Python code.

Project Exposure via Product Hunt: Muhammedashiq shares their project Wizad on Product Hunt, requesting support from the community with upvotes.

Links mentioned:

LIPSICK - a Hugging Face Space by Inferencer: no description found
Bark (with user-supplied voices) - a Hugging Face Space by clinteroni: no description found
bineric/NorskGPT-Llama3-8b · Hugging Face: no description found
 Wizad - Social media posters in one click with GenAI | Product Hunt: Wizad is your go-to app for effortlessly creating stunning social media posters that perfectly match your brand's identity. Say goodbye to the hassle of hiring designers or spending hours tweakin...

HuggingFace ▷ #computer-vision (4 messages): 

OCR for Text Extraction: A member recommended using OCR (specifically tesseract) to extract text from scanned images, which can then be processed or corrected by a language model.

Combining Conversational LLM and Live Visual Data: An individual is working on a project to have a conversational LLM interpret live visual inputs from a webcam and screen capture. They are facing challenges with Llava's hallucination and are considering a flow to pass questions from the chat LLM back to Llava for more accurate descriptions.

Solid Pods Mentioned as a Solution: In response to an unspecified query, a member mentioned that Solid pods might be the answer, suggesting a potential solution for a problem that was not detailed in the conversation.

Acknowledgement of Assistance: Another member expressed gratitude with a simple thanks to a fellow group member, though the context of the help was not provided.

HuggingFace ▷ #NLP (1 messages): 

Parallel Interaction with Transformers: A member inquired about the possibility of interacting with a Large Language Model (LLM) in parallel, specifically sending two requests simultaneously rather than sequentially. There was no follow-up discussion or solutions provided for this query within the provided messages.

HuggingFace ▷ #diffusion-discussions (11 messages🔥): 

DiffusionPipeline Loading Issue Reported: A user is having difficulty loading a model with DiffusionPipeline.from_pretrained("haoningwu/StoryGen") due to a problem with the config json.
Collaboration Inquiry on Diffusion Issues: In response to the loading issue, another user tagged an individual who might help sort out the problem with the DiffusionPipeline.
AI Horse Project Feasibility Query: A member asked if it's possible to create a 1-minute video on "AI Horse" using Diffusion, indicating that the project is a compulsory assignment.
Concerns About Hyper-SD and IP-Adapter Outputs: A user reports that using Hyper-SD with IP-Adapter produces very cartoonish images, contrasting with realistic results from using LCM + IPA and seeks community advice on realistic outputs.
Link to IP-Adapter Community Shared: A hyperlink to an IP-Adapter Discord community was shared in response to concerns about achieving realistic image outputs, suggesting further resource and engagement at Matteo's ComfyUI IPAdapter community.
Inquiry on Utilizing a Fine-Tuned TTS Model: A user is seeking assistance on how to implement a fine-tuned Text to Speech model stored in a .bin file within diffusers, pondering whether to use it as a custom model.

Link mentioned: : Discord is the easiest way to talk over voice, video, and text. Talk, chat, hang out, and stay close with your friends and communities.

LAION ▷ #general (221 messages🔥🔥): 

Curiosity About MagVit2's Progress: A user inquired about the practical usability of the magvit2-pytorch repository for training tasks, wondering if the code could replicate scores from the original paper and commented on the repo’s last update being 3 months ago.
Training Small Text-Image Diffusion Model Strategies: Users discussed the merits of using simple datasets for training minimal text-image diffusion models, suggesting reductions in hyperparameters for speed, and considering the CUB-200-2011 dataset due to its image descriptions for a more focused scope.
Comparing Text Encoders in Diffusion Models: There was a debate on the best text encoder for training, comparing models like T5, Flan T5, and Pile-T5, and considering newer variants like ByT5 and 'google/t5-v1_1-base' model from Hugging Face. This extended into a conversation about the challenges and potential strategies when training generative models and the cost of training at scale.
Firefly Image 3 Foundation Model Announced by Adobe: Adobe unveiled its latest generative AI model, Adobe Firefly Image 3 Foundation Model, promising advancements in quality and control for creative endeavors, now available in Photoshop (beta) and on a dedicated website.
Manipulating Midjourney's Ratings: A user reported how easily they manipulated Midjourney’s image ratings with a script before notifying the team about the vulnerability. A conversation followed about security in generative AI platforms and how such loopholes might be exploited or addressed.

Links mentioned:

Adobe Introduces Firefly Image 3 Foundation Model to Take Creative Exploration and Ideation to New Heights: no description found
GLIBC Vulnerability on Servers Serving PHP | Rocky Linux: Rocky Linux is an open enterprise Operating System designed to be 100% bug-for-bug compatible with Enterprise Linux.
Snowflake/snowflake-arctic-instruct · Hugging Face: no description found
VideoGigaGAN: no description found
imgsys.org | an image model arena by fal.ai: no description found
Papers with Code - CUB-200-2011 Dataset: The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely-used dataset for fine-grained visual categorization task. It contains 11,788 images of 200 subcategories belonging to birds, 5...
How To Build Generative AI Models Like OpenAI's Sora: If you read articles about companies like OpenAI and Anthropic training foundation models, it would be natural to assume that if you don’t have a billion dol...
fal-ai/imgsys-results · Datasets at Hugging Face: no description found

LAION ▷ #research (19 messages🔥): 

SEED-X Aims to Close the Multimodal Foundation Model Gap: The introduction of SEED-X aims to enhance the real-world applicability of multimodal foundation models by improving image understanding and generation. It introduces capabilities like processing images of arbitrary sizes and ratios and enabling multi-granularity image generation.

HiDiffusion Boosts Diffusion Models in One Line: HiDiffusion promises to increase the resolution and speed of preexisting diffusion models with "only adding a single line of code," prompting excitement and discussions about its real-world effectiveness. The project can be explored further on its dedicated page and the GitHub repository.

Skepticism Over "One Line of Code" Claims: A skeptical remark was made regarding claims of significant improvements being achievable with just "one line of code," suggesting that this often doesn't hold true in practice.

Apple Unveils CoreNet: A link to Apple's CoreNet on GitHub was shared, which appears to be related to CLIP-level visual recognition with faster pre-training on image-text data. No additional details were provided within the messages.

Multi-Head Mixture-of-Experts (MH-MoE) Improves Model Activation: A new approach called MH-MoE addresses issues in Sparse Mixtures of Experts (SMoE) models by increasing expert activation and offering more nuanced analytical capabilities for semantic concepts. The method borrows from multi-head mechanisms to achieve more effective token processing and is detailed in a recent research paper.

Links mentioned:

Multi-Head Mixture-of-Experts: Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation, where only a sm...
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation: The rapid evolution of multimodal foundation model has demonstrated significant progresses in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However, there remains a...
SOCIAL MEDIA TITLE TAG: SOCIAL MEDIA DESCRIPTION TAG TAG
GitHub - megvii-research/HiDiffusion: Contribute to megvii-research/HiDiffusion development by creating an account on GitHub.

OpenRouter (Alex Atallah) ▷ #announcements (2 messages): 

MythoMax 13B Glitch Resolved: The bad responses issue with MythoMax 13B has been identified and mitigated by the top provider. Users are encouraged to try again and report if issues persist, with a discussion thread available for feedback.

504 Errors Spike Amidst Networking Troubles: A spike in 504 errors was observed due to networking issues in the central and west US regions, specifically affecting Llama 2 tokenizer models. Root cause fix is in progress.

Hugging Face Downtime Disrupts Service: The 504 errors are linked to Hugging Face downtime; a fix, which involves removing this dependency, was announced to be live shortly.

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages): 

Deepgaze Launches with GPT-4V Integration: Introducing Deepgaze, the service that allows feeding a variety of documents into GPT-4V with just one line of code, targeting tasks like automating jobs or writing research papers from multiple PDFs in different languages. Two potential leads for Deepgaze have been identified through a Discord bot that monitors Reddit: someone needing to write a research paper from PDFs in multiple languages and someone seeking to automate job activities by reading data from various sources.

Potential Lead in Multilingual Research: A Reddit user needs help in composing a research paper from resources in several languages, which Deepgaze could facilitate by extracting data from complex files like PDFs. The discussion of this requirement can be found on the subreddit ArtificialInteligence.

Automation Enthusiast could Benefit from Deepgaze: Another Reddit user's quest for automating their job might be addressed by Deepgaze's ability to process and interpret data from diverse sources. This user's situation was pointed out approximately 1 hour ago on Reddit's ArtificialInteligence community.

Link mentioned: DeepGaze: no description found

OpenRouter (Alex Atallah) ▷ #general (203 messages🔥🔥): 

Enthusiasm for OpenRouter's Wizard Model: Users express excitement about OpenRouter's Wizard model, noting that it is impressive when prompted correctly and eagerly anticipate future model improvements.

Issues with Model Prompting and Parameters: A user inquired about whether Llama 3 supports json mode, with follow-ups indicating currently no Llama 3 providers support it. The discussion also covered how to identify if a provider supports logit_bias, and confusion around Mistral Large's handling of system prompts.

Fireworks AI's Efficient Model Serving: Users discussed how providers like Fireworks AI keep costs low, speculating on the use of FP8 quantization to serve models more efficiently. Concerns were raised about tokenomics and comparison with cryptocurrency mining. A link to Fireworks' blog post detailing their efficient serving methods for models: Fire Attention — Serving Open Source Models 4x faster than vLLM by quantizing with no tradeoffs.

Release of Microsoft's Phi-3 Mini Model: Microsoft released Phi-3 Mini Model variants with 4K and 128K contexts with unrestricted use under an Apache 2.0 license. Some users grabbed the weights quickly, while others hoped for its addition to OpenRouter and discussed its unique architecture: Arctic Introduction on Snowflake.

Troubleshooting OpenRouter Issues and Model Performance: Users reported technical issues with OpenRouter, asking for assistance and detailing errors like a rate_limit_error. OpenRouter staff provided responses and hotfixes, indicating a dependency on Hugging Face was the root cause of some troubles but should not recur following removal. Users also debated the performance of various language models, including Google's Gemini 1.5 and the potential inefficiency of the MMLU benchmark.

Links mentioned:

Groq Inference Tokenomics: Speed, But At What Cost?: Faster than Nvidia? Dissecting the economics
Snowflake/snowflake-arctic-instruct · Hugging Face: no description found
microsoft/Phi-3-mini-128k-instruct-onnx · Hugging Face: no description found
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs: Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
microsoft/Phi-3-mini-4k-instruct · Hugging Face: no description found
gist:a89ad8522cc01fb409f229f186216773: GitHub Gist: instantly share code, notes, and snippets.
OpenRouter: A router for LLMs and other AI models
Claude 3 "Vision" uses Google's Cloud Vision API: # this page is a work-in-progress; I have a lot of data to go through. ~85% sure about the current conclusion. Anthropic's Claude 3 family provides their models with Vision capability, allowing it to ...
OpenRouter: Build model-agnostic AI apps
OpenRouter: Build model-agnostic AI apps
OpenRouter: Build model-agnostic AI apps

Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages): 

Fresh Tweets from Modular: Modular shared tweets which can be viewed on their Twitter page. The content of the tweets has not been disclosed in the message. Check out the tweets.
Another Modular Update Hot Off the Press: A new update or piece of information was posted by Modular on Twitter, specifics of which were not mentioned directly in the chat. See the latest from Modular.

Modular (Mojo 🔥) ▷ #ai (2 messages): 

Contemplating the Future of AI and Consciousness: A participant expressed skepticism about current A.I. achieving artificial consciousness due to inefficiencies in power and data handling. They inquired whether advancements in hardware like quantum or tertiary computing could pave the way, or if software innovations alone might suffice.

Quantum Computing's Conundrum for AI: Another participant is dubious about the practicality of quantum computing for AI, stating it's a "random mess" that struggles with certainty in calculations, implying it's unsuitable for developing explainable AI.

Tertiary Computing: An Obscure Option: When discussing tertiary computing, a link to the Wikipedia page of Setun, an early ternary computer, was shared as an historical example, albeit with a note of limited knowledge about the topic from the contributor.

Government as a Potential Barrier: There's a belief that the government may attempt to hinder AI progress, especially in areas as unpredictable and uncharted as quantum computing for AI applications.

AGI's Path Not Just a Matter of Computation: It was suggested that the journey towards artificial general intelligence (AGI) is less about computational power and more reliant on the complexity and architecture of the AI systems.

Link mentioned: Setun - Wikipedia: no description found

Modular (Mojo 🔥) ▷ #🔥mojo (132 messages🔥🔥): 

QuickSort Implementation Shared: A member shares their quicksort algorithm for sorting structures based on Rust's example, which you can find on Joy of Mojo. The algorithm sorts using pointers and a compare function to determine order by age amongst a simulated group of people.
Understanding Pointers, References, and Traits: The channel discussion includes an exploration of type constraints in function templates, the use of Nullable pointers, and the distinction between Pointer and UnsafePointer. Traits for sorting (Sortable) and a Person struct are outlined, pointing towards a generic sorting function for any data type.
Nightly vs Stable Version Differences: Users discuss the difference in behavior between the nightly and stable versions of Mojo, noting inconsistent outcomes for pointers and strings, with a mention of segfault issues when using UnsafePointer on a string.
Special Functions for Pointer Initialization: Multiple posts address using special functions like __get_address_as_uninit_lvalue and initialize_pointee() to manage data assignment and avoid destructor problems with uninitialized data.
The Perils and Puns of Pointers: Conversation takes a light-hearted turn as members discuss segfaults, the "hackiness" of certain implementations, and puns around the use of pointers in code. A member provides helper functions to assist with ownership hijacking using UnsafePointer.

Links mentioned:

equality_comparable | Modular Docs: EqualityComparable
sort | Modular Docs: Implements sorting functions.
Sorting Techniques: Author, Andrew Dalke and Raymond Hettinger,. Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted lis...
Traits | Modular Docs: Define shared behavior for types.
Generic Quicksort: Context Mojo Reference: Sort Mojo Version: 24.2.1 Demo: Sorting a Group of People by Age This demo showcases how to sort a group of people based on their age using a versatile QuickSort algorithm. Thi...
mojo/stdlib/src/builtin/anytype.mojo at main · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
playground.mojo: GitHub Gist: instantly share code, notes, and snippets.
Python -c command line execution method - Programmer Sought: no description found
Parameterization: compile-time metaprogramming | Modular Docs: An introduction to parameters and compile-time metaprogramming.

Modular (Mojo 🔥) ▷ #community-projects (9 messages🔥): 

MoCodes Project Unveiled: An exciting new project called MoCodes, a stand-alone Error Correction (De)Coding framework, has been shared with the community. It harnesses the power of Mojo for compute-intensive tasks traditionally done on ASICs or FPGAs and is open for community input at MoCodes on GitHub.

Heap Allocation Monitoring with xcrun: For checking heap allocations, the command to use with xcrun is shared as xcrun xctrace record —template ‘Allocations’ —launch — ./path/to/binary/to/trace. A reminder is given to ensure the usage of double dashes due to potential phone formatting issues.

Additional Profiling Tools Mentioned: Beyond XCode, the samply utility is recommended as another useful tool for profiling, particularly as it does not require XCode.

User Acknowledges Profiling Challenge: A user acknowledges they have encountered difficulties spotting memory allocations in profiler results, possibly due to their own skills. This was in the context of using such tools for a challenge called 1brc.

Modular (Mojo 🔥) ▷ #community-blogs-vids (7 messages): 

Mojo 🔥 Celebrates Its First Anniversary at PyConDE: Jamie Coombes discusses Mojo, a programming language touted as "Python's faster cousin," at PyConDE in Berlin, covering its performance and potential as a Python superset. The talk examines Mojo's development journey and its place amidst competitors like Rust and Julia.

The Hype Around Mojo Discussing: Members of the Modular chatbot community reflect on the perception of Mojo within and outside their circle, particularly noting skepticism from the Rust community towards Mojo.

Leveling Up in Chatbot Community: ModularBot congratulated users for advancing to new levels within the community, marking their participation and engagement in the chatbot discussions.

Link mentioned: Tweet from Mojo 🔥 - Is it Python's faster cousin or just hype? PyConDE & PyData Berlin 2024: On 2023-05-02, the tech sphere buzzed with the release of Mojo 🔥, a new programming language developed by Chris Lattner, renowned for his work on Clang, LLVM, and Swift. Billed as "Python's...

Modular (Mojo 🔥) ▷ #performance-and-benchmarks (8 messages🔥): 

Random Float64 Function Performance Lags: Discussion revealed that the random.random_float64 function in Mojo is significantly slower than expected. A bug report was filed on the ModularML Mojo GitHub, and an alternative, MersenneTwister, was mentioned as a more efficient solution for random number generation.

Deliberation on Random Number Generator Variants: Proposal for having two versions of random number generators was seen: one emphasizing performance, and another being cryptographically secure with constant execution time, to serve different needs.

Puzzling Behavior in Return Value Optimization: A member tested Mojo's support for return value optimization (RVO), similar to C++, and noticed inconsistent behavior. They provided gist links showing different outcomes and queried whether this should be reported as an issue.

Link mentioned: [BUG] random.random_float64 is extremely slow · Issue #2388 · modularml/mojo: Bug description Generating one random number at a time in a for loop is extremely slow, almost 2 orders of magnitude slower than a numba-jitted equivalent. Context: I tried to use a simple Monte Ca...

Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages): 
Zapier: Modverse Weekly - Issue 31
https://www.modular.com/newsletters/modverse-weekly-31

Modular (Mojo 🔥) ▷ #🏎engine (3 messages): 

Max Shows Speed: A member noted that after updating Max, there was a performance increase, with Max always being faster than previous stats, despite original benchmarks showing speed increases of less than 1.
Decoding QPS: In the discussion, QPS was clarified to mean Queries per Second.

Modular (Mojo 🔥) ▷ #nightly (25 messages🔥): 

The Empty String Conundrum: There's a debate about the nature of empty strings with some users uneasy about treating an empty String() as a valid string, while others point out the necessity of differentiating String() and String("") for better C interop.
A Bug in String Land: A member exposes a bug related to the corruption of future prints when print(str(String())) is run, with another member following up with a request for a bug report.
Null Terminator Woes in the stdlib: Discussions surface over the problems caused by null-terminated strings, with one member implying there could be a significant count of related bugs and another suggesting that sacrifices for C interop are a necessary evil, akin to maintaining Python compatibility guarantees.
C Interop String Safety: A member suggests using Rust's approach to handling C interop without the pitfalls of C strings and points to C++ Core Guidelines for reference, sparking a discussion on the potential benefits of treating C strings as a separate type.
Mojo's Nightly Compiler Update Unveiled: The announcement of a new nightly release of the Mojo compiler is shared, with reminders to update and links to changes and diffs provided, such as this specific stdlib update.

Links mentioned:

Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
GSL/docs/headers.md at main · microsoft/GSL: Guidelines Support Library. Contribute to microsoft/GSL development by creating an account on GitHub.
[stdlib] Update stdlib corresponding to `2024-04-24` nightly/mojo by patrickdoc · Pull Request #2396 · modularml/mojo: This updates the stdlib with the internal commits corresponding to today's nightly release: mojo 2024.4.2414 .
[stdlib] Update stdlib corresponding to `2024-04-24` nightly/mojo by patrickdoc · Pull Request #2396 · modularml/mojo: This updates the stdlib with the internal commits corresponding to today's nightly release: mojo 2024.4.2414 .
mojo/docs/changelog.md at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.

OpenAccess AI Collective (axolotl) ▷ #general (147 messages🔥🔥): 

Phi-3 Tuning and Performance: Members discussed the challenges in tuning Phi-3 models and noted their notoriously tricky nature. The conversation gravitated toward Phi-3 mini-128k, a new addition to the series, and its significant requirement of 512 H100-80G GPUs for training.

Phi-3's GPU Demands: The expectation that Phi would cater to those with limited GPU resources contrasted with the actual heavy resource needs, with one model described as "f**kin 8 gigs."

The Rise and Optimization of Llama-3: Discussion covered the rapid progress in AI models this month, with Llama-3 identified as a particularly impressive model due to its enhanced 32k token context capacity and robust architecture, including a special RoPE layer.

Tokenizing Troubles in ChatML/FastChat: Users expressed concerns regarding potential issues with tokenizer configurations, including problems with new lines around EOS/EOT tokens, which might impact the performance of trained models like DiscoLM German.

Storm of New Models and Features: The AI community buzzed with the announcement of OpenELM by Apple, and the speculation about Snowflake's 408B Dense + Hybrid MoE model. Besides new models, the release of PyTorch 2.3 was also met with enthusiasm.

Links mentioned:

Meta Llama 3 | Model Cards and Prompt formats: Special Tokens used with Meta Llama 3. A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followe...
microsoft/Phi-3-mini-128k-instruct · Hugging Face: no description found
apple/OpenELM · Hugging Face: no description found
Tweet from Wing Lian (caseus) (@winglian): Excited to see this medical model released. Maxime has been dropping hints about this in the Axolotl Discord.   "Manually evaluated by 10 medical doctors and compared to GPT-4 in a blinded evaluat...
Reddit - Dive into anything: no description found
GitHub - janphilippfranken/sami: Self-Supervised Alignment with Mutual Information: Self-Supervised Alignment with Mutual Information. Contribute to janphilippfranken/sami development by creating an account on GitHub.

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (10 messages🔥): 

Learning Rate Impact on Divergence: A member observed that a gradual loss divergence in the llama3 BOS fix branch was due to an increased learning rate.
Subjective Model Improvement Despite Loss: The validity of vibes eval over loss metrics was suggested, with a comment that the model feels subjectively better despite what loss figures indicate.
Puzzling Over Missing Eval/Loss Data: A member expressed disappointment about the absence of eval/loss data in a shared observation, leaving evaluation metrics unclear.
Sample Packing Ignoring Sequence Length: The observation was shared that on the yi-200k models, out-of-memory issues occurred because sample packing did not respect the set sequence length, attempting to pack too many samples.
Paged Adamw Optimizer as a Solution: Switching to paged Adamw 8bit was mentioned as a solution to the previously mentioned out-of-memory issues caused by sample packing mishaps.
Potential Progress to 128k Llama-3: Anticipation was expressed for possibly reaching a 128k version of the llama-3 model by the afternoon of the current day.

OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages): 

In Search of Fictional Characters: A user inquired about the availability of a list of fictional characters but no specific details or follow-up responses were provided.
Gratitude Expressed: Following the character list inquiry, another user expressed thanks. The context of the gratitude was not specified.

OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages): 
aillian7: Is there a format for ORPO that i can use for a conversational use case?

OpenAccess AI Collective (axolotl) ▷ #rlhf (1 messages): 

Unsloth vs. Axolotl for DPO: A member enquired about the preference between Unsloth and Axolotl libraries, particularly for usage in Sequential Fine-Tuning (SFT) and starting DPO (Decision Process Outsourcing), aiming to discern the best tool for their needs.

OpenAccess AI Collective (axolotl) ▷ #community-showcase (9 messages🔥): 

Internist.ai 7b Unveiled: The Internist.ai 7b, a 7 billion parameter medical language model, has been released, outperforming GPT-3.5 and surpassing the USMLE pass score. It has been blindly evaluated by 10 medical doctors against GPT-4 using 100 medical open-ended questions, highlighting the significance of data curation and a physician-in-the-loop training approach.

Demolishing the Competition: A brief excited exchange acknowledged the superior performance of the new Internist.ai model compared to other 7b models. “it's demolishing all other 7b models”

Llama on Par with Internist.ai: Despite the advancements, it was noted that llama 8b, an 8 billion parameter model, achieves approximately the same results as Internist.ai 7b. However, there is an emphasis on the fact that llama 8b has a larger size.

Challenges in Training Llama3: The difficulty in training llama3 was mentioned, indicating that the process is challenging and requires further resolution and merges before proceeding.

Link mentioned: internistai/base-7b-v0.2 · Hugging Face: no description found

OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (11 messages🔥): 

QLoRA Merging on Memory-Limited GPUs: A member discussed challenges with using merge_lora.py for a QLoRA-trained model, which leads to CUDA out-of-memory errors due to the unquantized model's size. They sought advice on merging QLoRA when the unquantised model cannot be loaded on their GPU.

Prompt Format Varieties Explored: A discussion took place regarding the differences between various prompt formats like Alpaca, ChatML, Vicuna, etc. The prompts serve as guides for models to generate text for specific tasks with each format having a structure suited for distinct use cases or models.

Links mentioned:

OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.
OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.

OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages): 

Load Only Parts of a Hugging Face Dataset: To use a subset of a dataset, apply the split parameter in the load_dataset function. Examples include using percentages like train[:10%] or specific ranges like train[100:200].

How to Combine Different Dataset Parts: For custom subsets, DatasetDict allows combining parts of datasets, such as merging 10% of training data with 5% of the validation data.

Random Splitting for Datasets: The train_test_split method is useful for randomly splitting a dataset into training and testing subsets, such as splitting the full dataset into 80% training and 20% testing.

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.

LlamaIndex ▷ #blog (4 messages): 

CRAG Fixes RAG Retrieval Flaws: A new approach called Corrective RAG (CRAG, Yan et al.) introduces a reflection layer to categorize retrieved information during the RAG process into "Correct," "Incorrect," and "Ambiguous" categories to improve context gathering. More information can be found in the shared Twitter post.
Phi-3 Mini Debuts Matching Llama 3's Prowess: Microsoft's recently released Phi-3 Mini (3.8B) claims to match the performance of Llama 3 8B across tasks including RAG, Routing, and others, with initial analysis provided by the benchmark cookbook as detailed here.
Run Phi-3 Mini Locally with LlamaIndex and Ollama: Instructions for running Phi-3 Mini on a local machine using LlamaIndex and Ollama are available, featuring a quick notebook and day 0 support as showcased in Ollama's announcement tweet.
Exploring Future Planning with Language Agent Tree Search: As Large Language Models (LLMs) improve, there's potential to develop agentic systems capable of planning an entire tree of possible futures — a significant leap from the current sequential planning methods like in ReAct. The concept signifies advancement in handling complex scenarios and can be explored further in the linked Twitter content.

Link mentioned: Google Colaboratory: no description found

LlamaIndex ▷ #general (140 messages🔥🔥): 

Seeking Chatbot Only Responding to Document Context: Members discussed how to constrain a chatbot built on a RAG pipeline to only answer questions related to the document context and not general knowledge queries. Suggestions included prompt engineering and checking the chat mode options.

Indexing Issues with LlamaIndex and Raptor: A user experienced long indexing times when building a knowledge graph with Raptor. Suggestions were made to focus more on data processing into sensible documents/chunks.

Persistence of Chat History Sought for LlamaIndex: A query was raised about how to preserve chat history between user sessions in LlamaIndex. One solution was to serialize the chat_engine.chat_history or to utilize a chat store like SimpleChatStore.

Querying Pinecone Namespaces via LlamaIndex: A user inquired if they could query an existing namespace in Pinecone via LlamaIndex. It was confirmed possible as long as a key with text exists in Pinecone, which can be specified during the setup.

Scaling BM25 Scores for Fusion with Dense Retrievers: There was a request for methods to scale BM25 scores to be comparable with cosine similarity scores from dense retrievers. A blog post on hybrid search fusion algorithms and the built-in query fusion retriever in LlamaIndex were suggested as resources.

Links mentioned:

no title found: no description found
Querying - LlamaIndex: no description found
Node Postprocessor - LlamaIndex: no description found
Chat Stores - LlamaIndex: no description found
Migrating from ServiceContext to Settings - LlamaIndex: no description found
llama_index/llama-index-core/llama_index/core/schema.py at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
llama_index/llama-index-core/llama_index/core/chat_engine/condense_question.py at 63a0d4fac912e5262d79ffc7a1c22225d2ec8407 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
llama_index/llama-index-core/llama_index/core/indices/base.py at 63a0d4fac912e5262d79ffc7a1c22225d2ec8407 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
llama_index/llama-index-core/llama_index/core/chat_engine/condense_question.py at 63a0d4fac912e5262d79ffc7a1c22225d2ec8407 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Chat Engine - Context Mode - LlamaIndex: no description found
GitHub - microsoft/monitors4codegen: Code and Data artifact for NeurIPS 2023 paper - "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multispy` is a lsp client library in Python intended to be used to build applications around language servers.: Code and Data artifact for NeurIPS 2023 paper - &quot;Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context&quot;. `multispy` is a lsp client library in Python intende...
Embeddings - LlamaIndex: no description found
Customizing LLMs - LlamaIndex: no description found
Unlocking the Power of Hybrid Search - A Deep Dive into Weaviate's Fusion Algorithms | Weaviate - Vector Database: How hybrid search works, and under the hood of Weaviate's fusion algorithms.
Relative Score Fusion and Distribution-Based Score Fusion - LlamaIndex: no description found

Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (39 messages🔥): 

Titles in the Quest for AGI: Nathan Lambert considers several titles for an article debating the meaningfulness of the term AGI, like "AGI Isn't real" and "AGI is religion, not science". The suggested titles aim to provoke thought about the evolving definitions of AGI and its significance in science.
Catering Titles to Engage or Not to Engage: While considering titles that could generate more controversy and clicks, Nathan Lambert reflects on maintaining a balance, as his brand isn't constructed around sensationalism, highlighting a preference for engaging existing readers over capturing new ones.
AGI: A Matter of Belief or Fact?: The conversation shifts to how AGI is perceived, with mentions that people often criticized the Sparks paper for its hyped narrative and definition differences, while another member highlighted a tendency for AGI discussions to border on religious fervor.
The AGI Branding Controversy: Members touched on the controversial aspects of branding and defining AGI, with one member noting the discrepancy between what’s advertised versus reproducible results in papers like the Sparks report.
The Irony in Defining AGI: There's a shared amusement over the thought of a jury potentially deciding the definition of AGI due to contractual agreements between OpenAI and Microsoft, highlighting the absurdity and legal complexity surrounding AGI's definition.

Link mentioned: AI CEO says people's obsession with reaching artificial general intelligence is 'about creating God': Arthur Mensch doesn't feel concerned about AI surpassing human intelligence, but he does worry about American tech giants dominating the field.

Interconnects (Nathan Lambert) ▷ #news (21 messages🔥): 

Struggles at the Org: There was a humorously brief comment on the performance of an unspecified organization, with another member linking the situation to the necessity of model overfitting to benchmarks.
GPU Prioritization Insights Shared: The conversation highlighted the prioritization of GPU resources within an organization, suggesting that internal ranking impacts the ability to work with larger models, and theorizing a GPU distribution hierarchy.
The Pressure to Deliver: It was mentioned that external pressure to produce tangible products might be steering teams away from theoretical research and more towards practical methods for improving benchmark performance.
Model Name Mystery Miss: In a playful exchange, one contributor failed to spot the inclusion of the name "Alexis" in a shared text prompt, which was pointed out as an intentional aspect of the question.
Phi-3-128K Sneaks into Testing: There’s an interesting note on Phi-3-128K being tested without disclosing its model name in conversations, highlighting a testing method designed to prevent the potential bias of knowing the model’s identity.

Link mentioned: Tweet from Susan Zhang (@suchenzang): oh no not this again

Interconnects (Nathan Lambert) ▷ #ml-questions (22 messages🔥): 

Social Brainstorming Boosts Research: New research ideas often come from a combination of reading and social interactions, such as discussing concepts with peers and advisors.
Discord as a Hub for Idea Exchange: Discord communities, like this one, are suggested as beneficial spaces for sharing and developing research ideas.
Instruction-Tuning Evaluations Spotlighted: A member brought attention to Sebastian Ruder's article on instruction tuning, questioning the longevity of benchmarks like LMEntry, M2C, and IFEval, but no clear consensus or recognition of these benchmarks was established in the chat.
Simplifying with ML Benches: As the relevance of GPUs grows, one member expressed a preference for simpler benchmarks like MMLU-ChatBotArena to gauge models' capabilities.
HELM Feature Updates Encouraging Introspection: The HELM team's recent updates enabling introspection on poorly performing model instances were mentioned, however, there's no clear sentiment towards HELM's overall impact or its "washed" status.

Links mentioned:

LMentry: A Language Model Benchmark of Elementary Language Tasks: As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing o...
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features: A challenge towards developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M...
Instruction-Following Evaluation for Large Language Models: One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, an...

Interconnects (Nathan Lambert) ▷ #random (8 messages🔥): 

Fleeting Insights from Ross Taylor: A member mentioned the intriguing habit of researcher Ross Taylor, who posts tweets and swiftly deletes them, sparking curiosity and amusement.
Ross Taylor: A Cautious Tweeter: The same member observes that Ross Taylor's quick-deleted tweets sometimes contain hot takes, likely a habit formed from previous experiences at Meta.
Interview with the Elusive Ross Taylor: Nathan Lambert expressed interest in interviewing Ross Taylor, emphasizing the challenges posed by Taylor's cautiousness possibly due to NDA concerns.
No Disclosure, No Interview: Nathan Lambert reasoned that an interview with Ross Taylor would be unproductive if he's constrained from sharing due to NDAs.
Silencing the AGI Buzz: A member humorously remarked on missing out on discussions about a blog post after muting the word "AGI" on their feed.

Interconnects (Nathan Lambert) ▷ #memes (8 messages🔥): 

Interconnects Memes Channel Goes Live: Members have noted that the memes channel in the Interconnects Discord is now live, with initial messages starting to appear about an hour from the timestamp of the message.
Mini Models Hit HF: Discussion indicates that mini models and a 128k context length model are available on Hugging Face, with a mention of recent availability.
Turning on Web Search Might Surprise You: A member humorously shares that enabling web search can lead to findings about an Australian politician with the same name, which inadvertently triggers their Google alerts.

Interconnects (Nathan Lambert) ▷ #posts (10 messages🔥): 

SnailBot Might Be on the Right Track: There was a note of surprise that SnailBot had some functionality, and a prompt to give feedback on whether the tagging feature is bothersome.
Accessibility Issue with "Reward is Enough" Paper: A member brought up an issue accessing the "Reward is Enough" article, initially facing a problem that suggested it might be behind a paywall or a user-specific issue.
Troubleshooting the Paper Access: It was confirmed that an account is not required to view the paper, suggesting the access problem might be specific to the user experiencing it.
The Paper Wall is Real: A humorous acknowledgment was made indicating that access to the paper was indeed blocked.
Access Issue Resolved: The member resolved their access issue, indicating it might have been a personal technical hiccup.

OpenInterpreter ▷ #general (69 messages🔥🔥): 

TTS Alternatives Seekers: A reference to RealtimeTTS on GitHub was shared as a potential live streaming text-to-speech service, suggested as an alternative to expensive options like ElevenLabs, and praise for its creator's work was expressed.
Raspberry Pi Guidance for Beginners: A member considering a Raspberry Pi for Python programming received a recommendation to look into the Raspberry Pi 5 8GB running Ubuntu, with additional insights on its use with the Open Interpreter provided. They were also directed to a GitHub repo for getting started.
Intrigued by AI on Hardware: Conversations surrounding executing Open Interpreter locally on hardware like Raspberry Pi sparked discussions, with multiple users sharing experiences and advice on setups using Ubuntu, connections to Arduinos, and the convenience of having multiple SD cards with fresh installs for quick recovery during tinkering.
Exploring AI Integration with E2B: The CEO of E2B Dev introduced their service, which offers code interpreting capabilities for AI apps, and inquired about community interest in updating their existing Open Interpreter integration. A link to the official E2B integration documentation was mentioned, but no direct SDK link was provided due to posting restrictions.
Execution and Local Mode Updates: Open Interpreter users discussed technical issues and updates, with one referencing the necessity to use the --no-llm_supports_functions flag to execute code properly and another highlighting that there was an update available to fix local model issues, advising users to check a specific Discord channel for support.

Links mentioned:

Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text.  Chat, hang out, and stay close with your friends and communities.
Join the Open Interpreter Discord Server!: A new way to use computers | 8573 members
Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique: no description found
01/project_management/hardware/devices/raspberry-pi at main · OpenInterpreter/01: The open-source language model computer. Contribute to OpenInterpreter/01 development by creating an account on GitHub.
GitHub - KoljaB/RealtimeTTS: Converts text to speech in realtime: Converts text to speech in realtime. Contribute to KoljaB/RealtimeTTS development by creating an account on GitHub.
Open-source Code Interpreting for AI Apps: Build custom code interpreters for your AI apps and AI agents
Introduction - Open Interpreter: no description found

OpenInterpreter ▷ #O1 (11 messages🔥): 

Cloud Aspirations for OpenInterpreter O1: A member expressed interest in running O1 on a cloud platform, specifically mentioning brev.dev as well as inquiring about compatibility with Scaleway.
Local Voice Control on the Rise: Kristianpaul highlights Home Assistant's new $13 voice remote running on an M5 stack with the Wyoming Protocol, pointing to hardware compatibility with OpenInterpreter 01. Find more at Home Assistant Voice Control.
Manufacturing Milestones for 01 Light: Human_bee has made progress with the manufacturing of the 01 Light and is preparing to share details and a roadmap in a scheduled event. The event is announced for April 30th, with a link to the Discord event provided.
Interactive Manufacturing Q&A: Human_bee encourages members to post questions or topics they want covered regarding the manufacturing update of 01 Light for an upcoming event.
External Device Exploration for O1: Dfreeear seeks resources on running O1 on external devices, inspired by an AI Pin project, while lordgeneralyahtzi shares a Twitter post by Jordan Singer of a similar endeavor.

Links mentioned:

Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text.  Chat, hang out, and stay close with your friends and communities.
Tweet from killian (@hellokillian): we put the 01 into @grimezsz spider

OpenInterpreter ▷ #ai-content (3 messages): 

Stable Diffusion Implementations: An update was provided about adding stable diffusion demos and examining model notes. They included a link to the OpenVINO Toolkit, an Intel-developed tool for optimizing and deploying AI inference.

ONNX Runtime in Focus: A member expressed being overwhelmed by libraries, specifically mentioning the cross-platform machine-learning model accelerator ONNX Runtime. The website explains its compatibility with various ML frameworks and its extensive use in Microsoft products and services.

MLflow: Simplifying ML and GenAI: Another link shared pointed to MLflow, an open-source MLOps platform that claims to unify ML and generative AI applications. The website underscores MLflow's commitment to open source, comprehensive workflow management, and end-to-end unification.

Links mentioned:

MLflow | MLflow: Description will go into a meta tag in <head />
ONNX Runtime: ONNX Runtime is a cross-platform machine-learning model accelerator

Latent Space ▷ #ai-general-chat (64 messages🔥🔥): 

Hydra Configuration Gains Traction: A member acknowledged the widespread use of Hydra or OmegaConf for configuration management, noting Hydra's compatibility with machine learning projects.
Perplexity Gains Ground Over Search Engines: A tweet from @AravSrinivas announced a $62.7M funding round for Perplexity, a search solution challenging traditional search engines, which is now valued at $1.04B. The funding round includes prominent investors such as NVIDIA and Jeff Bezos. Tweet Announcement
AI Engineering Explored in New Publication: AI Engineering by Chip Huyen focuses on building applications with available foundation models, emphasizing AI engineering over traditional ML engineering, and evaluating open-ended models. AI Engineering Overview
Decentralized AI Training on the Horizon: Prime Intellect introduced its infrastructure for decentralized AI development, enabling collaborative global model training. They have also raised $5.5M from various investors including @DistributedG and @Collab_Currency. Prime Intellect Announcement
Community Computer Vision Course Launched: HuggingFace released a community-driven course on computer vision open to all learners, from beginners to those seeking to understand the latest advancements in the field. Join the Course

Links mentioned:

Tweet from Nick Schrock (@schrockn): I am not sure what the venture funding environment looks if the AI bubble pops too close to the Fed bubble popping. Another tranche of fund vintages eviscerated might drain the entire ecosystem for a ...
Tweet from Johannes Hagemann (@johannes_hage): Today, @vincentweisser and I are excited to share what we've been working on.  Prime Intellect is building the infrastructure for decentralized AI development at scale. We aggregate global compute...
Tweet from Nick Schrock (@schrockn): I am not sure what the venture funding environment looks if the AI bubble pops too close to the Fed bubble popping. Another tranche of fund vintages eviscerated might drain the entire ecosystem for a ...
Welcome to the Community Computer Vision Course - Hugging Face Community Computer Vision Course: no description found
Omnisearch: Omnisearch’s groundbreaking search product makes all the content on your site searchable, from audio/video to text, documents and presentations.
AI Engineering: Recent breakthroughs in AI have not only increased demand for AI products, they've also lowered the barriers to entry for those who want to build AI products. The model-as-a-service approach …  -...
Tweet from Aravind Srinivas (@AravSrinivas): Excited to announce we've raised 62.7M$ at 1.04B$ valuation, led by Daniel Gross, along with Stan Druckenmiller, NVIDIA, Jeff Bezos, Tobi Lutke, Garry Tan, Andrej Karpathy, Dylan Field, Elad Gil, ...
GitHub - e2b-dev/code-interpreter: Python & JS/TS SDK for adding code interpreting to your AI app: Python & JS/TS SDK for adding code interpreting to your AI app  - GitHub - e2b-dev/code-interpreter: Python & JS/TS SDK for adding code interpreting to your AI app

Latent Space ▷ #ai-announcements (1 messages): 

TimeGPT Takes the Spotlight: The upcoming US paper club will discuss TimeGPT, a paper on time series with the authors and a special guest. Interested individuals are invited to join the event through this registration link. 

Link mentioned: LLM Paper Club (Survey Day) · Zoom · Luma: The TimeGPT authors have bumped to next week so today we're gonna go thru a few of the old papers on slido! Also submit and vote for our next paper:…

tinygrad (George Hotz) ▷ #general (10 messages🔥): 

Quest for Diagramming Insights: A member inquired about the method for creating certain diagrams seen in PRs. The query was answered with a link to the Tiny Tools Client.

Tinygrad's Purpose Reinforced: A reminder was given that the Discord is focused on tinygrad related questions and discussions, steering away suggestions irrelevant to the core topic.

Exploring tinygrad Capabilities: There was a query about the feasibility of rewriting a privacy-preserving tool against facial recognition systems, with a link to the original project on GitHub - Fawkes.

PCIE Risers Troubles and Solutions: One member asked for recommendations on where to get quality PCIE risers, while another suggested the best strategy might be to avoid using risers altogether. Further discussion pointed to using mcio and custom cpayne PCBs as alternatives.

Call for tinygrad Operation Documentation: A request was made for normative documentation to understand the expected behavior of tinygrad operations, noting the lack of descriptions accompanying the ops list.

Links mentioned:

React App: no description found
GitHub - Shawn-Shan/fawkes: Fawkes, privacy preserving tool against facial recognition systems. More info at https://sandlab.cs.uchicago.edu/fawkes: Fawkes, privacy preserving tool against facial recognition systems. More info at https://sandlab.cs.uchicago.edu/fawkes - Shawn-Shan/fawkes

tinygrad (George Hotz) ▷ #learn-tinygrad (28 messages🔥): 

New Tinygrad Reference Guides: Links to detailed guides on tinygrad internals are provided, with specific focus on common uops and tensor core support. These tutorials by Di Zhu are seen as valuable for understanding the instruction representation (IR) and code generation within tinygrad.

Main Tinygrad Docs to Include New Tutorials: George Hotz has decided to link to the aforementioned guides in the main tinygrad documentation, crediting Di Zhu by name for the creation of these helpful tutorials.

Tensor Core WMMA Fragment Size Discussion: There was a query about the fragment size a thread can hold when using WMMA with tensor cores in tinygrad. It was clarified that each thread could hold a fragment of at most 128 bits for a single input, leading to a discussion on the processing capabilities relative to the thread and matrix sizes.

Debugging Kernel Issues with Tinygrad: A user posted a code snippet which previously had an assertion error, but after re-cloning the tinygrad repo, the issue was resolved. This led to an acknowledgment that the bug had likely been fixed in an update.

Crash Isolation Quest in Beam Search Ops: Efforts to isolate and reproduce a crash using simple_matmul.py in the tinygrad context led to the discovery that buffer size testing might incorrectly trigger a runtime error. The discussion hints at debugging strategies, including log operations to save ASTs for further analysis.

Links mentioned:

MiniTorch: no description found
Form - Tally: Made with Tally, the simplest way to create forms.
tinygrad-notes/uops-doc.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.
tinygrad-notes/cuda-tensor-core-pt1.md at main · mesozoic-egg/tinygrad-notes: Tutorials on tinygrad. Contribute to mesozoic-egg/tinygrad-notes development by creating an account on GitHub.

DiscoResearch ▷ #mixtral_implementation (5 messages): 

Llama3 vs. Mixtral Performance Compared: A German RAG evaluation indicates that Llama3 70b instruct is not performing as well as Mixtral-8x7B-Instruct-v0.1. The evaluation and results can be found at this link.

Questioning Evaluation Metrics: A member raised concerns about the disparity in metrics, particularly for the 'question to context' scores, suggesting an additional metric loglikelihood_acc_norm_nospace to account for potential formatting issues.

Potential Prompt Format Flaw Spotted: There could be a formatting issue in the evaluation prompts, specifically the absence of the "Answer:" part as seen in template source code.

Discrepancies in Results After Template Correction: Upon prompt template correction, DiscoLM German 7b showed improved performance in 3 out of 4 categories, with a decrease in performance specifically in the "choose_context_by_question" category as outlined in the member's shared results. The comparison is available for review here.

Call for Additional Comparisons: A member requested a comparison between the model in question and command-r-plus, although no further details or results were provided in the subsequent conversation.

Links mentioned:

lighteval/src/lighteval/tasks/tasks_prompt_formatting.py at 11b48333b46ecd464cc3979de66038c87717e8d6 · huggingface/lighteval: LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron. - hug...
deutsche-telekom/Ger-RAG-eval · Datasets at Hugging Face: no description found
deutsche-telekom/Ger-RAG-eval · Datasets at Hugging Face: no description found

DiscoResearch ▷ #general (9 messages🔥): 

Haystack LLM Framework Enhanced: A functionality in the Haystack LLM framework has been updated, which indexes tools as openapi specs and retrieves the top_k service based on user intent, dynamically invoking them. This capability is showcased in a shared notebook.

Inconvenience Caused by Hugging Face's Downtime: Members expressed frustration as the Hugging Face platform was reported to be down again, impacting their activities.

Sending Batch Prompts through Local Mixtral: A member sought advice on how to send a batch of prompts through a local mixtral using 2 A100s, previously utilizing vLLM and considering the newly open-sourced TGI. While TGI seems to be intended primarily as an API server, suggestions were made on achieving batch processing with asynchronous requests.

Leveraging llm-swarm for Scalable LLM Inference: In the context of managing scalable LLM inference, a link to llm-swarm on GitHub was shared, although it was noted that it might be overkill for just two GPUs.

Local Batch Processing Preferences: A user expressed a preference for a local Python solution using litellm.batch_completion for batch requests rather than setting up an API server, indicating a likely use of vLLM's local python mode for convenience.

Links mentioned:

GitHub - huggingface/llm-swarm: Manage scalable open LLM inference endpoints in Slurm clusters: Manage scalable open LLM inference endpoints in Slurm clusters - huggingface/llm-swarm
notebooks/haystack2x-demos/haystack_rag_services_demo.ipynb at main · vblagoje/notebooks: Contribute to vblagoje/notebooks development by creating an account on GitHub.

DiscoResearch ▷ #discolm_german (19 messages🔥): 

DiscoLM's Dual EOS Tokens Explained: In Llama3's instruct configuration, two end-of-speech (eos) tokens are used: 128001 for regular end of text, and 128009 for end of a conversational turn. The latter helps to manage multiturn conversations by signaling the model to stop answering while still considering the text as part of the same conversation.

Ninyago's Coding Dilemma Resolved: After encountering a problem with DiscoLM_German, ninyago received advice to simplify the code by excluding the attention mask and using model.generate(input_ids=gen_input). Other suggestions included utilizing text generation pipelines for simplicity and vllm for faster GPU inference or llama.cpp for CPU.

Optimizing Generation Length: To increase model output to a desired length, ninyago was advised to use the max_new_tokens parameter instead of relying on max_tokens. The suggestion was based on avoiding sentence cutoffs and ensuring that prompts like Schreib maximal 150 Wörter. are effectively executed.

Prompting in Dual Language Forms: A community member enquired about the effectiveness of prompting DiscoLM models in German using either the "du" or "Sie" form.

Contributions to DiscoLM Welcomed: In response to johannhartmann's interest in contributing quantizations for the Llama3_DiscoLM_German_8b_v0.1_experimental model, jp1 encouraged the collaboration despite future model improvements being on the horizon, suggesting that there was no necessity to wait for the newer version.

Link mentioned: Pipelines: no description found

LangChain AI ▷ #general (25 messages🔥): 

RAG Chatbot Expansion Ideas: A member expressed interest in augmenting a RAG (Retrieval Augmented Generation) chatbot to display web search results alongside its existing database/PDF knowledge base. They are eager to discuss additional feature ideas with the community.
Nested JSON Solutions Sought in Vector DB: A request was made for solutions on defining metadata_field_info in a nested JSON for the Milvus vector database.
Launching a Chat Interface Quickly: Queries were raised about the quickest method to create a startup-like interface that allows customer login and facilitates chatting with a vector database, using Langchain along with Groq or Llma. Members discussed potential toolkits to accomplish this, mentioning the possibility of using Vercel AI SDK and Chroma.
Langchain Chain Types Video Series Debut: A member announced the launch of a video series dedicated to Langchain chain types, including API Chain, Constitutional Chain, RAG Chain, Checker Chain, Router Chain, and Sequential Chain, with links to the instructional videos.
PGVector Store Usage in Chatbots: Information was shared on how to utilize a pgvector store as context for chatbots, and guidance on how to acquire OpenAI embeddings for this purpose was requested and subsequently provided, referencing LangChain documentation.

Links mentioned:

">no title found: no description found
">no title found: no description found
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
">no title found: no description found
ChatGroq | 🦜️🔗 Langchain: Setup
Quick Start | 🦜️🔗 Langchain: Large Language Models (LLMs) are a core component of LangChain.
Learning Langchain Series - Chain Types - Introduction: This is a series all about `Langchain chain types`. Learn how to use these spectacular chains in your project.We'll explore the chains from the python librar...
API Chain | Chain Types | Learning Langchain Series | Become an expert in calling APIs with LLMs!: Learn how to call an API using APIChain from langchain. You will see that with the help of this library, you'll be at the forefront of delivering values when...
CONSTITUTIONAL CHAIN | Chain Types | Learning Langchain Series | Build constitutional critics: Constitutional chains gives you the ability to enforce specific revision or critics tasks through LLMs. Confident enough, this chain will be a game changer i...
RETRIEVAL CHAIN - RAG | Chain Types | Learning Langchain Series | Chat with anything on the web: Retrieval chains are famous for empowering the LLMs with retrived documents from various sources (web, pdf, documents, sql database ...).We'll explore the ch...
LLM CHECKER CHAIN | Learning Langchain Series | Chain Types | Fact check statements easily!: If you are looking for a reasoning LLM and develop a natural language model that can verifies content, the check out this tutorial on LLM Checker chains from...
ROUTER CHAIN | Learning Langchain Series | Chain Types | Route between your LLMs in a fashion way!: Router chain is one of the tools you'll definitely need when dealing with multiple tasks! Imagine how to handle multiple APIs or multiple tasks among multipl...
SEQUENTIAL CHAIN | Learning Langchain Series | Chain Types | Let's call multiple LLMs in series!: Sequential chain is one of those founding chains that comprises the connection of several chains together. So if you are looking into automating the communic...

LangChain AI ▷ #share-your-work (3 messages): 

RAG Evaluation Explored: An article that delves into RAG evaluation was featured on the official RAGAS Platform community page, shedding light on the analysis of evaluation scores using RAGAS framework. The member encouraged feedback and open discussion for product ideas, and provided links to the featured community page and the article itself.

Unifying RAG Frameworks Through Langchain: A member shared an article about unifying RAG frameworks by implementing adaptive routing, corrective fallback, and self-correction with Langchain’s LangGraph. This advancement is detailed in a Medium post, accessible through this shared link.

Seeking Partners for Pull Request Review: A member inquired about where to ask for a partner pull request to be reviewed, suggesting that the channel might be an appropriate place for such requests. However, no specific details or links regarding the pull request were provided.

Links mentioned:

❤️ Community | Ragas: no description found
Enhancing LLM's Accuracy with RAGAS: A Deep Dive into Advanced Evaluation Metrics for RAG Systems | LLM Testing RAGAS: no description found

Datasette - LLM (@SimonW) ▷ #ai (22 messages🔥): 

Phi-3 Mini Makes a Swift Entrance: Microsoft's new Phi-3 mini, 3.8B model has impressed users with its speed and efficiency. Running on just a 2.2GB Q4 version, it's capable of handling 4,000 token context GGUF, and it's MIT licensed.

Potential for App Development: The Phi-3 mini model's performance suggests it could serve as a solid foundation for app features, being fast and efficient even with just CPU usage.

A Desktop Powerhouse?: There's enthusiasm about the potential to run a 128,000 token model that doesn't consume all of a user's RAM, with particular interest in its ability to perform tasks like structured data extraction and Q&A.

Ideal for SQL Queries?: If Microsoft's Phi-3 mini can efficiently write SQL queries against a SQLite schema, there's the possibility of turning it into a plugin for Datasette Desktop.

Materialized View Generation: A user tested the model with a large table definition and requested it to write a materialized view, getting a "passable" output despite the complexity of the task.

Link mentioned: microsoft/Phi-3-mini-4k-instruct-gguf: Microsoft's Phi-3 LLM is out and it's really impressive. This 4,000 token context GGUF model is just a 2.2GB (for the Q4 version) and ran on my Mac using the …

Datasette - LLM (@SimonW) ▷ #llm (5 messages): 

In Search of HackerNews Summary Script: The latest version of the HackerNews summary generator script can be found and leverages Claude and the LLM CLI tool to summarize long discussions on Hacker News.

Embed-multi CLI vs. Python API: A member asked about the possibility of using the Python API to create embeddings for a directory of text files, similar to what the llm embed-multi cli provides, but didn't find documentation on it.

Usage of LLM Code Programmatically: Someone inquired about the documentation for using the LLM code programmatically as an abstract method to interface with multiple backends, but couldn't locate the information.

Python API for LLM Prompt Execution: The LLM Python API documentation was shared, detailing how to execute prompts using Python, including basic usage and how to work with different models and aliases.

Links mentioned:

Summarizing Hacker News discussion themes with Claude and LLM: I've been experimenting with the combination of Claude and my LLM CLI tool to give me quick summaries of long discussions on Hacker News .
Python API - LLM: no description found

Cohere ▷ #general (21 messages🔥): 

Seeking Cohere API's IP Range: A member inquired about obtaining Cohere API’s IP range for whitelisting purposes within their organization. 0xmerp offered a temporary solution with a specific IP address, 34.96.76.122, and suggested using the dig command to monitor for any changes.

Navigating Career Paths in AI: Members discussed the value of building complex projects and self-improvement in AI careers. There was a consensus that having substantial skills and the ability to effectively communicate about those skills is more important than networking alone.

Guidance for Advancing in LLM: A member sought advice on how to progress further in machine learning, particularly in fine-tuning and developing novel LLM architectures. The suggested approach was to start with solving one's own problems or to explore the world for inspiration.

LLama 3 Instruct Model with Function Calling: A fine-tuned Llama 3 Instruct function-calling model suitable for commercial use was shared, with links to purchase and guidance for setting up a server provided.

Cohere Toolkit Becomes Open Source: co.elaine shared exciting news about the open sourcing of the Cohere Coral app, encouraging the community to add custom data sources and deploy to the cloud with a link to the related blog post. The toolkit supports Cohere models on various cloud platforms.

Links mentioned:

Trelis/Meta-Llama-3-70B-Instruct-function-calling · Hugging Face: no description found
Login | Cohere: Cohere provides access to advanced Large Language Models and NLP tools through one easy-to-use API. Get started for free.
New Cohere Toolkit Accelerates Generative AI Application Development: Introducing the Cohere Toolkit, an open-source repository of production-ready applications deployable across cloud platforms

Cohere ▷ #project-sharing (5 messages): 

Seeking Cohere Command-r Integration: A member requested help with implementing Cohere Command-r with URL Grounding (RAG) to integrate into BotPress. They highlighted that many users might switch to Cohere due to its favorable comparison with ChatGPT 3.5 in both performance and pricing.

Salutations Echoed with Cohere Branding: The user greeted the channel using Cohere's branding emojis, implying their positive sentiment towards Cohere.

AI Agent as Dubai's Virtual Guide: A concept was shared about an AI Agent designed for Dubai Investment and Tourism, which can interact with Google Maps and access information from www.visitdubai.com.

Exploring Cohere-r for Web Search: A member expressed interest in using Cohere-r as a tool for performing web searches.

Skunkworks AI ▷ #general (5 messages): 

Eureka Moment for Whisper Transcript Summary: A member excitedly shared that they achieved correct summarization of 18k tokens from a rough transcript using gguf with impressive results.
Linear Scaling Woes: The same member mentioned struggling with linear scaling, having been tweaking the settings for four days without success.
Success with LLAMA at 32k: They also noted that llama-8b performs well at a token count of 32k.
GGUF Scales LLAMA3-8B-INSTRUCT Excellently: A Hugging Face repository was linked, highlighting successful scaling of LLAMA3-8B-INSTRUCT to 32k tokens via YARN scaling, not finetuning, with custom edge-quants tested at various bit levels.
Burnytech Enters the Chat: A new member greeted the channel with a simple "Hi!".

Link mentioned: nisten/llama3-8b-instruct-32k-gguf · Hugging Face: no description found

Skunkworks AI ▷ #datasets (6 messages): 

Seeking Multilingual OCR Resources: A member inquired about OCR datasets for languages that aren't widely spoken, expressing a specific interest in document-type data.
Hypernetwork Approach to LLMs Shared: A member linked to an article on Answer.AI about augmenting LLMs with greater reasoning ability and memory, explaining a technique that adds Transformer blocks to maintain output while integrating domain-specific information.
Enthusiastic Nod to LLM Augmentation Strategy: A member confirmed the effectiveness of a strategy for enhancing LLMs by adding new transformer decoder layers to existing architectures, which keeps the original pretrained weights unchanged.
Stable Diffusion and LLM Enhancement Technique Clarification: In discussing LLM enhancements, another member highlighted similarities with "hypernetworks" in the stable diffusion community, though indicating the term might be different in broader literature, and noting the approach involves adding new weights to a frozen backbone model.

Link mentioned: Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA: We’re releasing FSDP QDoRA, a scalable and memory-efficient method to close the gap between parameter efficient finetuning and full finetuning.

Skunkworks AI ▷ #finetuning (1 messages): 

Empirical Approach Advocated: A member emphasized the importance of trying out the smartest models for one's specific use-case, noting the empirical nature of AI performance evaluations.

Mozilla AI ▷ #llamafile (10 messages🔥): 

Verbose Prompt Option Confusion: The --verbose-prompt option, although listed in the help file of the Meta-Llama 3-70B Instruct llamafile as a feature for printing prompts before generation, triggers an unknown argument error when used, leading to confusion over its validity.

Backend Service with Llamafile: Members discussed methods to use Llamafile for a backend service without the UI popping up, including the option of running the LLM using Llamafile as a method or service on a different port.

Using Llamafile in Server Mode without Browser: A detailed implementation guide for running a llamafile in server mode using subprocess in Python was provided. It includes starting the llamafile with a nobrowser option to use the backend API and details on configuring and sending requests to different model instances.

mlock Failure on Windows When Loading Model: A user encountered a memory locking issue, failed to mlock 90898432-byte buffer, when trying to load the Mixtral-Dolphin model on a Windows machine equipped with a Ryzen 9 5900 and 128GB RAM, suspecting the problem might be due to the application being 32-bit.

Using External Weights with Llamafile on Windows: In response to the mlock issue, it was pointed out that Windows might require the use of external weights, with a focus on using the bare llamafile from Mozilla-Ocho’s GitHub and the specific command llamafile-0.7.exe --mlock -m dolphin-2.7-mixtral-8x7b.Q5_K_M.gguf. However, the mlock failure persisted even when running another model, phi2 llamafile.

Links mentioned:

TheBloke/dolphin-2.7-mixtral-8x7b-GGUF at main: no description found
Releases · Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file. Contribute to Mozilla-Ocho/llamafile development by creating an account on GitHub.

AI21 Labs (Jamba) ▷ #general-chat (4 messages): 

Inquiry about Jamba's Requirements: A user queried about the compatibility of Jamba with LM Studio, emphasizing their interest due to its substantial memory capacity, comparable to Claude.
Technical Challenges Running Jamba: A user shared obstacles in running Jamba, noting it requires over 200GB of RAM and a powerful GPU like the NVIDIA 4090. They also mentioned failure in getting Google Cloud to allocate a sufficient instance, and an invitation was extended for collaboration in overcoming these issues.
Inappropriate Content Warning: Messages promoting Onlyfans leaks and age-restricted content were posted, which could potentially be against Discord's community guidelines.

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text.  Chat, hang out, and stay close with your friends and communities.

LLM Perf Enthusiasts AI ▷ #general (1 messages): 
jeffreyw128: https://twitter.com/wangzjeff/status/1783215017586012566

LLM Perf Enthusiasts AI ▷ #gpt4 (1 messages): 

April Showers Bring AI Flowers: A new GPT release is teased with an anticipated launch date of April 29, as per a snippet from @DingBannu's tweet.
Google's Gemini Gearing Up: Google Gemini signals upcoming releases expected at the end of April, around the 29th and 30th, although the dates might shift, as mentioned in @testingcatalog's tweet.

Links mentioned:

Tweet from Ding Bannu (@DingBannu): APR 29 new GPT
Tweet from TestingCatalog News 🗞 (@testingcatalog): Google Gemini is preparing for new releases to come by the end of April. Note that these dates may change as well. So far they are targeting 29 and 30 of April.   Any guesses on what will be coming ou...

LLM Perf Enthusiasts AI ▷ #openai (1 messages): 

Contemplating Context Usage: A member expressed uncertainty about the extent to which the tool uses the full context provided, yet noted it still performs better than GPT. 

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):