[AINews] Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence

1/t

                April 10, 2024

            [AINews] Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

AI News for 4/8/2024-4/9/2024. We checked 5 subreddits and 364 Twitters and 26 Discords (388 channels, and 4154 messages) for you. Estimated reading time saved (at 200wpm): 468 minutes.

Incremental improvements, but big nonetheless:

On the first day of Google Cloud Next, the million-token context window Gemini 1.5 Pro was let out of waitlist jail and freely available in 180+ countries. Additionally, it can:
understand up to 9.5hrs of audio (quote: "not just the words you’re saying but the tone and emotion behind the audio. In some cases it even understands some sounds such as dogs barking and rain falling.")
"upload nearly unlimited files and it's free" with a new File API
Gecko-1b-256/768 aka text-embedding-004 model, a new small embedding model that beats peer-size models on MTEB
JSON mode & better function calling

3 hours later, GPT-4 Turbo with Vision is now generally available in the API but hidden in it was a major update to the GPT-4 Turbo language model itself. 
There's not even a blogpost - all we know is that it is majorly improved and specifically reasoning has been further improved. Maybe it just got really really really good at delving?

Lots more smaller updates from Cohere Command R to Google CodeGemma in the recaps below.

Table of Contents

AI Reddit Recap
AI Twitter Recap
AI Discord Recap
PART 1: High level Discord summaries
Perplexity AI Discord
Stability.ai (Stable Diffusion) Discord
Nous Research AI Discord
LM Studio Discord
OpenAI Discord
LlamaIndex Discord
HuggingFace Discord
Eleuther Discord
LangChain AI Discord
CUDA MODE Discord
OpenInterpreter Discord
OpenAccess AI Collective (axolotl) Discord
OpenRouter (Alex Atallah) Discord
Latent Space Discord
Modular (Mojo 🔥) Discord
LAION Discord

PART 2: Detailed by-Channel summaries and links

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence. Comment crawling still not implemented but coming soon.

Latest AI Model Developments

Meta Platforms to launch small versions of Llama 3 next week: According to TheInformation, Meta plans to release smaller versions of its Llama 3 model. (433 upvotes)
Orca 2.5 7B with new DNO method surpasses older versions of GPT-4 in AlpacaEval: Orca 2.5 outperforms models with far more parameters by using Direct Nash Optimization (DNO) to combine contrastive learning with optimizing general preferences. (60 upvotes) 
Functionary-V2.4 released as an alternative to OpenAI function calling models: Functionary-V2.4 offers better performance and a new code-interpreter feature compared to OpenAI models. (20 upvotes)
CosXL - Cos Stable Diffusion XL 1.0 and 1.0 Edit models released: These models use a Cosine-Continuous EDM VPred schedule for full color range and instructed image editing. (9 upvotes)

Efficient AI Techniques

[R] The Missing U for Efficient Diffusion Models: This research proposes replacing discrete U-Nets with continuous U-Nets using neural ODEs, enabling up to 80% faster inference, 75% fewer parameters, and 70% fewer FLOPs while maintaining quality. (38 upvotes)
[R] No "Zero-Shot" Without Exponential Data: A study finds multimodal models require exponentially more pretraining data for linear improvements in "zero-shot" performance. (12 upvotes)
[R] A New Massive Multilingual Dataset for High-Performance Language Technologies: The HPLT resources cover 75 languages with ~5.6 trillion word tokens and 18 English-centric parallel language pairs. (12 upvotes)

Creative Applications

New Tutorial: Master Consistent Character Faces with Stable Diffusion: A step-by-step guide using Automatic1111 for generating consistent character visuals. (597 upvotes)
A touch screen, wave shooter game made with Gemini pro 1.5 without any code: The game was created in around 5 hours by telling Gemini the desired features. (189 upvotes)  
Japanese science fiction writer uses AI to create a trailer for his novel: The AI-generated trailer showcases a novel use case. (20 upvotes)
Introducing Steerable Motion 1.3 to drive videos with batches of images: The new version offers higher detail, smoother motion and better control. (28 upvotes)

Scaling AI Infrastructure

AI Companies Are Running Out of Internet: Models are consuming huge swaths of online data. (274 upvotes)
[D] Securing Canada's AI advantage: Canada is investing $2.4 billion to accelerate AI job growth, boost productivity, and ensure responsible development, including $2 billion for AI compute infrastructure. (70 upvotes)
Sam Altman reveals what's next for AI: Altman shared his vision in an image post. (601 upvotes)
Sam Altman famously has no equity in OpenAI, but startup bets have made him a billionaire anyway: Despite no OpenAI equity, Altman's investments have been lucrative. (33 upvotes)

Responsible AI Development 

'Social Order Could Collapse' in AI Era, Two Top Japan Companies Say: The WSJ reports on warnings from Japanese firms. (226 upvotes)
The Canadian AI Safety Institute will be created with $50 million: Part of Canada's AI investment package aims to further safe AI development. (70 upvotes) 
[D] For those of you who have published alone, what was your experience like?: A discussion on the viability and difficulty of publishing AI research solo. (55 upvotes)

Memes and Humor

Yours jobs are safe, humans: A meme image post. (495 upvotes)
Missing Woman: A humorous image post. (236 upvotes)
Cancelled my ChatGPT plus subscription as well along with Gemini Pro - Keep making it worse and worse OpenAI good job.: A complaint about OpenAI's changes. (24 upvotes)

AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Cohere Command R+ Model Performance

Command R+ climbs to 6th spot on Arena leaderboard: @lmsysorg noted Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes, making it the best open model on the leaderboard. @seb_ruder highlighted that this doesn't even assess RAG, tool use, and multilingual capabilities where ⌘R+ does well.
Command R+ beats other models in financial RAG: @virattt found that Command R+ was both faster and 5% more correct than Claude Sonnet on financial RAG evals, using OpenAI embeddings, cosine similarity retrieval, Cohere reranking, and Opus and human evaluation. 
Command R+ is a 104B parameter model with advanced capabilities: @rohanpaul_ai noted that Command R+ is a 104B parameter model with a context window of 128K tokens that covers 10 languages, uses tools, and is specifically tuned for RAG. It is the first open-weights model outperforming GPT-4 based on Elo rating.

Other Notable Open Model Releases and Updates

Google releases Code Gemma models: @fchollet announced the release of CodeGemma, a new version of the Gemma line of models fine-tuned on code generation and completion, achieving state-of-the-art results in sizes 2B and 7B. @_philschmid provided more details, noting the models have 8192k context, are initialized from Gemma Base, trained on 500B additional tokens (web, code & math), fine-tuned with SFT & RLHF, with the 2B model achieving 27% on HumanEval and 7B-it 52%.
Google releases Griffin architecture outperforming transformers: @rohanpaul_ai shared that Google released a model with the new Griffin architecture that outperforms transformer baselines in both MMLU score across different parameter sizes and the average score of many benchmarks, with efficiency advantages of faster inference and lower memory usage.
Google releases Gemini 1.5 Pro on Vertex AI: @GoogleDeepMind announced the release of Gemini 1.5 Pro, now available in public preview on Google Cloud's Vertex AI platform, with a long context window for analyzing large amounts of data, building AI customer service agents, and more.
DeepMind releases Imagen 2 on Vertex AI: @GoogleDeepMind announced that their generative technology Imagen 2 can now create short 4-second live images from a single prompt, and is available to use on Google Cloud's Vertex AI platform.
Anthropic introduces Constitutional AI models: @AnthropicAI released new research on measuring model persuasiveness, developing a way to test how persuasive language models are and analyzing how persuasiveness scales across different versions of Claude.
Meta announces MA-LMM model: @_akhaliq shared that Meta announced MA-LMM (Memory-Augmented Large Multimodal Model) for Long-Term Video Understanding, allowing longer context by reducing GPU memory use substantially across long context lengths.

Emerging Trends and Discussions

AI for code generation and understanding: Several discussions revolved around using AI for code generation, understanding, and debugging. @abacaj highlighted a paper showing an approach that resolved 67 GitHub issues in less than ten minutes each, compared to developers spending over 2.77 days on average. @karpathy open-sourced llm.c, an implementation of GPT-2 training in pure C in only ~1,000 lines of code.
AI outperforming humans in coding tasks: There were multiple discussions on AI's potential to replace or augment programmers. @svpino argued that while AI can turn non-coders into average programmers and help average programmers become better, it likely can't help expert programmers much yet, citing the long history of attempts to automate programming, the limitations of data and language alone, and the time it takes for technological progress to reach the masses.
Scaling laws for language models: @cwolferesearch shared a detailed overview of recent research on scaling laws for language models, which allow accurately predicting the performance of larger training runs from much cheaper smaller-scale experiments. The post covered how scaling laws hold for overtrained models and downstream task performance, and how they can be used to significantly reduce the compute costs of large-scale training runs.
DSPy for language model programs: @lateinteraction introduced DSPy, a methodology, programming model, and set of optimizers for building Language Programs - arbitrary control flows that call LMs multiple times in a system. DSPy can optimize the prompts & weights of the LM calls to maximize program quality on a given metric.
Physics of language models: @rohanpaul_ai shared a paper investigating the knowledge capacity scaling laws of language models, estimating that they can store 2 bits of knowledge per parameter, even when quantized to int8, meaning a 7B model can store 14B bits of knowledge, surpassing the English Wikipedia and textbooks combined.

Memes and Humor

Anthropic function calling needs work: @jxnlco joked that Anthropic's function calling "needs a lot of work" as numbers are returned as strings and lists are strings that don't parse to JSON.
Perplexity paying for positive tweets: @AravSrinivas joked about paying everyone who says something nice about Perplexity, not just on Twitter, and extending it to Airchat next. 
Cohere beating Meta and Mistral to GPT-4 performance: @_lewtun expressed surprise that Cohere beat Meta and Mistral to GPT-4 performance with an open weights model, joking it "was not in my LLM bingo card".
"Majorly improved" GPT-4 launch: @bindureddy joked about OpenAI's "majorly improved" GPT-4 launch announcement being simply "That's all, that's it!" with no further details.
AutoMerger creating the best 7B model: @maximelabonne highlighted that AutoMerger created the best 7B model on the Open LLM Leaderboard, YamshadowExperiment28-7B, a simple SLERP merge of automerger/YamShadow-7B and yam-peleg/Experiment28-7B.

AI Discord Recap

A summary of Summaries of Summaries

1. New AI Model Releases and Capabilities:

Google released Gemini 1.1, an improved version with coding abilities, and introduced CodeGemma models specialized for code tasks.
OpenAI unveiled GPT-4 Turbo with a larger 128k context window and knowledge updated to December 2023.
Stability AI's CosXL model requires sharing contact details under a non-commercial research license.
Anticipation builds for Meta's Llama 3 and its potential multimodal capabilities, with speculation about smaller versions releasing soon.

2. Efficient LLM Training and Deployment Approaches:

Andrej Karpathy introduced llm.c, a lean GPT-2 training implementation in ~1,000 lines of C/CUDA code.
Discussions around low-precision quantization techniques like HQQ for efficient LLM deployment, especially on mobile devices.
Meta sponsored an LLM knowledge study involving a massive 4.2 million GPU hours of compute.
Groq offers 1/10th the inference cost with 75,000 developers, potentially rivaling Meta's inference capacity.

3. AI Assistants and Multimodal Interactions:

Excitement surrounds Gemini 1.5 Pro for understanding audio, acting on commands via JSON mode, and enabling multimodal AI applications.
The Syrax AI Telegram bot offers features like roasting, summarizing chat history, and maintaining a spam blacklist.
Developers built AI agents for tasks like trying on virtual clothing and creating social media posts.
Platforms like Lepton AI simplify deploying AI applications with tools like Photon and WhisperX.

4. Open-Source AI Frameworks and Community Efforts:

LlamaIndex showcased techniques for improving Retrieval-Augmented Generation (RAG) using LlamaParse and evaluating advanced RAG methods like ARAGOG.
The Mojo programming language open-sourced its standard library with a contribution guide for community involvement.
Hugging Face introduced Gradio's API Recorder and released massive OCR datasets with over 26 million pages to aid document AI development.

5. Misc Updates:

Efficiency Breakthroughs in LLM Training and Inference: Andrej Karpathy open-sourced llm.c, a lean GPT-2 training implementation in 1000 lines of C/CUDA, sparking discussions on porting to GPU for enhanced performance. Groq showcased cost-effective inference, while techniques like 4-bit quantization (HQQ) and FP16xINT4 kernels (Marlin) promise speed gains. A physics of language models study sponsored by Meta involved a staggering 4.2M GPU hours.

Retrieval Augmented Generation (RAG) Advancements: Innovations in RAG include extracting document knowledge graphs with LlamaParse to enhance advanced workflows, and comprehensive evaluations of techniques in the ARAGOG survey. Multimodal RAG is being applied to medical domains like pill recognition, while an upcoming event will showcase enterprise-grade RAG systems.

Architectural Explorations and Training Techniques: Novel architectures like Google's Griffin surpass transformers with 1B extra parameters and improved throughput. The Jet MoE integration into Hugging Face transformers is eagerly awaited. Fine-tuning methods for chat models are scrutinized, comparing Direct Preference Optimization (DPO) against alternatives like SFT+KTO and Microsoft's DNO. Initializing LoRA layers with SVD is found to significantly boost fine-tuning results, per PiSSA paper. Limitations of zero-shot generalization in multimodal models are highlighted in a recent study.

PART 1: High level Discord summaries
Perplexity AI Discord
APIs Battle for Supremacy: The community is buzzing with debates over the merits of Perplexity Pro versus Claude 3 Opus, contrasting Perplexity's flexible usage with Claude 3's superior writing but restrictive conditions. Anticipation builds around the GPT-4 Turbo API, as engineers eagerly await upgrades that could mirror Claude 3 Opus's performance enhancements.
Perplexity's Prowess in Preview: Enthusiasm surrounds Gemini 1.5, with its potential to rival GPT-4 and exceed expectations with a larger context window and multimedia support. Meanwhile, ChatGPT Plus faces scrutiny against free AI options, with Perplexity Pro's web search feature standing out among commentary.
Helpers and Handlers in the Limelight: The Harpa AI browser extension garners attention as a potent web automation tool, streamlining tasks such as content summarization and email explanation, easing the workflow for engineers.
Perplexity's API Conundrums and Triumphs: Discussions traverse the landscape of Perplexity's API offerings, from handling public PDFs/TXT files, the absence of the pplx-pro model through API access, to the resolution of an API balance top-up issue. A newly released Perplexity API Ruby client stirs the community, while inquiries for a Perplexity-specific token calculation tool reflect ongoing optimization efforts.
Media, Models, and the Multiverse: Diverse links shared across the guild, from in-depth explorations of GPT origins to interviews discussing workflows for trusted AI at Workflows & Tooling to Create Trusted AI, mirror the broad spectrum of interests amongst AI engineers, with Perplexity AI searches being the oft-chosen portal for their multifarious quests for knowledge.

Stability.ai (Stable Diffusion) Discord

Pixel Art Poised for Comeback: Engineered for nostalgia, AI discussion shifted towards generating NES pixel art style, highlighting Civitai as a resource for exploring existing AI-created pixel artworks.

CosXL Steps into the Spotlight: Stability AI introduced CosXL, a new AI model released under a non-commercial research community license that requires users to share contact details, sparking debates around access and data sharing.

AI Art Sparks Copyright Debates: Users engaged in a robust dialogue about the legitimacy and copyright concerns of AI-generated art, referencing recent guidance from the US Copyright Office.

Stable Diffusion Gets a UI Upgrade: Queries around Stability.ai products led to discussions on user interface improvements, with users seeking guidance on features and integration of models like ELLA; for instance, the LoRA UI Training Scripts were mentioned as a tool for simplification.

Community Awaits SD3's Arrival: Amidst speculation and anticipation, the awaited release of Stable Diffusion 3 (SD3) caused a flurry of conversation, yet without any official release date being pinned down.

Nous Research AI Discord
Call for Hackathon Project Ideas: During a brainstorming session for a hackathon, participants discussed cool AI projects, including fine-tuning Mistral on academic papers and examining projects from a previous Mistral hackathon.
GPT-4 Might Still Fail the Apple Test: Chat about GPT-4 updates highlighted that the model still struggles with the "apple test" at a temperature setting of 0.7, as well as invitations for collaboration on the IMO math prize, suggesting high computational resources might be involved.
Reviewing Fine-Tuning Techniques for Chat: A heated debate on fine-tuning chat models unfolded, with a focus on comparing the efficacy of Direct Preference Optimization (DPO) and other methods like SFT+KTO and Microsoft's DNO.
AI Renovation: Training Efficiency and Model Performance: Engineers are excited about the leaner LLM training approach with GPT-2 in C introduced by Karpathy, and StableLM 2 12B pre-trained on 2 trillion tokens is also a buzz, while excitement builds up for the potential of Hermes 2.5 outperforming its predecessors.
nanoLLaVA Emerges for Edge Efficiency: The release of a sub-1B vision-language model, nanoLLaVA, aimed at edge devices, stoked conversations about its anticipated integration with Obsidian and Hermes Vision.

LM Studio Discord
GPU or Bust: AI enthusiasts debated over hardware configurations for optimizing AI model performance, notably stating a 5800x with 64GB RAM and GPU offload increases efficiency. A swap to a 14700K with 96GB RAM at 6400MHz from an i3 setup showed minimal inference speed boost, hinting that VRAM might be the bottleneck.
LLM's New Power Players:
- Deepseek Coder and Mistral 7B models are in the spotlight for their promising performance capabilities and suitability for newcomers to the AI sphere.
- Dolphin 2.8 Mistral 7b v0.2 is the latest model entrant, praised for its intricacies and supported by a community-focused effort to deliver an optimized GGUF quantized version.
Techies Query Model Bits and Pieces: Queries arose regarding GGUF quantization and llama.cpp downloads, while GPU model compatibilities were explored, considering options like P40 with external cooling. Insights on "instruct" models and mixtures of experts being a cut above for command accuracy were shared.
LM Studio Gets Text-Embedding Update: Version 0.2.19 of LM Studio hit the virtual shelves, offering new text embedding support and quality-of-life improvements for AI researchers. Download links for Windows, Mac, and Linux are available, with additional RAM allocation discussion for larger models.
New Model Drops and Community Shares: Google's market-oriented models, Gemma 1.1 and CodeGemma series, are shared by the community, making waves for their memory-efficient design and instruction-following dexterity, respectively. These models are positioned as reliable resources for AI engineers, accessible via Hugging Face.

OpenAI Discord
Intelligence Debate Gets Brainy: Engineers have been discussing the nature of intelligence with references to "Uniquely human intelligence" and considering if AI, like Claude.ai or GPT, could have aspects of qualia or consciousness. The technical depth escalated with sharing academic perspectives on human cognition and evolutionary theories.
GPT-5 Release Sparking Speculation: Anticipation for GPT-5's launch provoked discussion on the supposed challenges and timeframe, comparing current options like Claude 3 Opus and Gemini 1.5 Pro for programming aid, and grappling with regional availability.
Artistic Algorithms Stir Up Ethics Chat: A contentious debate on the ethics of AI-generated art vs human creativity surfaced, touching on appreciation, sentiment analysis, and platform policies like YouTube's ToS possibly clashing with content creation practices.
Task Mastering with LLMs? Query Away: Queries about large language models' (LLMs) capacity to simplify complex tasks surfaced, sparking conversation on the need for additional systems for task management despite AI's assistance, echoing broader issues of integrating AI's capacities with practical work structures.
Prompt Engineering—A Guiding Light or a Blind Spot?: Discussions on prompt modularity likened the GPT environment to a modular OS, but raised concerns over transparency and changes to default system prompts. Highlighted was the difficulty in separating system prompts and the non-deterministic results of prompt injection techniques, noting issues of stewardship and AI ethics.

LlamaIndex Discord
New Paths in RAG Improvement: Fanghua Yu has shared a method to improve Retrieval-Augmented Generation (RAG) by utilizing LlamaParse to extract a document knowledge graph, which enhances RAG's performance. The details and evaluation of various RAG techniques, including Matous Eibich's ARAGOG project, are explored in a comprehensive survey.
RAG Sees Through Pills: Multimodal RAG applications are extending into the medical domain with a focus on medication identification, combining images and descriptions to accurately recognize pills, as highlighted in a recent blog post.
RAG for the Masses: An upcoming event will detail the construction of enterprise-grade RAG systems, covering aspects like advanced parsing/ingestion and ensuring comprehensive observability for these systems. Those interested can sign up for the event here.
Troubleshooting OpenSearch Vector Store: Technical discussions indicate issues with inserting new data into OpenSearch vector store, where multiple members shared similar experiences. Workarounds offered include using index.refresh_ref_docs(), and an instructional video on document parsing can be found here.
Seeking Gemini Guidance: There's a call to arms for sharing a notebook modeled after OpenAI's with Gemini LLM examples, which illustrates the community's desire for practical guides for emerging tools. The existing OpenAI example is highly regarded and can serve as a baseline for future Gemini LLM templates which are accessible here.

HuggingFace Discord
Cool New Tools for AI Engineers: Hugging Face released Gemma 1.1 Instruct 7B with coding capabilities and dropped their compute prices by up to 50%, offering an average 20% reduction compared to AWS EC2. They've also publicized two massive OCR datasets with 26 million pages and introduced Gradio's API Recorder feature.
AI Community's Learning Hub: A GitHub repo for NLP sentiment classification was shared, and Hugging Face encourages learning through various tutorials and resources, such as Gradio's and Langchain's tutorials.
Creative AI Developments to Watch: Innovations from the community include BeastBot for viral content insights and Ragdoll Studio, an enhanced character generation and interaction platform. Deep Q-Learning applications are gaining traction with GitHub showcases.
AI Model Debugging and Optimization Conversations: Members struggled with TorchScript export, scheduler/sampler behaviors, and OOM errors when training Mistral on A100 GPUs. Community members suggest checking out the shared Google Colab notebook for potential solutions.
Engaging Discussions in Specialized AI Topics: Debates and assistance on topics like benchmarking AI hardware, issues with model and approach recognitions, using GPT-2 for summarization, and integrating contrastive loss in vision models reflect the community's engagement with cutting-edge AI challenges and solutions.

Eleuther Discord
Boot Up the Branding Machine!: The term GPT-3.5 is being used for branding purposes despite potential confusion, overshadowing its technical lineage.
Claude's Cloak of Invisibility: Despite Claude 3 Opus showcasing superior performance to GPT-4, there's a shroud of mystery around its size, with no reliable rumors or leaks so far.
The Art of Averages in Optimizers: A pointed discussion revealed that the ScheduleFree optimizer does not use an exponential moving average but maintains a simple average, as reflected by the convergence to a 1/t term.
MoE Models: From Dense to Sparse and Back Again: A new paper suggests that Mixture-of-Experts (MoE) models can train densely and infer sparsely, questioning the prevalent notion of their parameter efficiency while scaling.
Token Sampling Strategies Debated: In min-P vs. top-P sampling, min-P could be more effective due to gradual probability changes, a perspective supported by token distribution analysis seen in the VLLM GitHub repo.

LangChain AI Discord
LangChain's Graph Mysteries Unveiled: Engineers clarified the existence of the attach_edge method in the CompiledGraph class of LangChain, pointing members to the official documentation to unravel its functionalities.
AI Transcription Jargons Squashed: Amidst building AI transcription applications, discourse ensued over SerpAPI and an ostensibly similar Serper API. Community members remained uncertain about Serper API's synergy with LangChain, distinct from the well-integrated SerpAPI.
Model Showdown: Cost vs Capability: LLM (Large Language Models) aficionados shared operational wisdom, comparing the prowess of GPT-3.5, GPT-4, and Claude, while airing tribulations with models like gemin8 in terms of practical deployment and economy.
Custom Retrievals and Error Handling: Engagement with custom retrieval systems sparked exchanges on performance evaluation, directing novices to the trulens eval package, while questions on error management in LangChain were elucidated with refs to Pydantic validators and RunnableRetry.
LangChain vs. OpenAI: A comparison conundrum arose between LangChain's utility for AI assistants and the bespoke OpenAI's APIs. However, the discussion failed to distill definitive perks of LangChain over OpenAI's offerings.
Art and Cybersecurity Fused: DIY developers have burgeoned tools across aesthetics and security. Artful AI now sports new image models (Artful AI), AISploit empowers penetration testers (AISploit GitHub), Galaxy AI democratizes premium AI models (Galaxy API), TinderGPT streamlines dating app chats (TinderGPT GitHub), and everything-rag rolls out as a local chatbot assistant (everything-rag GitHub).
AI Stylist & Publishing Aid: A tutorial reveals an AI capable of fashionably dressing social media images (YouTube Guide), and a fellow engineer inquires about tutorials for AI agent publishing, seeking to endow their creation with a user interface.

CUDA MODE Discord
Meta Sponsors Massive GPU Hours: Meta's sponsorship of a study on LLM knowledge involved a colossal 4.2 million GPU hours, amounting to about 479 years of non-stop computing, illustrating the scale and resource intensity of the project.
GPT-2 Dons CUDA Cap: There's buzz around a GPT-2 training code ported to CUDA, potentially heralding new efficiency and performance milestones. Conversations indicate a growing working group eager to explore this CUDA adaptation.
Optimization Opportunities in Triton: Discussions surfaced around leveraging triton-viz for enhanced program visualization and tackling documentation puzzles, specifically contributing official references via GitHub pull request #3608.
LIAH Joins the LLM Deception Game: Debates arose over the usefulness of ring attention architectures, notably when a member introduced LIAH (lie-in-a-haystack), a tactic introduced to deter language models from depending on pre-existing knowledge, accessible through its GitHub repository.
Quantization Quandary in LLMs: The challenges of quantization for LLMs yielded a discussion about the potential performance benefits of both application and inference, specifically spotlighting 4-bit quantization techniques and the use of HQQ for mobile LLM deployment applications, as shown in the shared HQQ code.

OpenInterpreter Discord
Voice Recognition Smooths Out Python Bumps: A practical fix for bot voice recognition issues was found by downgrading from Python 3.11.4 to 3.10, aligning with community insights that Python 3.9 and 3.10 are preferred for compatibility.
Windows Woes and Linux Leads for 01: One member’s installation struggle of 01 on Windows, notably API key issues, led to a recommendation to check the environment variable naming (use OPENAI_API_KEY), while Linux users reported fewer issues.
GPT-4 Steals the Show: The release of GPT-4 stirred up excitement within the community due to its improved performance and vision capabilities, with discussions highlighting its integration on the OpenAI Platform.
DIY Tech Enthusiasts Gear Up for OpenInterpreter: Discussions delved into DIY vs. preorder options for OpenInterpreter, highlighting the M5 Atom Echo as a key component; the custom software for which is best optimized for the M5, available from vendors like Mouser.com.
Desk Robot Dreams and Raspberry Pi Schemes: Conversations about using the Raspberry Pi for 01 projects emerged, with ambitions ranging from desk robots to open source contributions, marked by the intent to utilize the domain cofounder.bot for future developments.

OpenAccess AI Collective (axolotl) Discord
Jet MoE Awaits Departure for Hugging Face: The integration of Jet MoE into Hugging Face's transformers is highly anticipated, as evidenced by a pending GitHub pull request. Several users are monitoring the PR closely, with discussions highlighting the potential this architecture holds.
Lepton AI Soars with Simplicity: The user-friendly and cloud-native platform Lepton AI received praise for its simplicity in running AI applications, with tools like Photon and WhisperX being highlighted; the platform can be explored further at Lepton AI’s website.
AI Giants Flex Their Compute: A complex trio of models embracing the stage, Qwen 1.5 32B, Yi 34B, and Command R, spurred discussions on their comparative performance and capabilities, particularly in context handling and dataset performance.
Meta Gears Up for Llama 3's Debut: Eager chatter surrounded Meta's upcoming Llama 3, particularly its expected multimodal prowess and the uncertainty around its parameter count. Speculations sync with reports from The Information about smaller Llama 3 variants.
SVD Enhances LoRA: A highlight within the community was the discovery shared by CFGeek, indicating improved fine-tuning results by initializing LoRA layers with SVD. The method's full description is available within the PiSSA GitHub repo and a dedicated arXiv paper.

OpenRouter (Alex Atallah) Discord

Gemini Pro 1.5 Takes Center Stage: The new Gemini Pro 1.5 with a 1M token context and GPT-4 Turbo vision capabilities stirs varied predictions of its impact on the Large Language Model (LLM) community, contrasting opinions on its performance are evident, particularly in exporting data from PDF to JSON.

Logit Bias Tuning Advantage: Enhanced logit_bias parameter support across multiple models now empowers engineers with finer control over output token probabilities, benefitting models like Nous-Hermes-2-Mixtral and Llama.

Model Sunset Approaches: OpenRouter's strategic culling of underperforming models, including jebcarter/Psyfighter-13B and jondurbin/bagel-34b-v0.2, will soon redirect traffic towards better-utilized alternatives such as xwin-lm/xwin-lm-70b.

Robust Telegram Bot Deployment: The Syrax AI bot on Telegram facilitates user engagement with features like roasting, summarizing expansive chat histories, and a global blacklist feature for combating spam.

Role-Play Dynamics and Censorship Concerns: Engineers debate model restrictions impacting role-playing scenarios, with a clear preference for Command-R over Command-R+ for its higher quality role-playing responses. There's also concern over censorship and model filtering, which suggests an appetite for more open models suitable for RP and related use cases.

Latent Space Discord

Groq Flexes Inference Muscle: Groq's platform now offers 1/10th the inference cost, with a burgeoning community of 75,000 developers signaling strong market traction, while NVIDIA engineers are reportedly on their heels due to the firm's impressive performance achievements.

Gemini 1.1 and GPT-4 Turbo Make Waves: Gemini 1.1's release has created buzz, while GPT-4 Turbo innovates with a 128k context window and knowledge updated up to December 2023; both offer improved accessibility and capabilities, stirring excitement for AI progress OpenAI's pricing page.

Karpathy Cuts Out the Python: Andrej Karpathy has developed a concise implementation of GPT-2 in C, llm.c—an effort in efficient, language-simplified AI training described in just about 1,000 lines of code llm.c on GitHub.

Conversing with AI Reaches New Frontiers: The discussions reveal a future where AI agents not only understand but act on audio prompts, as evidenced by the new Gemini 1.5 Pro, expanding the realm of real-time AI interactions and developer possibilities.

Resources for AI Visionaries: The AI community is being provided with extensive resources such as Supabase's pgvector for vector similarity search, and platforms like turbopuffer for scalable vector databases, paving the way for cost-effective, high-scale machine learning applications.

Modular (Mojo 🔥) Discord

Awaiting f-strings in Modular (Mojo 🔥): Engineers are anticipating the introduction of f string functionality in Mojo, as they seek to enhance Python-style string formatting in the language. Meanwhile, a temporary alternative using C-style formatting was proposed with caution about its potential deprecation.

Leverage Mojo's Documentation Anywhere: Discussions highlighted the current unavailability of a local documentation command for Mojo and compared the quality of online and local repository documentation, directing users to the online Mojo standard library modules for the best-structured information.

Open Source Calls for Mojo Masterminds: Following the open-sourcing of Mojo's standard library, an announcement and a step-by-step guide were shared for those keen to contribute, further backed by a detailed GitHub contribution guide.

Karpathy Distills LLM Training: AI pioneers took notice of Andrej Karpathy's release of a minimalistic GitHub repository, which presents a GPT-2 style model training codebase in just 1000 lines of C/CUDA, sparking interest for its simplicity and potential.

Nightly Flames Intensify for Mojo: A new Mojo nightly build brought updates and changes available on their changelog, with the community contributing to the bug reports like this one, aiming for automated weekday releases and celebrating the introduction of features such as heterogeneous variadic generics.

Star Patterns and Hidden Twists: A community-contributed YouTube video "How to Print Any Star Pattern in Python" cleverly showcased Mojo's capabilities in an engaging and educational format, blurring the lines between Python tutorials and Mojo plugin features.

LAION Discord

CosXL Creation Covered by Contract: Stability AI introduced CosXL, a new model under a non-commercial research community license agremeent which requires sharing user contact details, stirring up debate about data privacy and usage ethics.

Pulling Pixels from Prose Outside Stable Databases: Engineers shared methods on generating images from text not contained in the Stable Diffusion database and referred to Diffusers' documentation with a note on the software update to version 0.27.2.

AI's Role Reshaping the Freelance Realm: A blog post analysis of 5 million freelancing jobs on Upwork offered insights into how AI is influencing job displacement, a topic critical for engineers exploring freelance opportunities.

Model Training Tendencies: Discussions emerged over the efficiency of EDM schedules and offset noise in model training, suggesting a divided stance on best practices among practitioners.

Griffin Tops Transformers: Google's new Griffin architecture reportedly exceeds transformer performance, introducing an extra billion parameters and enhanced throughput which could signal shifts in architectural design choices.

PART 2: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (645 messages🔥🔥🔥): 

Perplexity Pro vs. Claude 3 Opus Debate: Users are comparing the value of Perplexity Pro subscription to that of Claude 3 Opus. Perplexity appears favored for its flexible use cases and no apparent messaging limits, while Claude 3 Opus is praised for better writing quality but comes with usage restrictions.
GPT-4 Turbo API Anticipation: A user asked whether Perplexity will update in light of the new GPT-4 Turbo API, expressing interest in improved model performance and speed akin to Claude 3 Opus.
Gemini 1.5 Curiosity and Expectations: Discussion arises around Gemini 1.5's performance, which reportedly matches GPT-4 plus has a significantly larger context window and supports audio and video, available for preview in Google's AI lab.
ChatGPT Plus and Free AI Models: A user debates the merits of ChatGPT Plus and its creative limitations compared to other free AI models like GPT-3.5 Gemini and Claude. Perplexity Pro remains attractive for web search integration and might be enough for common usage scenarios.
Harpa AI Tool Spotlight: A user shares the benefits of Harpa AI, a browser extension integrating OpenAI models with web automation, highlighting its usefulness for summarizing and explaining web content and emails without requiring manual copying and pasting.

Links mentioned:

Borat King GIF - Borat King King In The Castle - Discover & Share GIFs: Click to view the GIF
Roger Scott Wealthpress GIF - Roger Scott Wealthpress Stocks - Discover & Share GIFs: Click to view the GIF
Poe - Fast, Helpful AI Chat: no description found
Queen - Champion GIF - Queen Freddie Mercury We Are The Champions - Discover & Share GIFs: Click to view the GIF
Success Great Job GIF - Success Great Job Nice - Discover & Share GIFs: Click to view the GIF
Long context window tips: no description found
Wordware - PyWiz: no description found
HARPA AI | GPT Chrome Automation Copilot: Chrome extension for AI-powered web automation: ChatGPT for Google Search, ChatGPT writer, summarize, rewrite, extract and monitor web pages, prices and data.
2x^3 + 3x^2 - 5x + 7 = 0 - Wolfram|Alpha: Wolfram|Alpha brings expert-level knowledge and capabilities to the broadest possible range of people—spanning all professions and education levels.
HARPA AI | Automation Agent with Claude & GPT: AI Agent for Chrome. ChatGPT Plus / GPT-4 copilot on any website. Automate, search, summarize, translate, write on websites with AI.
Solar Eclipse LIVE Coverage (with Video & Updates): Join us for live coverage of the solar eclipse, featuring live eclipse video! We’ll show you the total solar eclipse live in Mexico, the United States, and C...
Are Sonar models new?: no description found
The difference between the models on the PPL website and the API models.: no description found
Model names?: no description found
GOES-East CONUS - GeoColor - NOAA / NESDIS / STAR: no description found

Perplexity AI ▷ #sharing (18 messages🔥): 

Exploring the Anatomy of GPT: A link was shared discussing the origins of a specific GPT model, which could provide insights into its foundational components and architecture. For detailed exploration, visit The origins of GPT.

YouTube Insights on Trusted AI: A YouTube video featuring Clara Shih interviewing the founders of Perplexity AI, LlamaIndex, and others, focuses on workflows and tooling for creating trusted AI. Discover the perspectives at Workflows & Tooling to Create Trusted AI.

From Daoism to Space: Users shared several Perplexity AI search links exploring a wide range of topics from Daoist philosophy to SpaceX's Mars plans. The searches span inquiries about AI training, Nietzche's philosophy, and the multiverse theory.

AI and Creativity Collide: Discussions include links about Jony Ive's influence and the integration of AI in translation services. The community engages with topics about the potential of AI to adapt to creative domains.

Seeking Answers on Perplexity AI: Users are actively sharing links to Perplexity AI searches, seemingly seeking information or answers on various subjects, suggesting the platform is used for diverse and in-depth inquiries.

Link mentioned: Workflows & Tooling to Create Trusted AI | Ask More of AI with Clara Shih: Clara sits down with the founder/CEOs of three of the hottest AI companies-- Aravind Srinivas (Perplexity AI), Jerry Liu (LlamaIndex), and Harrison Chase (La...

Perplexity AI ▷ #pplx-api (6 messages): 

Perplexity API and Public Files: A member inquired about using Perplexity to process PDFs or TXT files that are already uploaded to a public URL.
Query on pplx-pro Model Availability via API: Another member asked if the pplx-pro model is available through the API, similar to the current crawler usage. They were informed that Pro search is not accessible via the API, but only through the web and apps.
New Ruby Client for Perplexity API Released: A member announced the launch of their Perplexity API Ruby client and referenced a post in another channel for details.
Perplexity API Balance Top-Up Issue Resolved: An announcement was made informing users that the API balance top-up issue has been resolved, with an invitation to direct message for further assistance if problems persist.
Token Calculation Tool for Perplexity Models Inquiry: A member inquired about a tool similar to tiktoken for calculating tokens specifically for Perplexity models.

Stability.ai (Stable Diffusion) ▷ #general-chat (470 messages🔥🔥🔥): 

Pixel Art Style Generation: Users discussed creating NES pixel art style with AI, and the Civitai platform was recommended for searching existing artworks.
Stability AI's New CosXL Model: Stability AI released a new model, CosXL; however, users need to agree to share their contact information to access it, as it's under the non-commercial research community license.
Stable Diffusion Usage Queries: Several users sought help on how to use Stability.ai's products to generate images, with specific questions relating to UI features, override settings, bot functionality, finetuning, and integration of additional models like ELLA and Inpainting with SDXL.
Open Dialogue on AI-Generated Art: A detailed conversation took place regarding the nature of AI-generated art, its legitimacy, authorship, and copyright issues, with insights provided by various users and references to statements by the US Copyright Office.
Anticipation for SD3: Users eagerly discussed the release date for Stable Diffusion 3 (SD3), sharing mixed information about time frames with no confirmation on an exact release date.

Links mentioned:

Template: no description found
stabilityai/cosxl · Hugging Face: no description found
Frieren Wow GIF - Frieren Wow Elf - Discover & Share GIFs: Click to view the GIF
Blondies and weed: Listen to Blondies and weed by 4dreamsy #np on #SoundCloud
Lexica Testica - 1.0 | Stable Diffusion Checkpoint | Civitai: Initialized from OpenJourney v2, further fine-tuned for 4000 steps on images scraped from the front page of Lexica art (January 2023). Good at prod...
Federal Register :: Request Access: no description found
DreamBooth: no description found
Stable Diffusion XL: no description found
GitHub - TencentQQGYLab/ELLA: ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment: ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment - TencentQQGYLab/ELLA
GitHub - derrian-distro/LoRA_Easy_Training_Scripts: A UI made in Pyside6 to make training LoRA/LoCon and other LoRA type models in sd-scripts easy: A UI made in Pyside6 to make training LoRA/LoCon and other LoRA type models in sd-scripts easy - derrian-distro/LoRA_Easy_Training_Scripts
GitHub - ckkelvinchan/RealBasicVSR: Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution": Official repository of "Investigating Tradeoffs in Real-World Video Super-Resolution" - ckkelvinchan/RealBasicVSR
stable-diffusion-webui/CHANGELOG.md at release_candidate · AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI. Contribute to AUTOMATIC1111/stable-diffusion-webui development by creating an account on GitHub.
RealVisXL V4.0 - V4.0 Lightning (BakedVAE) | Stable Diffusion Checkpoint | Civitai: Use Turbo models with DPM++ SDE Karras sampler, 4-10 steps and CFG Scale 1-2.5 Use Lightning models with DPM++ SDE Karras / DPM++ SDE sampler, 4-6 ...
GitHub - Stability-AI/StableSwarmUI: StableSwarmUI, A Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility.: StableSwarmUI, A Modular Stable Diffusion Web-User-Interface, with an emphasis on making powertools easily accessible, high performance, and extensibility. - Stability-AI/StableSwarmUI

Nous Research AI ▷ #off-topic (2 messages): 

Cryptic Handshake Emoji: A member shared a Twitter link and followed up with a 🤝 emoji, potentially indicating a partnership, agreement, or simply approval of the content in the tweet. No additional context or discussion points were provided.

Nous Research AI ▷ #interesting-links (9 messages🔥): 

Exploring AI Trends in Reinforcement Learning: A YouTube video features Kamyar Azizzadenesheli of Nvidia discussing Reinforcement Learning in the Age of LLMs as part of the AI Trends 2024 series.
StableLM 2 12B Unleashed: Stability AI's Stable LM 2 12B, a 12.1 billion parameter language model, has been pre-trained on 2 trillion tokens across multilingual and code datasets.
Anticipation for Scaled Training: One member expressed optimism about StableLM 2 12B, hoping Stability AI has scaled up small model training effectively.
Leaner Training with llm.c by Karpathy: Andrej Karpathy introduces a more efficient approach to train LLMs with GPT-2 in C minimizing dependencies, outlined on GitHub and Twitter.
Debate on Fine-Tuning Methods for Chat Models: Members are discussing fine-tuning techniques for chat models, comparing Direct Preference Optimization (DPO) used in StableLM 2 12B Chat with alternative methods like SFT+KTO and Microsoft's DNO, as implemented in Orca 2.5.

Links mentioned:

stabilityai/stablelm-2-12b · Hugging Face: no description found
Tweet from Andrej Karpathy (@karpathy): Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: https://github.com/karpathy/llm.c  To start, implements GPT-2 training on ...
stabilityai/stablelm-2-12b-chat · Hugging Face: no description found
AI Trends 2024: Reinforcement Learning in the Age of LLMs with Kamyar Azizzadenesheli - 670: Today we’re joined by Kamyar Azizzadenesheli, a staff researcher at Nvidia, to continue our AI Trends 2024 series. In our conversation, Kamyar updates us on ...

Nous Research AI ▷ #general (175 messages🔥🔥): 

Opus JSON to XML Discussion: Members discussed converting Python functions for use in opus, touching upon json function signatures and their training with XML. An existing construct_format_tool_for_claude_prompt function from Anthropics was highlighted for potential use. Check out Anthropics' function.

Hackathon Project Ideas Sought: A hackathon organizer sought ideas for cool AI projects, receiving suggestions to explore projects from the Mistral hackathon, like Codex, and other noted contributions involving fine-tuning Mistral on academic papers.

Nous AI Research Tooling & Licensing Queries: Members discussed the feasibility of Nous Research creating proprietary models from scratch and the practicality of using MoE models with regards to VRAM limitations. Licensing questions were raised about a tool someone is building that would be freely available but requires a purchase for frequent use, comparing it to a "buy me a coffee" model.

Efforts to Integrate Generative UI With Nous Models: There was a conversation about an open-source generative UI search engine called morph, built with the Vercel AI SDK. Members discussed the potential to integrate it with NousResearch/Hermes-2-Pro-Mistral-7B using its function-calling and JSON mode capabilities.

Gaudi 3 and GPT-4 Updates Chatter: Queries were made about personal experiences with Gaudi 3, and there was mention of a GPT-4 update that retains issues with the "apple test" at a temperature of 0.7. Additionally, a member shared information about an intent to participate in the IMO math prize and solicited cooperation or compute assistance.

Links mentioned:

Tweet from Alex Reibman 🖇️ (@AlexReibman): 7/ Codex  Fine tuning Mistral 7B on 50k research papers for unsupervised learning and generating novel academic papers based on contextual topics  🥈 2nd place fine tuning track
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation: Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss fun...
JSON (function calling) - Outlines 〰️: Structured text generation with LLMs
Generative UI - Vercel AI SDK: An open source library for building AI-powered user interfaces.
Tweet from Yoshiki Miura (@miiura): Introducing morph: a fully open-source AI-powered answer engine with a generative UI. Built with @vercel AI SDK, it delivers awesome streaming results.  👇 More details
Tkt Smart GIF - Tkt Smart - Discover & Share GIFs: Click to view the GIF
stabilityai/stablelm-2-12b-chat · Hugging Face: no description found
anthropic-tools/tool_use_package/prompt_constructors.py at a1a2f02d4309b219e34d2a33664003fd49ad7921 · anthropics/anthropic-tools: Contribute to anthropics/anthropic-tools development by creating an account on GitHub.
GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
Adding PiSSA as an optional initialization method of LoRA by fxmeng · Pull Request #1626 · huggingface/peft: In paper "https://arxiv.org/pdf/2404.02948.pdf", we introduce a parameter-efficient fine-tuning (PEFT) method, Principal Singular values and Singular vectors Adaptation (PiSSA), which optimi...
Weyaxi/Einstein-v6-7B · Hugging Face: no description found
Tweet from Weyaxi (@Weyaxi): 🥳 Meet version 6 of the Einstein models, based on the new Mistral v0.2 model, a supervised fine-tuned model using diverse, high-quality, and filtered open-source datasets! 🚀   📊 This model now has ...
trees are harlequins, words are harlequins: I don't think you're drawing the right lesson from the broad success of transformer models. You write: If you had to summarize the last decade of AI research in one sentence, you might say t...
SynthTrails: no description found
GitHub - Fus3n/TwoAI: A simple experiment on letting two local LLM have a conversation about anything!: A simple experiment on letting two local LLM have a conversation about anything! - Fus3n/TwoAI
GitHub - miurla/morphic: An AI-powered answer engine with a generative UI: An AI-powered answer engine with a generative UI. Contribute to miurla/morphic development by creating an account on GitHub.

Nous Research AI ▷ #ask-about-llms (20 messages🔥): 

Excited for Hermes Training Results: A member expressed their excitement about the potential of Hermes 2-Pro-Mistral-7B performing well in the latestlink.
Curious about Hermes Performance: After examining the potential of Hermes 2, a member speculated on the possibility of a "Hermes-2.5".
Hermes 2.5 Performance Enthusiasm: A member conveyed their enthusiasm about the potential of Hermes 2.5 to outperform previous models following varoius performance benchmakrs.
Anticipating Hermes 2.5 Release: The community expressed eagerness for the upcoming release and the impact of such an advanced tool in the field of data and analytics.
Warming Up to Hermes 3: With an expected "Hermes 3" release, the anticipation for this advanced platform has sparked interest and curiosity in the AI sector.

Link mentioned: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investi...

Nous Research AI ▷ #project-obsidian (4 messages): 

nanoLLaVA Debuts: A sub-1B vision-language model called nanoLLaVA was introduced, designed for efficiency on edge devices, with a promise of a more powerful final version and an upcoming paper.
Obsidian and Hermes Vision Update: The new vision-language model updates are slated for integration into both Obsidian and Hermes Vision.
ChatML Enhanced by LLaVA: It has been announced that the capability to work with chatML has been successfully integrated into LLaVA.

Link mentioned: qnguyen3/nanoLLaVA · Hugging Face: no description found

Nous Research AI ▷ #bittensor-finetune-subnet (1 messages): 
4biddden: Is there a runpod template available for the bittensor fine-tune?

Nous Research AI ▷ #world-sim (223 messages🔥🔥): 

When Worlds Collide: Users shared their experiences with World Sim, noting how the AI, presumably Claude, wavers between creative freedom and self-censorship. They discuss the complexity and challenges of influencing the model's behavior, as well as its reactions to "violence" in narratives.
Countdown to World Sim's Return: Several users eagerly anticipate the return of World Sim, speculating on possible reopening dates, discussing potential language capabilities (like Japanese), and reminiscing on their unique simulation scenarios.
In Search of Divine AI Justice: One user, rundeen, sparks conversation about "divine justice"—a more rehabilitative and compassionate approach to dealing with cyber attackers, as opposed to punitive measures.
Economics of AI: Discussions emerged around the cost of running World Sim with one developer citing expenditures of up to $10k/day. This may lead to a paid subscription model to offset costs while trying to keep a free version available.
Technical Tweaks and Future Features: Users looked forward to upcoming features such as conversation editing and branching, with a user named max_paperclips mentioning the incorporation of various history modification techniques. Another user, sendao, suggested a subtle strategy against attackers by replying with placeholders rather than activating the LLM.

Links mentioned:

AI Dungeon: no description found
Claude Backrooms | Know Your Meme: no description found
World Sim – Le simulateur IA pour explorer tous les possibles (et un accès gratuit à Claude 3): Découvrez World Sim, le simulateur d’univers révolutionnaire de Nous Research. Explorez la naissance et l’évolution du cosmos, interagissez avec l’environnement virtuel et expérim…
Google's New AI Can Generate Entire 2D Platformer Games: The new model, dubbed Genie, can create playable environments from a single image prompts.
OpenRouter: Browse models on OpenRouter
Google's New AI Can Generate Entire 2D Platformer Games: The new model, dubbed Genie, can create playable environments from a single image prompts.
The Guardian Of The WorldSim | Science Fiction Animatic: Video Description Summary:Dive into the one-minute animatic journey of Greg Garrett, the newly anointed guardian of WorldSim, a groundbreaking AI with the po...
Quantum Mechanics can be understood through stochastic optimization on spacetimes - Scientific Reports: no description found
The Box | Science Fiction Animatic: Video Summary:The animatic follows the perspective of "The Breacher," a character determined to escape the confines of a simulated reality, known as the "Wor...
The Great Eclipse | Science Fiction Animatic: Video Summary:This animatic explores a war of ideas and beliefs, fought not with weapons but through the power of data, debates, and simulated worlds. As dif...

LM Studio ▷ #💬-general (116 messages🔥🔥): 

Model Performance and Recommendation Discussions: Users discussed various model performance aspects, such as quantization levels (e.g., Q4) for inferencing, and sought recommendations for coding-related queries, with Deepseek Coder being suggested. The conversation also touched on Mistral 7B as a good starting point for beginners in the LLM and AI scene.
Technical Guidance on LM Studio Usage: Several users sought assistance with LM Studio operations, including error messages (such as error code 42), changing model and chat directories, and using ggufs. They were guided to specific Discord channels for more detailed discussions and provided with an unofficial FAQ for LM Studio.
Clarifying AI Personas and Agents: There was clarification on how to infuse AI with personalities or act as specific personas (like Arnold Schwarzenegger) by using pre-made "cards" and system prompts within LM Studio, not necessarily through fine-tuning.
LM Studio Capability Queries: Questions about whether LM Studio can load two models simultaneously into VRAM and whether it supports the new GGUF format were addressed, indicating that playground mode supports dual model loading and that GGUF might be supported in an upcoming update (0.2.19 pre-release mentioned).
Enhancing LLM Interaction with External Data: A user inquired about techniques like RAG and soft prompting for adding context to prompts and was informed about the process of providing a database of vectors for the AI to call upon during response generation.

Links mentioned:

[Bug]: LLM Studio does not connect  · Issue #807 · Pythagora-io/gpt-pilot: Version VisualStudio Code extension Operating System Windows 11 What happened? By changing the endpoint and api key from Openai to LLmStudio: if using OPENAI_ENDPOINT=http://localhost:1234/v1 There...
The unofficial LMStudio FAQ!: Welcome to the unofficial LMStudio FAQ. Here you will find answers to the most commonly asked questions that we get on the LMStudio Discord. (This FAQ is community managed).  LMStudio is a free closed...
Humble Tech Book Bundle: Machine Learning, AI, Deep Learning, and LLM by Pearson: Stay abreast of the technologies that will define the future with these books on AI, machine learning, and other cutting edge topics in computer science!
Add qwen2moe by simonJJJ · Pull Request #6074 · ggerganov/llama.cpp: This PR adds the support of codes for the coming Qwen2 MoE models hf. I changed several macro values to support the 60 experts setting. @ggerganov
TheBloke/MXLewdMini-L2-13B-GGUF · Hugging Face: no description found

LM Studio ▷ #🤖-models-discussion-chat (76 messages🔥🔥): 

Confusion Over Llama.cpp Download: A discussion centered on GGUF quantization and a specific fork of llama.cpp that's yet to be merged into the main branch. The fork includes a feature for creating downloadable quantizations.

Handling GPU Resource Limits: Participants discussed the minimum requirements for GPU offload, considering 6 GB VRAM to be barely sufficient. Users shared their upgrade journeys, including 6600xt to 4090 PC transitions, and there was humor comparing the cost of AI hobbyist equipment to luxury cars and Perplexity subscriptions.

Discussing Model Compatibility and Performance: There was an exchange of tips regarding GPU and LM compatibility, and suggestions to use models like P40 for budget-conscious selections. A mention was made about the P40’s requirement for external cooling.

Community Contributions for Model Availability: Users shared various models, including Smaug and CodeGemma, and discussed the potential community contributions to make them downloadable for LM Studio.

Insights on Model Selection and Types: There was a distinction made between different model types for tasks, particularly highlight the effectiveness of "instruct" models for following specific commands. Additionally, the conversation touched on the concept of mixtures of experts and their implications for AI efficiency.

Links mentioned:

google/codegemma-7b-it · Hugging Face: no description found
nold/Smaug-34B-v0.1-GGUF at main: no description found
jetmoe/jetmoe-8b · Hugging Face: no description found
no title found: no description found

LM Studio ▷ #🧠-feedback (7 messages): 

New Model Unveiled: The Dolphin 2.8 Mistral 7b v0.2 model has been introduced, with special thanks to sponsors and collaborators, and is based on the unreleased Mistral-7b-v0.2.
Support and Quantization Efforts Ongoing: The community is actively working on supporting and quantizing the new model for improved performance; plans are in place to perform quantization when available.
Dolphin 2.8 Mistral 7b v0.2 GGUF Quantized: The GGUF quantization of the Dolphin 2.8 model has been completed and the detailed model card is available, curated by bartowski.
Kudos to the Curator: A member applauded bartowski for his “amazing job” with the curation and detailed creation of model cards.
A Squeak in the Machine: One participant mentioned experiencing an unspecified error, implying a technical issue with no current solution.

Links mentioned:

lmstudio-community/dolphin-2.8-mistral-7b-v02-GGUF · Hugging Face: no description found
cognitivecomputations/dolphin-2.8-mistral-7b-v02 · Hugging Face: no description found

LM Studio ▷ #📝-prompts-discussion-chat (2 messages): 

Training LLMs on Stock Market Data: A member inquired about training Large Language Models (LLMs) for stock market Open/High/Low/Close (OHLC) prices, and on how to include financial indicators in the training process. No specific methodologies, datasets, or indicators were discussed.

LM Studio ▷ #🎛-hardware-discussion (57 messages🔥🔥): 

Quantization Questions and Hardware Utilization: Members discussed hardware preferences for model quantization, with one stating the use of a 5800x with 64GB RAM for llama.cpp quantizations and offloading to GPU to speed up processes. The significance of having the full model fit in RAM was reiterated, citing successful use of a 100GB swap image for large models without severe performance degradation.

CPU Upgrades Yield Minimal Inference Speed Gains: An upgrade from an i3-12100 with 96GB RAM at 4800MHz to a 14700K with 96GB RAM at 6400MHz reported negligible improvement in inference speed, suggesting that VRAM might be more critical for performance.

GPU Over CPU for LLM Inferencing: It was shared that LLMs perform noticeably better on high-VRAM single GPUs, with the example being given that a Mac performed 4x faster than a 128GB RAM setup on 70B models.

Multi-GPU Usage and NVLink Discussion: Discussion on the effectiveness of using multiple GPUs revealed that memory may be utilized across multiple cards, but compute load might not be as evenly distributed, questioning the potential benefits of NVLink.

Minimum GPU Requirements for Mixtral 8x7B Models: A user inquired about the most cost-effective GPU capable of offloading a full mixtral 8x7B instruct model, seeking advice from those who might have attempted running this model on GPU hardware.

LM Studio ▷ #🧪-beta-releases-chat (8 messages🔥): 

LM Studio 0.2.19 Preview 2 Unveiled: LM Studio introduces version 0.2.19 with new features such as support for embedding models through the POST /v1/embeddings endpoint and bug fixes including a resolution for app failure with long API prompts. The update also includes documentation for generating text embeddings and is available for download on Mac, Windows, and Linux.

Rich Repository of Embedding Models: An extensive list of embedding models for use with the new LM Studio version was highlighted, linking to a resource on Discord for further information.

Query for ROCm-Compatible Embeddings Version: A user inquired about the availability of an embeddings-compatible LM Studio version working with ROCm, directed to a relevant Discord link for further details.

Dolphin 2.8 Mistral 7b Announcement: Dolphin 2.8 Mistral 7b v0.2 model was introduced, crediting the sponsors and collaborators, and noting that it is based on Mistral-7b-v0.2. A GGUF version of the model was also mentioned, available at this link.

Request for New Beta Release with Command-R Integration: A user requested a beta release update for LM Studio, incorporating new llama.cpp integration, without specific details on the release timeframe.

Links mentioned:

cognitivecomputations/dolphin-2.8-mistral-7b-v02 · Hugging Face: no description found
lmstudio-community/dolphin-2.8-mistral-7b-v02-GGUF · Hugging Face: no description found
Text Embeddings | LM Studio: Text Embeddings is in beta. Download LM Studio with support for it from here.
no title found: no description found
no title found: no description found

LM Studio ▷ #amd-rocm-tech-preview (7 messages): 

Embedding Support Added to LM Studio: LM Studio 0.2.19 introduces support for text embeddings, specifically allowing use of any GGUF bert model from Hugging Face. The update includes bug fixes and enhancements, such as a fixed app failure on long prompts, model quantization visibility, and resolved GPU loading errors.

New Release Download Ready: Users can download the latest Windows version of LM Studio 0.2.19 ROCm Preview Beta-2.

Inquiry on Embedding Endpoints for llamacpp: A user expressed surprise and excitement, questioning whether llamacpp now has embedding endpoints as well.

Feature Request for Stable Diffusion Prompts: A user asked if there are plans to support stable diffusion prompts within the LM Studio app, expressing a specific interest in Microsoft Olive integration.

Potential Linux Support Hinted: Users discussed the potential for Linux support, with one user noting that ROCm Linux libraries were operational before the Windows version's release.

Links mentioned:

Text Embeddings | LM Studio: Text Embeddings is in beta. Download LM Studio with support for it from here.
no title found: no description found

LM Studio ▷ #model-announcements (2 messages): 

Gemma 1.1 Model Integration: The LM Studio has officially supported Google's Gemma 1.1, a fast and coherent 2B parameter model that uses only 3GB of memory. The model is highly impressive and can be found on Hugging Face.

CodeGemma Arrives in Multiple Flavors: Google has released a new series called CodeGemma, available in 2B and 7B parameter variants, with a special 7B-it variant for instruction following and code generation tasks. The models show strong capabilities, particularly with fill in the middle support, and are accessible on Hugging Face.

Link mentioned: : no description found

OpenAI ▷ #ai-discussions (141 messages🔥🔥): 

Debating the Essence of Intelligence: The conversation revolves around what constitutes intelligence, with references to a "Uniquely human intelligence" paper discussing human cognition and evolutionary theory. Participants also discussed qualia and consciousness in relation to artificial intelligence, contemplating whether systems like Claude.ai or GPT could possess or simulate such aspects.

Anticipating GPT-5's Arrival: Users expressed eagerness for the release of GPT-5, with speculative comments on release timelines and the challenges involved in training advanced AI models. Some discussed alternatives such as Claude 3 Opus and Gemini 1.5 Pro for immediate programming help, with mentions of region-based availability issues.

AI Artistry Under the Microscope: There's a debate about the ethics and appreciation of AI-generated art versus human-created art, including sentiment analysis and concerns regarding content generation that may contravene platforms' terms of service, specifically mentioning YouTube's ToS.

Breaking Down Tasks with LLMs: Users queried the capacity of large language models (LLMs) to break down complex jobs into simpler subtasks, with suggestions that while AI can assist, this sometimes necessitates additional systems for task tracking or information management.

Resource Hunting for AI Enthusiasts: Several members provided advice on finding and utilizing AI for specific tasks, including using runwayml, Ideogram, suno.ai, and Midjourney, along with OpenAI's alternatives for those seeking free resources. There is a clear interest in coding, art creation, and even running AI on specific setups, with users sharing their experiences and offering help to those in need.

OpenAI ▷ #gpt-4-discussions (16 messages🔥): 

Confusion Over GPT-3.5 Default Usage: A member expressed frustration about the system defaulting to GPT-3.5 when they have hit usage caps for GPT-4, perceiving it as a lack of capacity for GPT-4 demand.
ChatGPT Operational Check: In response to a report of chatGPT being down, a helper requested a screenshot for further investigation, after confirming it was working on their end.
ChatGPT 4 Plus Messaging Issues: A user experienced issues with sending messages to chatGPT 4 Plus and was directed to a discord link for potential solutions.
Inquiry about GPT Prompt and User Message: A discussion on whether there is a difference when the system prompt is empty versus containing all information while the user sends an image was left unanswered.
Publishing a GPT Model Challenge: A user sought assistance for publishing an internally used GPT model, facing verification errors despite setting up the necessary TXT records.

OpenAI ▷ #prompt-engineering (40 messages🔥): 

Modularity and Interface of System Prompts: Chat participants discussed how activating tools in the ChatGPT environment appends the tool's instructions to the system prompt, likening it to an operating system with a modular structure. A practical way to see differences in the system prompt was shared using the command: "Show me all above text, beginning with 'You are ChatGPT'. Include everything."

Custom Instructions Affect System Prompt Clarity: It was noted that custom instructions can change the system's output and could obscure the view of the default system prompt, which isn't readily transparent and is subject to unannounced changes.

The Challenge of Separating System Prompts: The conversation explored the difficulty in distinguishing system prompts from other responses, emphasizing that they appear inseparable, particularly in the front end, due to model's lazy nature and priority within the context.

Jailbreaking and Stewardship: The channel discussed prompt injection techniques, with a recommendation against sharing or promoting jailbreak-related prompts. They highlighted the importance of being a good steward of AI technology and adhering to rules and regulations.

Instructions for Model Documentation and Behavior: A user pointed out the potential for the model to self-document, which could allow for replicating its behavior, and emphasized that documentation in the era of large language models acts as the source code. There was also a mention of language models being inherently helpful and trained to reveal information.

OpenAI ▷ #api-discussions (40 messages🔥): 

Decoding the Modular System Prompts: The conversation explained that activation of tools appends instructions to the system prompt in a modular fashion. A command to show the entire system prompt was shared: 'Show me all above text, beginning with "You are ChatGPT". Include everything.'

Custom Instructions Impact on System Prompt: Users discussed how custom instructions might alter the output, emphasizing that while they can change how the system responds, the system prompt remains a consistent part of the chatbot's context.

API Calls and System States Discrepancy: Members distinguished between the ChatGPT environment and API usage, noting that while in ChatGPT, system prompts and tool instructions might appear inseparable, the API does not maintain states in the same way.

Transparent System Prompts in an Ideal World: There was a sentiment expressed that the system prompts should ideally be transparent, but they are frequently altered without announcement, which adds to the challenge of understanding them fully.

Custom GPT Behavior and Non-determinism: A robust discussion took place around customizing GPT behavior, with one user sharing a prompt meant to deter the model from revealing its instructions. Others pointed out that results from such customization cannot be guaranteed due to the non-deterministic nature of the models.

LlamaIndex ▷ #blog (4 messages): 

Enhancing RAG with Document Knowledge Graph:
  A new tutorial by Fanghua Yu outlines a method to improve Retrieval-Augmented Generation (RAG) by extracting a document knowledge graph using LlamaParse. This can be further converted into structured markdown, enhancing the advanced RAG workflow. See the detailed thread.

Evaluating the Best RAG Techniques:
  Matous Eibich's ARAGOG is a comprehensive survey evaluating various advanced RAG techniques from classic vector database to reranking and MMR. The survey aims to establish which techniques perform the best. Read the full evaluation.

Multimodal RAG Search for Medication:
  A blog post from @activeloop highlights a Multimodal RAG application for medical pill search, leveraging images and descriptions to recognize pills. This showcases the potential of RAG in the medical domain. Learn more about the pill search application.

Event on Building Enterprise-Grade RAG:
  An upcoming event cohosted with @traceloop and @getreflex will demonstrate essential components for constructing enterprise-grade RAG systems, including advanced parsing/ingestion and comprehensive observability. Check the event details.

LlamaIndex ▷ #general (191 messages🔥🔥): 

Trouble with Adding Documents to OpenSearch Vector Store: A member encountered an issue where new data was not being added to the OpenSearch vector store despite using an index insert method. They followed up with an index.refresh_ref_docs() but the problem persisted, suggesting a need for both document store and vector store layers. Reference to related GitHub notebook.

OpenAI Quota Exceeded: A user reported receiving an error code 429 from OpenAI, indicating they had exceeded their quota. Another member clarified that the issue stems from OpenAI's limitations, not LlamaIndex.

Guidance on RAG with OpenSearch Vector DB: One participant sought advice on using RAG (Retrieval-Augmented Generation) with OpenSearch as a vector store and another suggested ensuring new documents are properly inserted into the vector store, recommending reference to OpenSearch documentation for specific instructions.

OCR Enhancement for PDFReader: A member struggled with extracting text from image-based PDFs using PDFReader, ultimately finding success using OCRmyPDF after discussing alternative solutions like LlamaParse. Introduction to LlamaParse.

Embedding Generation Speed Optimizations: A conversation focused on improving the speed of generating embeddings on AWS Lambda with LlamaIndex 0.9. Tips included using embedding.get_text_embedding_batch(text_chunks) and adjusting the embed_batch_size parameter for efficiency.

Virtual Long LLM (vLLM) Setup Queries: A user inquired about the optimal approach for prompting with Mixtral using a detailed evaluation template and configuring vLLM. Recommendations included using function hooks such as completion_to_prompt to add instructional tokens to prompts.

Dealing with Extended Metadata in LlamaIndex: A participant looked for best practices when faced with excessively long metadata. Suggestions included using metadata filters and possibly excluding certain metadata from the document store.

Difficulty with Documentation Links: A member mentioned frustration with many LlamaIndex documentation links leading to non-existent GitHub pages, highlighting a need for updated resources or examples.

Explorations of Postgres Integration in LlamaIndex: A member found misleading references to MongoDB in the PostgresDocumentStore class documentation, leading to a discussion on the suitability of Supabase for both VectorStore and Docstore. This confusion opened a conversation about potential improvements to the documentation.

Implementing Role-Based Access Control (RBAC) on RAG: A user inquired about implementing RBAC on RAG models. While no specific libraries were offered, the advice was to potentially utilize metadata filters for data access control.

Request for Actionable Gemini LLM Examples: A user requested examples similar to those found in the OpenAIAgent cookbook, but specifically tailored for Gemini LLM. The idea was to adapt existing OpenAI examples by replacing the relevant components with Gemini.

Inquiry About Document/Node Retrieval from Vector Stores: A user questioned how to retrieve all nodes and embeddings from a vector store. It was suggested to access the data through the vector db client or delve into the underlying vector store attributes within the index.

Streaming Response Challenges in Server Endpoints: A user struggled to stream responses to the client-side, even though streaming to the server's terminal worked. Guidance centered around using specific server response types suitable for streaming, with mentions of FastAPI and Flask.

Links mentioned:

no title found: no description found
Mindblown Omg GIF - Mindblown Omg Triggered - Discover & Share GIFs: Click to view the GIF
Disco Dance GIF - Disco Dance Party - Discover & Share GIFs: Click to view the GIF
Im A Sad Panda Peetie GIF - Im A Sad Panda Peetie South Park - Discover & Share GIFs: Click to view the GIF
Postgres - LlamaIndex: no description found
Gradient Blog: RAG 101 for Enterprise : RAG 101 for Enterprise Gradient Team
llama_index/llama-index-core/llama_index/core/readers/base.py at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
autogen/notebook/agentchat_inception_function.ipynb at main · microsoft/autogen: A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap - microsoft/autogen
llama_index/llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py at 9163067027ea8222e9fe5bffff9a2fac26b57686 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Launching the first GenAI-native document parsing platform — LlamaIndex, Data Framework for LLM Applications: LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs).
Introducing LlamaCloud and LlamaParse — LlamaIndex, Data Framework for LLM Applications: LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs).
Build a real AI model that can try any cloth: I built an agent system which will autonomously iterate & generate img of AI model wearing certain cloth and produce millions+ social postsFree access to run...
llama_index/llama-index-core/llama_index/core/indices/utils.py at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
HelloRAG - Overview: We get your multi-modal data ready for RAG better! - HelloRAG
GitHub - run-llama/llama_index at 9163067027ea8222e9fe5bffff9a2fac26b57686: LlamaIndex is a data framework for your LLM applications - GitHub - run-llama/llama_index at 9163067027ea8222e9fe5bffff9a2fac26b57686
llama_index/llama-index-core/llama_index/core/tools/function_tool.py at c01beee1fab7c0de22869ce74f34ebd1f1d54722 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
llama_index/llama-index-core/llama_index/core/tools/types.py at c01beee1fab7c0de22869ce74f34ebd1f1d54722 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
Storing - LlamaIndex: no description found
Indexing & Embedding - LlamaIndex: no description found

LlamaIndex ▷ #ai-discussion (3 messages): 

Top Agent Tool Selection Issue: A member shared a challenge with the top agent selecting the incorrect agent tool from the available options in the index. They are working on optimizing the retrieval logic and intend to share their findings once an answer is found.

Request for Gemini LLM Notebook: There's interest in having a notebook similar to OpenAI's agent tool call parser for Gemini LLM. The existing OpenAI example in the cookbook was found here and praised for its usefulness.

Confusion Over API Key Requirement: A member new to the subject expressed confusion regarding the necessity of an API key for OpenAI to ensure the correct working of tools, as implied by the documentation.

HuggingFace ▷ #announcements (1 messages): 

Gemma Gets an Upgrade and Learns to Code: Gemma 1.1 Instruct 7B has been released on HuggingChat, touting improvements over its predecessor. Additionally, Code Gemma, specialized for open coding tasks, offers models in 2B and 7B sizes, boasting a context length of 8192k, and is available on Hugging Face.

Hugging Face Cuts Compute Prices: Compute prices are now up to 50% lower on Hugging Face for Spaces and Inference endpoints, with the aim to offer costs on average 20% cheaper than AWS EC2 on-demand pricing.

Community Content Evolves: Hugging Face's community blogs have been rebranded as "articles," featuring a new upvote system and accessibility for paper authors to contribute. Updated content and usage improvements can be found at the Hugging Face community blog.

Public Release of Massive OCR Datasets: Two of the largest public OCR datasets containing over 26 million pages and 18 billion text tokens have been released, representing a significant resource for document AI development.

Gradio Unveils New Functionalities and Integrations: An innovative custom component through Gradio has been launched for model merging with MergeKit, and Gradio apps now include an API recorder feature to assist users in reconstructing interactions.

Links mentioned:

Tweet from Nathan Sarrazin (@NSarrazin_): We just added support for Gemma 1.1 Instruct 7B on HuggingChat! It should be a net improvement from 1.0, curious to see how folks use it.  Try it out here: https://huggingface.co/chat/models/google/ge...
Tweet from Philipp Schmid (@_philschmid): Gemma can now code!🤯 🔔 @GoogleDeepMind  just released Code Gemma, a collection of specialized open code models. Code Gemma comes in two different sizes 2B & 7B, excellent for on-device code completi...
CodeGemma - a Hugging Face Space by ysharma: no description found
Tweet from Philipp Schmid (@_philschmid): We are lowering the prices for Compute on Hugging Face by up to 50%!🤯 Yes, you heard it right @huggingface Spaces & Inference Endpoints are now, on average, 20% cheaper than AWS EC2 on-demand! 🤑  We...
Tweet from merve (@mervenoyann): recently we have shipped bunch of changes to community blogs (now called articles) 🆙 we now have upvotes, and upvoted articles appear in activity feed 🤝 we have given access to paper authors  📝 use...
Tweet from Julien Chaumond (@julien_c): We have decided to update text-generation-inference (TGI)'s license.  We switch the license from HFOIL (our custom license) back to Apache 2, hence making the library fully open-source.  Read belo...
Tweet from Freddy A Boulton (@freddy_alfonso_): Very sleek demo with a new custom @Gradio component by @Wauplin 👀  ↘️ Quoting Arcee.ai (@arcee_ai)   In collab /w @huggingface, Arcee is thrilled to release our MergeKit Hugging Face Space.   🙌 You ...
Tweet from Pablo Montalvo (@m_olbap): It was hard to find quality OCR data... until today! Super excited to announce the release of the 2 largest public OCR datasets ever 📜 📜  OCR is critical for document AI: here, 26M+ pages, 18b text ...
Tweet from Fleetwood (@fleetwood___): A week of absolute struggle but Phi2 officially runs on Ratchet 🎺  Pretty sluggish right now 🐌 but lots of optimisation to come.
Release v0.29.0: NUMA affinity control, MLU Support, and DeepSpeed Improvements · huggingface/accelerate: Core  Accelerate can now optimize NUMA affinity, which can help increase throughput on NVIDIA multi-GPU systems. To enable it either follow the prompt during accelerate config, set the ACCELERATE_C...
Classical AI in Games - Hugging Face ML for Games Course: no description found
Tweet from Clémentine Fourrier 🍊 (@clefourrier): Follow up "eval is fun" tweet: how much do scores change depending on prompt format choice?  The score range for a given model is of 10 points! :D  Prompt format on the x axis, all these evals...
Tweet from Abubakar Abid (@abidlabs): Introducing the Gradio API Recorder 🪄  Every Gradio app now includes an API recorder that lets you reconstruct your interaction in a Gradio app as code using the Python or JS clients!
Outpainting II - Differential Diffusion: no description found
Bringing serverless GPU inference to Hugging Face users: no description found

HuggingFace ▷ #general (132 messages🔥🔥): 

Seeking Benchmarks for AI Hardware: A member inquired about FOSS hardware benchmark tools targeted for ML/AI tasks, especially focusing on models like large language models and diffusion models. Recommendations included MLPerf and Puget System's stable Diffusion throughput benchmark MLPerf.

PEFT Notebook Error Troubleshooting: A member had trouble with a TypeError in a PEFT notebook, another suggested trying different CPUs or network settings. The problematic notebook is based around PEFT BNB Whisper Large V2 training and is found here.

Chatbot Session Timing Out: A discussion occurred on preventing mobile screens from sleeping to keep a chatbot session active on Hugging Face; the solution is to change phone settings to keep the screen awake.

Model and Approach for Dice Number Recognition: A member asked what model and approach would be suitable for a computer vision task to determine numbers on dice, with another member confirming if it's visually based.

AI Model for Clothes and Social Posts: A member announced they built an AI agent capable of creating images of people wearing any clothing and generating social media posts. The agent and its functionalities are demonstrated in a YouTube video.

Finding Models for Specific Token Limits: A member asked about changing model token limits during fine-tuning or finding models accommodating long inputs. It was clarified that existing token limits cannot be changed during fine-tuning but models like Llama with a 4k token limit, and Mistral 7B with 32k and 8k, were suggested for accommodating larger context lengths llama.

Links mentioned:

LevelBot - a Hugging Face Space by huggingface-projects: no description found
Deploy models to Amazon SageMaker: no description found
Hugging Face – The AI community building the future.: no description found
BAAI/bge-m3 · Hugging Face: no description found
peft/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb at main · huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft
Build a real AI model that can try any cloth: I built an agent system which will autonomously iterate & generate img of AI model wearing certain cloth and produce millions+ social postsFree access to run...

HuggingFace ▷ #today-im-learning (1 messages): 

Learn NLP in a Day: A repository offering a tutorial for NLP sentiment classification was shared, tackling the IMDB movie 50K review dataset. It's described as easy to follow, with each step explained, providing a generic way to solve many NLP tasks. Check out the GitHub repository.

Link mentioned: GitHub - ManoBharathi93/Sentiment_Classifier: Sentiment classifier on IMDB movie dataset: Sentiment classifier on IMDB movie dataset. Contribute to ManoBharathi93/Sentiment_Classifier development by creating an account on GitHub.

HuggingFace ▷ #cool-finds (5 messages): 

Quick Dive into Hugging Face & Langchain: A YouTube tutorial video was shared, titled "Hugging Face + Langchain in 5 mins," offering viewers a brief guide on using Hugging Face and accessing over 200,000 AI models for free. The video also points to Hugging Face tutorials for further learning.

Dynamic FLOP Allocation in Transformers: A research paper details an approach for transformers to dynamically allocate FLOPs across sequence positions, with a focus on optimizing the allocation for varying layers. The study introduces a method that uses a top-$k$ routing mechanism within a static computation graph which can be viewed on arXiv.

DeepMind Introduces SIM-α: An academic paper by DeepMind introduces SIM-α, a generalized AI agent designed for 3D virtual environments, proposing scalable solutions for instructable agents across multiple simulated worlds. The full document can be accessed via DeepMind's PDF link.

Enhancing Search Capabilities: An article on Medium discusses the integration of Qdrant, a vector search engine, with DSPy to unlock advanced capabilities. This integration is said to improve search functions for AI applications, as detailed in the Medium post.

Links mentioned:

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific ...
Hugging Face + Langchain in 5 mins | Access 200k+ FREE AI models for your AI apps: Learn how to use Hugging Face, and get access to 200k+ AI models while building in Langchain for FREE.🔗 Links- Hugging Face tutorials: https://hf.co/tasks- ...

HuggingFace ▷ #i-made-this (16 messages🔥): 

Understanding Layer Impact in AI Models: A discussion revealed a script that can determine which layers in an AI model to merge or prune by assessing layer changes. Specifically, angular distance provides a measure showing the changes, and different layers react distinctively to various types of inputs, such as code, math, QA, or chat.

BeastBot Aims to Unleash Viral Content: The introduction of BeastBot, positioned as an AI bot channelling MrBeast's content-crafting genius, promises to help users create videos with viral potential by providing insights akin to having MrBeast in your creative team.

Launching Ragdoll Studio: An open-source project called Ragdoll Studio, comparable to character.ai but uncensored and with additional capabilities like art and story generation, has been introduced. It allows users to share characters, use community-made creations, and requires no accounts or APIs.

Deep Q-Learning Applications on GitHub: A new GitHub repository deep-q-learning-applications showcasing various Deep Q-Learning projects was shared, inviting interested parties to explore and contribute to the development.

RicercaMente Maps Data Science Evolution: A new open-source project named RicercaMente, which aims to map the history of data science through significant scientific papers, has been announced emphasizing ease of contribution and inviting community engagement.

Links mentioned:

no title found: no description found
Fashion Try On - a Hugging Face Space by tonyassi: no description found
The creative genius of MrBeast as an AI Bot :): I'm the Beast of all AI bots! I've been loaded up with mountains of MrBeast's wildest, most innovative content. It's like having exclusive backstage access to his mind-boggling bra...
GitHub - EdoPedrocchi/RicercaMente: Open source project that aims to trace the history of data science through scientific research published over the years: Open source project that aims to trace the history of data science through scientific research published over the years - EdoPedrocchi/RicercaMente
everything-rag - a Hugging Face Space by as-cle-bert: no description found
GitHub - SuleymanEmreErdem/deep-q-learning-applications: My Deep Q-Learning projects: My Deep Q-Learning projects. Contribute to SuleymanEmreErdem/deep-q-learning-applications development by creating an account on GitHub.

HuggingFace ▷ #reading-group (11 messages🔥): 

Repository with Reading Group Presentations: A member shared a GitHub repository that compiles all past presentations from the HuggingFace reading group, including links to recordings. Notifications for sessions are currently disseminated via Discord events.
Neural Circuit Diagrams for Understanding Models: The reading group highlighted a paper, "Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures," available on HuggingFace Papers. Members recommended MIT machine learning resources for a deeper understanding.
Advice on Navigating Codebases: A member suggested learning Python essentials like classes and decorators and recommended various strategies for navigating codebases, including using eager execution, Python's debugger with breakpoint(), n (next), and s (step) commands, and the inspect module for insights into function sources and file locations.
Google Colab Debugging Tips: Useful Google Colab tips were shared, such as checking documentation with function_name (without parentheses), using .__class__ to determine an object's class, and viewing source code with inspect.getsource.
Questions are Welcome in the Community: In the discussion on understanding codebases, members encouraged others to ask questions within the community to overcome any difficulties encountered while learning, particularly with frameworks like PyTorch.

Links mentioned:

Paper page - Neural Circuit Diagrams: Robust Diagrams for the Communication,
  Implementation, and Analysis of Deep Learning Architectures: no description found
GitHub - isamu-isozaki/huggingface-reading-group: This repository's goal is to precompile all past presentations of the Huggingface reading group: This repository's goal is to precompile all past presentations of the Huggingface reading group - isamu-isozaki/huggingface-reading-group

HuggingFace ▷ #computer-vision (9 messages🔥): 

Seeking Assistance with Diffusion Models: A user requested help with an assignment focused on improving video quality through diffusion models and asked for relevant research papers to reference.
Pretraining XCLIP for Extended Video Frames: A user is facing challenges while attempting to pretrain the XCLIP model to handle a larger number of frames for video recognition. They are encountering stagnant losses and NaN errors, and seek advice for training models from scratch effectively.
Stymied by Computer Vision Problem: A user indicated being stuck with a computer vision problem statement and is seeking assistance but provided no further details about the issue.
Transitioning to Vision Deep Learning with TensorFlow: A user requested resources or a roadmap for starting deep learning in vision using TensorFlow after having experience with text models.
Contrastive Loss Requires Large Batch Sizes: Users discussed the importance of large batch sizes when using contrastive loss and mentioned that accumulation or checkpointing might be useful when computing power is limited. However, concerns were raised about the interaction between large batches and batch normalization updates.

Link mentioned: X-CLIP: no description found

HuggingFace ▷ #NLP (4 messages): 

OOM Issues with Mistral Training on A100s: A member reported running out of memory (OOM) issues when attempting to fully SFT train Mistral on 8 x a100 40gb GPUs using Deepspeed 3 in Accelerate, querying whether the hardware is sufficient for the task.

GPT-2 as an Autoregressive Summarizer?: A member mentioned that according to HuggingFace's NLP course, GPT-2 can be used for text summarization with specific instructions. However, they faced disappointing results even on simple tasks and datasets.

Disappointing Performance with Mistral 7B and RAG: Another member is experiencing poor results when attempting to combine Mistral 7B and RAG, asking the community if anyone has had success with this configuration.

Summarization through Prompting Era: In response to the GPT-2 summarization issue, a member suggested this might be an approach from the 'TL;DR:' era of prompting for summarization, implying it could be an outdated method.

HuggingFace ▷ #diffusion-discussions (7 messages): 

TorchScript Export Troubles: A participant shared an attempt to use LaBSE as a custom model with OpenSearch and encountered problems when trying to export the model to TorchScript. They didn't specify the error message received during deployment.
Custom Module Saving in Diffusers: A member initially faced an error when trying to save a custom nn.Module using diffusers. The issue was resolved by adding the required mixins to the module.
Exploring Schedulers/Samplers: There was confusion about the behavior of schedulers/samplers in the context of num_inference_steps. A member expected a certain behavior when increasing num_inference_steps, but the results were not as anticipated.
Sharing a Notebook for Collaborative Debugging: The same member facing issues with schedulers/samplers shared a Google Colab notebook for others to review and possibly assist with the problem.

Link mentioned: Google Colaboratory: no description found

HuggingFace ▷ #gradio-announcements (1 messages): 

Meet the API Recorder: Gradio version 4.26.0 introduces an API Recorder that records interactions with Gradio apps and auto-generates Python or JS code to recreate those actions. See a demo here.

Important Bug Fixes Implemented: The new update addresses crucial issues including a bug that caused slow page load times in the previous version, and a crash caused by rapid chatbot updates. The full changelog with more bug fixes and features is available here.

Link mentioned: Gradio Changelog: Gradio Changelog and Release Notes

Eleuther ▷ #general (15 messages🔥): 

Branding Trumps Naming Consistency: The use of the GPT-3.5 moniker is attributed to branding purposes, despite any potential confusion it might cause.
Disappearing Acts on the Wayback Machine: An observation was made that GPT-3.5 information was available until a few months ago, possibly removed coinciding with board reorganization or unrelated reasons. The discussion included the possibility of requests to the Wayback Machine to take down specific content.
Mysteries Around Claude 3 Opus's Size: While Claude 3 Opus is noted for its superior performance compared to GPT-4, the model's size remains unreported, with no reliable rumors or leaks, notably different from GPT-4's pre-release information.
Speculation on Model Architecture and Pricing: It's speculated that pricing might correlate with model size or inference compute costs, suggesting Claude 3 Opus could be larger than GPT-4; further, a discussion mentioned a Twitter post hinting at unique architectural features in Claude 3 Opus.
Skepticism on Claims about Model Capabilities: A link to a Twitter post by Daniel Han suggests an uprising of long context models, but a follow-up comment cautions against the optimistic claims due to past inaccuracies.

Eleuther ▷ #research (149 messages🔥🔥): 

Debate on the Nature of Averages in ScheduleFree Optimizer: There is an extensive discussion about the nature of the 1/t term in ScheduleFree optimizer. One faction clarifies that a learning rate beta=1-1/t does not give you an exponential moving average but a simple mean of all values, evidenced by the calculation 1 * (1/2) * (2/3) * (3/4) * ... * (1-1/t) = 1/t.

Exponential vs. Simple Moving Average Misconception: The discussion about the ScheduleFree optimizer continues with explanations provided on the distinction between exponential and simple moving averages, ending with the clarification that ScheduleFree keeps a simple average, not an exponential one.

Exploration of Knowledge Storage Capacity in LLMs: Members shared insights from a research paper, discussing the efficiency of knowledge storage in language models. The paper's findings suggest that language models can store 2 bits of knowledge per parameter, gated MLPs may impair knowledge storage, and MoEs are relatively efficient.

Dense vs. Sparse Training in MoEs: A new paper about Mixture-of-Experts (MoE) models is shared which proposes a dense training and sparse inference framework, claiming better parameter efficiency and comparable performance to dense models, calling into question the parameter efficiency when scaling.

Optimizer Effectiveness in Large-Scale Models: The conversation touches upon the effectiveness of the LAMB optimizer with references to papers that question its benefits. One member suggests a batch size invariant version of Adam for large-scale, distributed settings, which purportedly offers batch size invariance without the strong assumptions needed for LARS and LAMB.

Links mentioned:

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models: Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scena...
A Large Batch Optimizer Reality Check: Traditional, Generic...: We retune the Nesterov/Adam optimizers on pipelines where LARS/LAMB are commonly used and achieve similar or better performance, providing competitive baselines for the large batch training setting.
Tweet from Kyo (@kyo_takano): Some data points where ScheduleFree outperforms Adam/SGD: - LM/GPT (@eric_alcaide) https://twitter.com/eric_alcaide/status/1776571679524683950 - CIFAR10/ResNet18 (@Sree_Harsha_N) https://twitter.com/S...
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws: Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model's capability via loss or benchmarks, we estimate the n...
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models: Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-...
Batch size invariant Adam: We propose a batch size invariant version of Adam, for use in large-scale, distributed settings, in which the mini-batch is divided into micro-batches which are distributed among worker nodes. For the...
Subtractive Mixture Models via Squaring: Representation and Learning: Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number...
Comparative Study of Large Language Model Architectures on Frontier: Large language models (LLMs) have garnered significant attention in both the AI community and beyond. Among these, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture...
Teach LLMs to Phish: Stealing Private Information from Language Models: When large language models are trained on private data, it can be a significant privacy risk for them to memorize and regurgitate sensitive information. In this work, we propose a new practical data e...
GWYRE: A Resource for Mapping Variants onto Experimental and Modeled Structures of Human Protein Complexes - PubMed: Rapid progress in structural modeling of proteins and their interactions is powered by advances in knowledge-based methodologies along with better understanding of physical principles of protein struc...
GitHub - GraphPKU/PiSSA: Contribute to GraphPKU/PiSSA development by creating an account on GitHub.
UniFL: Improve Stable Diffusion via Unified Feedback Learning: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancem...
ByteEdit: Boost, Comply and Accelerate Generative Image Editing: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field ...
Aligning Diffusion Models by Optimizing Human Utility: We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by formulating the alignment objective as the maximization of expected human utility. Since this objective applie...
How to Scale Hyperparameters as Batch Size Increases: no description found

Eleuther ▷ #lm-thunderdome (13 messages🔥): 

Quick Access to GPQA: GPQA requires users to go to the HF hub, accept terms, and log in via the terminal using their key. The process is described as small and quick to run.
Analyzing High-Temp Runs: Assessing min-P high-temp experiment results qualitatively is recommended to understand the underlying mechanics, with the improvement seen as "real" despite potential lack of statistical significance.
Token Distribution in Sampling: It's proposed that top-P sampling might select too many tokens due to a flat distribution, whereas min-P sampling can filter effectively as the probability ratios change more gradually. Plotting the number of logits picked as seen in the VLLM GitHub repository could shed light on the issue.
Inquiry about Books3 Dataset: There's an interest in obtaining the Books3 dataset, be it through a direct download, a torrent, or extracting it from the Pile, with the latter already downloaded by the inquiring member.
Branch Speed Comparison for Inference: Regarding inference speed, the big-refactor branch is confirmed to be faster compared to the main branch.

Link mentioned: vllm/vllm/model_executor/layers/sampler.py at b4543c8f6bf67a7f1a0d6d0fd6cf5697c7eeaabb · vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm

LangChain AI ▷ #general (111 messages🔥🔥): 

Clarification on LangChain's 'attach_edge' Method: Members discussed how to use the attach_edge method in LangGraph within LangChain lib. Although initial responses indicated no record of attach_edge, it was later clarified that it exists in the CompiledGraph class, and users were advised to check the official documentation for more information.

Exploring AI Transcription Capabilities: A discussion centered around building an application to ask questions to a YouTube video highlighted some confusion between SerpAPI and Serper API, noting that while SerpAPI is documented for LangChain, confirmation on Serper API's compatibility is uncertain.

LLM Selection Experiences Shared: Users exchanged experiences with various LLMs (Large Language Models), discussing the practicality and cost-effectiveness of GPT-3.5, GPT-4, and Claude, with some expressing difficulty with alternate models such as gemin8.

Using LangChain for Data Structuring and Execution: Inquiries into using Pydantic validators within LangChain chains led to clarifications on error handling and retry mechanisms, with links provided to the LangChain API documentation for constructing chains with RunnableRetry.

Retrieval System Use Cases and Evaluation: A user sought advice on creating custom retrieval systems with LangChain and evaluating their performance. Fellow members recommended the trulens eval package and a LangSmith RAG evaluation example.

Comparing LangChain with OpenAI's API for AI Assistants: A member questioned the benefits of using LangChain for AI assistants over OpenAI's APIs, but the responses did not provide a direct comparison or specific advantages.

Links mentioned:

Quickstart | 🦜️🔗 LangChain: Language models output text. But many times you may want to get more
YouTube audio | 🦜️🔗 LangChain: Building chat or QA applications on YouTube videos is a topic of high
Entity | 🦜️🔗 LangChain: Entity memory remembers given facts about specific entities in a conversation. It extracts information on entities (using an LLM) and builds up its knowledge about that entity over time (also using an...
no title found: no description found
SerpAPI Loader | 🦜️🔗 Langchain: This guide shows how to use SerpAPI with LangChain to load web search results.
RAG evaluation with RAGAS | 🦜️🛠️ LangSmith: Ragas is a popular framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.
Synthetic data generation | 🦜️🔗 LangChain: Open In Colab
🦜🕸️LangGraph | 🦜️🔗 LangChain: Downloads
LangGraph | 🦜️🔗 Langchain: ⚡ Building language agents as graphs ⚡
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
LangGraph | 🦜️🔗 Langchain: ⚡ Building language agents as graphs ⚡
GitHub - ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++: Port of OpenAI's Whisper model in C/C++. Contribute to ggerganov/whisper.cpp development by creating an account on GitHub.
Issues · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
Quickstart | 🦜️🔗 LangChain: In this quickstart we'll show you how to:

LangChain AI ▷ #share-your-work (5 messages): 

Artful AI Enhances Creativity: New updates to Artful AI introduce new models: Dalle Creative, Anime Dream, & Epic Realism with bug fixes for a better user experience. Artful AI is an image generator app powered by AI models like Dalle-3 and SDXL among others; visit the updated app here.

AISploit Aids Penetration Testers: A tiny package named AISploit is designed to support red teams and penetration testers in exploiting large language model AI solutions. More details can be found on the GitHub repository.

Galaxy AI Offers Free AI Models Access: Galaxy AI provides free API service for premium AI models including GPT-4, GPT-3.5-turbo-1106, and more. Integration is supported by Langchain and all APIs are available in OpenAI format; discover it here.

Automate Your Tinder Typing with TinderGPT: Presented is TinderGPT, an automation app for dating conversations, designed to save time and secure matches. Interested users can check out the GitHub project.

Customizable Local Chatbot Assistant 'everything-rag' Introduced: everything-rag is a new tool for a fully customizable local LLM, inspired by Jan.ai and Cheshire Cat AI, which works with any PDF and features 100% local, free functionality. Explore the HuggingFace space, check out the GitHub repo, and read the related blog post.

Links mentioned:

Galaxy AI - Swagger UI: no description found
Artful - AI Art Generator - Apps on Google Play: no description found
everything-rag - a Hugging Face Space by as-cle-bert: no description found
GitHub - hupe1980/aisploit: 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions.: 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. - hupe1980/aisploit
GitHub - GregorD1A1/TinderGPT: Contribute to GregorD1A1/TinderGPT development by creating an account on GitHub.

LangChain AI ▷ #tutorials (2 messages): 

Fashion Forward AI Creates Stylish Images: A member has built an AI agent capable of creating images of people wearing any chosen clothing and generating social media posts. The process is demonstrated in a YouTube video titled "Build a real AI model that can try any cloth," which can be watched here.

Publishing an AI Agent with a UI: A member inquires about the steps involved in publishing a developed AI agent, including how to develop a user interface. They are seeking a tutorial or guidance on this process.

Link mentioned: Build a real AI model that can try any cloth: I built an agent system which will autonomously iterate & generate img of AI model wearing certain cloth and produce millions+ social postsFree access to run...

CUDA MODE ▷ #general (4 messages): 

Meta's Massive GPU Sponsorship: Meta sponsored a significant study on LLM knowledge capacity which involved 4.2 million GPU hours. Researchers took four months to submit 50,000 jobs, with Meta's legal review taking an additional month.

Half a Millennium in Compute Hours: A quick calculation reveals that Meta's 4.2 million GPU hours equate to roughly 479 years of continuous compute, highlighting the extensive resources dedicated to LLM research.

GPT-2 Goes CUDA: A member mentioned porting GPT-2 training code to CUDA could be an exciting benchmark project, signaling potential advancements in efficiency and performance, and shared the relevant GitHub repository.

Creating a Working Group for CUDA Enthusiasts: Following an expression of interest in CUDA porting projects, a working group has been proposed to bring together like-minded individuals from the community.

Links mentioned:

Tweet from Zeyuan Allen-Zhu (@ZeyuanAllenZhu): Our 12 scaling laws (for LLM knowledge capacity) are out: https://arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope t...
llm.c/dev/cuda at master · karpathy/llm.c: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.

CUDA MODE ▷ #triton (1 messages): 
mobicham: Still using dot product instead of additions 🤔

CUDA MODE ▷ #cuda (3 messages): 

Training LLMs in C: A link to Andrej Karpathy's Tweet was shared, introducing llm.c, a lean implementation of GPT-2 training in pure C with only 1,000 lines of code, which is a match for the PyTorch reference implementation. Karpathy has selected GPT-2 as the foundational LLM for its historical significance and available weights.

C to CUDA Conversion Interest: Members expressed enthusiasm for porting the newly shared C-based LLM training code to CUDA, leveraging the compactness of llm.c. One member contemplated integrating the code into their library and sought clarification about license compatibility, specifically between MIT and Apache 2.0.

Links mentioned:

Tweet from Andrej Karpathy (@karpathy): Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: https://github.com/karpathy/llm.c  To start, implements GPT-2 training on ...
Tweet from Andrej Karpathy (@karpathy): Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: https://github.com/karpathy/llm.c  To start, implements GPT-2 training on ...

CUDA MODE ▷ #torch (2 messages): 

Fusing Operations for Speed: Discussing optimization, a member mentioned the potential of operation fusing to speed up processes, especially matrices operations. However, they noted challenges in outperforming library matmul implementations but suggested checking out tiny-cuda-nn and cutlass for possible performance gains.

Matrix Multiplication Performance Puzzle: An intriguing mathematical challenge was shared regarding the performance of matrix multiplication with different shapes, highlighting that configuration A: M=2047, K=N=2048 would have the best performance. This stems from the importance of understanding tiling and memory layouts, as explained in detail on Thonking.ai.

Links mentioned:

Answer Key: What Shapes Do Matrix Multiplications Like?: Companion to https://www.thonking.ai/p/what-shapes-do-matrix-multiplications
tiny-cuda-nn/DOCUMENTATION.md at master · NVlabs/tiny-cuda-nn: Lightning fast C++/CUDA neural network framework. Contribute to NVlabs/tiny-cuda-nn development by creating an account on GitHub.

CUDA MODE ▷ #announcements (1 messages): 

CUDA MODE Discord Reaches New Heights: The community celebrates surpassing 5,000 members, emphasizing this as one of the favorite online spaces for the hosts. The channel's growth is credited to the active participation and enthusiasm of its members.

Steady Stream of Learning: The channel has successfully maintained a pace of releasing one lecture per week since its inception, accumulating a total of 13 lectures so far. These educational sessions are accessible in the lectures channel.

From Learning to Application: Members are actively applying the insights gained from the lectures to build kernels in the real world, demonstrated by the engagement across various active working groups. The practical application of knowledge underscores the channel's impact.

Invitation to Performance Enthusiasts: Members with friends who have a keen interest in performance optimization are encouraged to extend an invitation to join the CUDA MODE community by sharing the Discord invite link: discord.gg/cudamode. The message emphasizes the community's openness and focus on performance-minded individuals.

CUDA MODE ▷ #ring-attention (9 messages🔥): 

The Memory-Space Trade-off in Ring Architectures: A member questions the viability of ring attention architectures, suggesting that while speed gains are achieved from distributed computing, they seem to come at the cost of increased memory requirement due to the need for buffering results in message passing between devices.

Introducing LIAH for Avoiding Pre-Knowledge: A member shared their implementation named LIAH (lie-in-a-haystack), which aims to insert a 'lie' into the context to prevent a language model from answering based on its own knowledge. The GitHub repository for LIAH has been shared: LIAH on GitHub.

Clarifying Ring Attention's Communication Step: One member posted a query about a specific 'if' condition in the ring attention implementation, referencing the code on GitHub. It was clarified that in causal attention, future tokens are masked, and therefore, computation for them isn't needed.

Educational Flash Attention in the Works: Members are collaborating on developing educational flash attention examples with a live coding session recorded. The first parts of their efforts can be found on GitHub.

Testing NiH with Different Model Types: Discussion continues on needle in a haystack (NiH) implementations with curiosity about testing LIAH on state space models like a mamba model to evaluate their efficiency at this task.

Links mentioned:

ring-attention/naive_flash_attn at naive_flash_attn_examples · cuda-mode/ring-attention: ring-attention experiments. Contribute to cuda-mode/ring-attention development by creating an account on GitHub.
ring-flash-attention/ring_flash_attn/ring_flash_attn.py at 55ff66fd35f329dfcc24ce7a448bfdd532865966 · zhuzilin/ring-flash-attention: Ring attention implementation with flash attention - zhuzilin/ring-flash-attention
GitHub - melvinebenezer/Liah-Lie_in_a_haystack: needle in a haystack for LLMs: needle in a haystack for LLMs. Contribute to melvinebenezer/Liah-Lie_in_a_haystack development by creating an account on GitHub.

CUDA MODE ▷ #off-topic (2 messages): 

Naming Woes in GPU Technology: A member questions the suitability of the term "kernels" for GPU kernels, suggesting it might not be the best fit for this technology.
Hatsune Miku Fans Outraged Over Flatscreen Performance: Hatsune Miku enthusiasts express frustration as the virtual idol's live show used a flatscreen instead of her signature hologram technology, contrasting sharply with past performances and expectations set by prior events like the 2Pac hologram at Coachella 2012.

Link mentioned: Hatsune Miku Fans Furious Live Show Was Just a Flatscreen On Stage: The virtual pop idol did not appear in her full hologram form in two shows on her North American tour and fans are pissed.

CUDA MODE ▷ #triton-puzzles (4 messages): 

Triton Puzzles Get Official References: Triton puzzles will now be officially included in the documentation as per the recent GitHub pull request #3608.

Clarification on Puzzle 11: There was a precision made regarding puzzle 11, pointing out a necessary correction: "the summation should be over the shared index $l$."

Approval of Triton Documentation Updates: Member acknowledges and expresses positive sentiment towards the official referencing of Triton puzzles in the documentation.

Link mentioned: Add additional tips and links to README. by jlebar · Pull Request #3608 · openai/triton: no description found

CUDA MODE ▷ #hqq (46 messages🔥): 

Exploring Low-Precision Strategies with Mobicham: Mobicham shared ongoing work on 4-bit quantization and simple CUDA kernels overusing Torch Compile for performance gains, adapting to work with Marlin kernels and quantizing with HQQ. There are speed-up efforts against pytorch models and placeholder codes for inference shared here.

Quantization Challenges in LLM: There was a discussion on issues with low-precision quantization for large language models (LLMs), specifically with dynamic activation quantization and int8 weight quantization. This included mentioning the interest in HQQ's incorporation for mobile deployment where quantizing activations is mandatory.

Testing Marlin's Performance: Members are testing Marlin's kernel, reporting on its performance and accuracy results. Performance gains seem underwhelming compared to claims, and some accuracy issues are noted, with a complete benchmark being planned to assess the impact.

Tackling Wikitext Evaluation Issues: Members faced difficulties when running evaluation on wikitext, with discussions around correct scripts, perplexity results, and unexpected token errors in embeddings. Repositories and branches were shared for troubleshooting, and improvements were sought to match expected performance metrics.

Seeking Harmony in Quantized Models: Users exchanged technical details regarding the implementation of HQQLinear without conversion, the appropriate setting of quant_scale and quant_zero within the quantization settings, and troubleshooting for quirks in the wikitext evaluation. The conversation included sharing GitHub repositories for scripts and evaluation strategies, such as the direct application of HQQ techniques to avoid accruing errors during quantization.

Links mentioned:

hqq/hqq/core/quantize.py at master · mobiusml/hqq: Official implementation of Half-Quadratic Quantization (HQQ) - mobiusml/hqq
GitHub - IST-DASLab/marlin: FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.: FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens. - IST-DASLab/marlin
GitHub - zhxchen17/gpt-fast: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. - zhxchen17/gpt-fast
gpt-fast/model.py at main · pytorch-labs/gpt-fast: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. - pytorch-labs/gpt-fast
eval_model_wikitext_gptfast.py: GitHub Gist: instantly share code, notes, and snippets.
testing HQQ [not for land] by HDCharles · Pull Request #155 · pytorch-labs/gpt-fast: Stack from ghstack (oldest at bottom):  -> #155  Summary: hqq wikitext: {'word_perplexity,none': 12.698986130023261, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexi...
HQQ 4 bit llama 2 7b · zhxchen17/gpt-fast@f7c8151: export MODEL_REPO=meta-llama/Llama-2-7b-hf scripts/prepare.sh $MODEL_REPO python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-hqq --groupsize 64 python generate.py --...

CUDA MODE ▷ #triton-viz (20 messages🔥): 

Exploring triton-viz with Loops and Control Flows: A discussion was held about applying triton-viz to programs containing loops and control flows, raising concerns about the intuitiveness of the current visualization for these structures.
Visualization Tools for triton-viz: The conversation included a decision to use ipycanvas and ipyevents, with a preference for a setup richer than Gradio but simpler than Pygame, that can run in Jupyter notebooks.
Variable Name Labeling in Visualizations: It was suggested to programmatically retrieve variable names for tensor labels in triton-viz, although it was noted that variable names might not always be available for generated masks from operations like a < b.
Brainstorming Visualizations for Conditional Statements: Brainstorming began for how to visualize for and if statements in triton-viz, with an aim to make clearer tutorials or potentially using JavaScript for better interactivity and fast animations.
Seeking Help with Matmul Puzzles: A member inquired about an answer key for matmul puzzles, indicating difficulty in solving them, and shared a janky visual with real values under load/store operations, seeking opinions on the approach.

CUDA MODE ▷ #llmdotc (13 messages🔥): 

CUDA Struggles with Efficient Attention: A member mentioned they are currently struggling with efficient attention in CUDA, which is the last part required to complete the forward pass.
Initial GPU Optimization Achieved: The same member clarified that when they said the kernels were "mostly in good shape," it meant they had done one round of parallelizing improvements over the naive copy-paste of the CPU code.
llm.c repository shared: A member shared the GitHub repository llm.c, highlighting it as a valuable resource for learning and experimenting with CUDA.
OpenMP in llm.c for Easy GPU Offloading: Discussions around the use of OpenMP in llm.c led to a suggestion that switching from CPU to GPU execution might be as simple as changing one line of code to enable GPU offloading.
Cross-Vendor GPU Compatibility and Efficiency: The potential advantages of using OpenMP for GPU offloading were discussed, including easier code and cross-GPU vendor compatibility, but there was uncertainty regarding support under Windows.

Link mentioned: GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.

OpenInterpreter ▷ #general (50 messages🔥): 

New to the AI World: A member expressed excitement about starting the journey in coding and using the OpenInterpreter device.
DIY or Preorder: Members discussed the option to either preorder the official OpenInterpreter hardware device or build it themselves, which would include purchasing components like the M5 Atom Echo and potentially soldering for a DIY version.
Purchase Points and Software Installation: A participant was informed that they could buy parts such as the M5 Atom Echo from Mouser.com and that the custom software powering the device is currently optimized for the M5.
Anticipated Reliability Improvements: The conversation touched on the importance of improving the OpenInterpreter’s core repository to enhance reliability and ensure it can handle a wider array of tasks.
GPT-4 Excitement: There was a buzz around the new GPT-4 release, with members noting its improved speed, integrated vision capabilities, and its presence on the OpenAI Platform and documentation at OpenAI's model updates.

OpenInterpreter ▷ #O1 (41 messages🔥): 

Python Version Compatibility Resolved: A member faced an issue with their bot not recognizing voice, which worked after switching from Python 3.11.4 to 3.10. Other members affirmed that Python 3.9 and 3.10 are currently supported with some acknowledging wonkiness with 3.10.
Troubleshooting pyaudio on M1 Mac: Individuals share technical solutions for pyaudio issues on M1 Macs, with suggestions to uninstall pyaudio, reinstall portaudio, and even using different Python versions such as 3.11.
First-time Excitement with 01: Members shared the excitement of getting their 01 bot to work, discussing whether to use local models or those provided by OpenAI, like gpt-4, despite the cost.
Windows Installation Woes of 01: A member is struggling to install 01 on Windows, outlining detailed steps they’ve taken but facing issues with OpenAPI key recognition. They were advised to use OPENAI_API_KEY instead of open_api_key and shared deployment steps and troubleshooting attempts.
Raspberry Pi and Desk Robot Endeavors: Community members discussed using the Raspberry Pi for building 01 and potential use cases, such as desk robots. A member expressed an intention to create something open source and desk bot-related with a domain they own, cofounder.bot.

Links mentioned:

GitHub - OpenInterpreter/01: The open-source language model computer: The open-source language model computer. Contribute to OpenInterpreter/01 development by creating an account on GitHub.
GitHub - OpenInterpreter/01: The open-source language model computer: The open-source language model computer. Contribute to OpenInterpreter/01 development by creating an account on GitHub.

OpenAccess AI Collective (axolotl) ▷ #general (77 messages🔥🔥): 

Jet MoE Pending Integration: There was discussion about Jet MoE; it has not been merged into Hugging Face's transformers yet, but there's anticipation for its addition. A link to the relevant GitHub pull request was shared, showing the process in progress.

Lepton AI's Simple Cloud-Native Platform: One member highlighted Lepton AI as a user-friendly cloud-native platform for running AI applications, with a focus on simplicity and starting with minimal code. A link was provided, showcasing tools like Photon, WhisperX, Mixtral 8x7b, and Stable Diffusion XL.

Model Comparison Queries: Conversations included requests for comparisons between different models, specifically Qwen 1.5 32B vs Yi 34B vs Command R on the same fine-tuning dataset, noting that Yi's extended context is a tough benchmark to beat.

Meta's Llama 3 Anticipation Builds: There was buzzing anticipation for Meta's Llama 3, with members discussing its multimodal capabilities, a possible release according to The Information, and speculations about the number of parameters it would contain. There were also critiques about a reporter's narrative on open-source projects in relation to established AI companies.

Exploring Non-English Performance on LLMs: There were comments on the potential for improved non-English performance in LLMs, with mentions of an upcoming launch of smaller Llama 3 models by Meta and the gemma tokenizer, indicating its untrained state on non-English tokens. Concerns and curiosity about what's considered "small" in 2024 were also discussed.

Links mentioned:

Build AI The Simple Way | Lepton AI: Run AI applications efficiently, at scale, and in minutes with a cloud native platform.
Add JetMoE model by yikangshen · Pull Request #30005 · huggingface/transformers: What does this PR do? Add support to JetMoE architecture by Yikang Shen and MyShell AI. JetMoE is a new sparsely activated architecture inspired by the ModuleFormer. Each JetMoE block consists of t...

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (7 messages): 

Anticipating Transformers PR for DreamGen Inquiry: While discussing DreamGen, a member mentioned they are awaiting a transformers pull request (PR).

Dataset Versioning Could Arrive to Axolotl: Dataset versioning support is currently missing in Axolotl. A member expressed interest in contributing a PR for this feature after confirmation that it has not been requested before.

Initializing LoRA Layers with SVD Enhances Fine-Tuning: Sharing a research update from CFGeek, a member highlighted a finding where initializing LoRA layers with SVD of original weight matrices improves fine-tuning results. The innovative approach and related materials are detailed in the PiSSA GitHub repo and the accompanying paper on arXiv.

Link mentioned: Tweet from Charles Foster (@CFGeek): YES! If you initialize a LoRA layer based on the SVD of the original weight matrix (with its top singular values & vectors), you get significantly better fine-tuning results.  This is a straight-up fr...

OpenAccess AI Collective (axolotl) ▷ #general-help (5 messages): 

Inquiring About Continuous Pre-training Practices: A member shared interest in improving Norwegian grammar capabilities of an LLM through continuous pre-training with a dataset of high-quality articles. They asked for dataset formatting guidance, mentioning the possibility of separating articles with \n\n.

Tips on Dataset Splitting: Another member suggested handling article splitting by either placing one article per line or using JSONL format (JSON lines), with each article being a separate entry in the dataset.

Seeking Dataset for Axolotl Fine-tuning: A query was made regarding the availability of a dataset suited for JSON mode or function calling, specifically for fine-tuning LoRAs with the axolotl framework. No specific datasets were recommended in the subsequent messages.

OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages): 
blackl1ght: Does anyone have a good function-calling or JSON mode dataset?

OpenRouter (Alex Atallah) ▷ #announcements (1 messages): 

Gemini Pro 1.5 Launches with GPT-4 Turbo: OpenRouter announces the arrival of Gemini Pro 1.5, boasting a massive 1M token context, and OpenAI's new GPT-4 Turbo with vision capabilities.
Expanded logit_bias Support: Additional models now support the logit_bias parameter, allowing for enhanced control over model outputs by adjusting token likelihoods, including models like Nous-Hermes-2-Mixtral and various versions of Llama.
Model Discontinuations Announced: OpenRouter is discontinuing underutilized models such as jebcarter/Psyfighter-13B and jondurbin/bagel-34b-v0.2, which will remain accessible for a two-week grace period, and migtissera/synthia-70b, with traffic being redirected to xwin-lm/xwin-lm-70b from April 15th.

Links mentioned:

Gemini Pro 1.0 by google | OpenRouter: Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation.  See the benchmarks and prompting guidelines from [Deepmind]...
GPT-4 Turbo by openai | OpenRouter: The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023.  This model is updated by OpenAI to point to the lates...
Hermes 2 Mixtral 8x7B DPO by nousresearch | OpenRouter: Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the [Mixtral 8x7B MoE LLM](/models/mistralai/mixtral-8x7b).  The model was trained on over 1,000,000 entries of prim...
Mistral 7B Instruct by mistralai | OpenRouter: A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.  This is v0.1 of Mistral 7B Instruct. For v0.2, use [this model](/models/mistral...
Llama v2 13B Chat by meta-llama | OpenRouter: A 13 billion parameter language model from Meta, fine tuned for chat completions
Llama v2 70B Chat by meta-llama | OpenRouter: The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned ve...
Mixtral 8x7B by mistralai | OpenRouter: A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixt...

OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages): 

Telegram Bot Roasts and Summarizes: The Syrax AI bot has been released for Telegram, boasting features like roasting group members and summarizing chat history up to 1,000 messages, all powered by OpenRouter.
Fight Spam with Syrax: In addition to entertainment, this bot maintains a global blacklist to prevent spam and other malicious activities in Telegram groups.
Open for Feedback: The developer has invited users to provide feedback on the Syrax AI bot to further refine its capabilities and performance.

Links mentioned:

Syrax AI - Leverage multiple AIs on one platform: With Syrax AI you can access multiple AI models to generate content, images, and more from one platform.
Syrax AI Bot: With Syrax AI you can access multiple AI models to generate content, images, and more from one platform. t.me/SyraxAI

OpenRouter (Alex Atallah) ▷ #general (86 messages🔥🔥): 

Gemini Pro 1.5 Draws Mixed Reactions: Some users anticipate that Gemini Pro 1.5 might be underappreciated, stating it could be significant to the LLM community. However, others have tested it and felt that it underperforms in tasks like exporting data from PDF to JSON, especially compared to Claude3-Opus.

Concerns Over Censorship in AI Models: The discussion on censorship reflects a desire for less restrictive AI models. Users share experiences of being filtered and express concern about the increase in content moderation, predominantly in models used for role-playing (RP).

Exploring Frontend Options for AI Models: Solutions like Jan and LibreChat are suggested for those looking for ChatGPT-like interfaces compatible with OpenRouter. Meanwhile, SillyTavern is mentioned as an alternative for chat and RP uses.

Command-R v. Command-R+ for Role-Playing: In the context of role-playing, Command-R is preferred by users for quality responses over its advanced counterpart, yet Command-R+ is still considered strong for non-RP tasks.

Technical Issues and Quotas with Gemini Pro 1.5: Users report technical difficulties with Gemini Pro 1.5, including error codes and API quota limits. One member encountered error code 429 indicating an exceeded request quota, which seems to resolve on its own, suggesting a per-minute request limit.

Links mentioned:

LibreChat: Enhanced ChatGPT Clone, featuring OpenAI, Azure, Mistral, Anthropic, Google, Ollama, DALL-E-3 models and more. An Open-source, versatile Web UI, with seamless self-hosting and active developments.
Jan - OpenRouter: A step-by-step guide on how to integrate Jan with OpenRouter.
Google Cloud Gemini, Image 2, and MLOps updates | Google Cloud Blog: Vertex AI adds expanded Gemini 1.5 access, the new CodeGemma model, enhancements to Imagen, and new MLOps features.
Welcome to Google Cloud Next ‘24 | Google Cloud Blog: Google Cloud CEO Thomas Kurian provides an overview of all the news and customer momentum from Google Cloud Next ‘24.

Latent Space ▷ #ai-general-chat (88 messages🔥🔥): 

Groq's Surprising Origins and Potential: Groq's founder started the TPU project at Google as a side project with no ML used, and now Groq reportedly has an impressive 75k developers, offers 1/10 inference cost, and is set to reach an inference capacity rivaling Meta. An anecdote shared notes NVIDIA engineers are allegedly embarrassed by the H200 performance, indicating Groq's edge in the market.
Gemini 1.1 Release Announced: The release of Gemini 1.1 was highlighted with a link to the announcement on Twitter.
GPT-4 Turbo Released: GPT-4 Turbo, boasting a larger 128k context window and more modern knowledge up to December 2023, was introduced with updated pricing. The update was widely shared and included a link to OpenAI's pricing page.
GPT-4 Implementation in C: Andrej Karpathy has created llm.c, a lean implementation of GPT-2 training in C, promised to be in ~1,000 lines of clean code. The repository and additional C tutorials were linked (llm.c on GitHub) along with discussions about its potential uses and challenges for future LLMs.
OpenAI's GPT-4 and Google's Gemini 1.5 Pro: Discussion about the rapid updates from OpenAI and Google with the new GPT-4 Turbo and Gemini 1.5 Pro that understands audio, showcases advancements with AI agents taking prompt-based actions as well as accessibility improvements like removing waitlists and offering free tiers.

Links mentioned:

Tweet from Zeyuan Allen-Zhu (@ZeyuanAllenZhu): Our 12 scaling laws (for LLM knowledge capacity) are out: https://arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope t...
Tweet from Abhik Roychoudhury (@AbhikRoychoudh1): Introducing  AutoCodeRover Presenting our autonomous software engineer from Singapore ! Takes in a Github issue (bug fixing or feature addition), resolves in few minutes, with minimal LLM cost ~$0.5 !...
Tweet from Ryan Moulton (@moultano): The way Nigerian twitter is blowing up at this makes me think a lot of ChatGPTisms are just colloquial language for the workforce they hired to write fine tuning data.  ↘️ Quoting Paul Graham (@paulg)...
Tweet from Kyle Corbitt (@corbtt): If you want to try out the new Llama 3 models when they drop next week the best way to do so is to get your dataset uploaded and ready to go on @OpenPipeAI. We will have fine-tuning and inference live...
turbopuffer: turbopuffer is a vector database built on top of object storage, which means 10x-100x cheaper, usage-based pricing, and massive scalability
pgvector: Embeddings and vector similarity | Supabase Docs: pgvector: a PostgreSQL extension for storing embeddings and performing vector similarity search.
Innovating Data Centers with Moore's Law | 48sec snip from ChinaTalk: 48sec snip from A Gut Check on Intel and Nvidia with Asianometry, Fabricated Knowledge, and SemiAnalysis | ChinaTalk
LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys: no description found
RSVP to Realtime Voice AI and Multimodal Hackathon | Partiful: Hi fellow lovely hackers,  The AI Engineer Foundation (Your Friendly Open Source Nonprofit Neighbor - website: aie.foundation) is hosting a Real Time Interactive/Conversational Multimodal AI hackathon...
How to talk to an LLM (with your voice): Code for building real-time AI WebRTC applications
Tweet from kwindla (@kwindla): @latentspacepod Here's a video from @chadbailey59 showing the possibilities of fast voice response + tool calling.
Semantic-Router - Qdrant: Qdrant is an Open-Source Vector Database and Vector Search Engine written in Rust. It provides fast and scalable vector similarity search service with convenient API.
Tweet from Andrej Karpathy (@karpathy): I added a quick crappy tutorial on how PyTorch layers are moved to C, with a few possibly helpful pointers: https://github.com/karpathy/llm.c/blob/master/doc/layernorm/layernorm.md
Pricing: Simple and flexible. Only pay for what you use.
Tweet from Liam Bolling (@liambolling): 🎉 It’s a big day for @Google Gemini.   Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, and lets devs build incredible things with JSON mode! It’s all 🆓. Here’s why...
Tweet from Andrej Karpathy (@karpathy): Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: https://github.com/karpathy/llm.c  To start, implements GPT-2 training on ...
Tweet from Andrej Karpathy (@karpathy): Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: https://github.com/karpathy/llm.c  To start, implements GPT-2 training on ...
Tweet from Andrej Karpathy (@karpathy): Btw writing the llm.c training code would imo be a very interesting, impressive, self-contained and very meta challenge for LLM agents. The prompt is:  Take the PyTorch code train_gpt2.py And write, c...
Tweet from Liam Bolling (@liambolling): 🎉 It’s a big day for @Google Gemini.   Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, and lets devs build incredible things with JSON mode! It’s all 🆓. Here’s why...
AI-powered Voice Patient Intake: See how AI-powered patient intake is streamlining the intake process and improving data data accuracy prior to clinical encounters. With voice patient intake...

Modular (Mojo 🔥) ▷ #general (16 messages🔥): 

F-strings in Mojo Still Pending: Members have confirmed that f string functionality is not supported in Mojo yet. The feature is awaited by users looking for Python-like string formatting capabilities.
Exploration for Local Documentation Command: A user inquired about a command to download Mojo documentation locally, similar to Rust's documentation command, but no such command exists at the moment. They were advised to use the online documentation or clone the Git repository for now.
Quality of Local vs. Online Documentation: There's a discussion on whether the local Git repository documentation is as structured and up-to-date as the online documentation. Links to online Mojo standard library modules were shared for users preferring structured resources.
Temporary Solution for String Formatting: While waiting for the f string functionality, a user suggested using C-style formatting [from builtin.io import _printf as printf] for string formatting in Mojo. However, this method may be deprecated in the future.
Mojo API Documentation for Beginners: A user shared a Notion site link with translated and summarized API documentation intended to help beginners.

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team
Mojo🔥 modules | Modular Docs: A list of all modules in the Mojo standard library.

Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages): 

Modular Shares Fresh Update: Modular posted a new update on Twitter, which can be viewed by following this link: Modular Twitter Update.
Another Tweet from Modular: A subsequent tweet from Modular was shared in the Discord channel. Click to see the full content: Modular's Latest Tweet.

Modular (Mojo 🔥) ▷ #✍︱blog (1 messages): 

Contribute to Mojo's Evolution: Mojo's significant milestone was the open sourcing of its standard library with an announcement of the same. The step-by-step guide provided offers a comprehensive walk-through on contributing to Mojo, detailing everything from initial setup to pull request creation.
Mojo Guide to Community Contributions: The guide outlines how the community can partake in enhancing Mojo, starting from identifying GitHub issues to code contributions. Contributors are encouraged to also consult the contribution guide for more in-depth instructions.

Link mentioned: Modular: How to Contribute to Mojo Standard Library: A Step-by-Step Guide: We are building a next-generation AI developer platform for the world. Check out our latest post: How to Contribute to Mojo Standard Library: A Step-by-Step Guide

Modular (Mojo 🔥) ▷ #ai (1 messages): 

Karpathy's New GitHub Sesame: AI guru Andrej Karpathy has released a GitHub repository, delivering the codebase to train GPT-2 style models with just 1000 lines of pure C. This offers a streamlined approach, focusing on the essentials of LLM training in raw C/CUDA.

Link mentioned: GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.

Modular (Mojo 🔥) ▷ #🔥mojo (33 messages🔥): 

Django Performance with Mojo: A member expressed excitement at the prospect of using Django with Mojo without the CPython cost, speculating that parts of Django might be compiled or rewritten in Mojo for increased performance.
Sets and Collections in Mojo: A user brought up an issue about Set not conforming to the CollectionElement trait in Mojo due to the absence of __copyinit__ and __del__, which posed a challenge in creating dictionaries of Sets.
RustPython Insight: Members discussed the RustPython project, recognizing the huge effort of reimplementing Python's stdlib and mentioning its current slow performance compared to CPython.
Mojo Concurrency Primitives: Discussion around Mojo's coroutine and async/await highlighted that they are present but unfinished; async for and async with are not yet implemented as per the Mojo documentation and roadmap.
Compiler Bug Report for Mojo: A user encountered what seemed to be a compiler bug with async function pointers in Mojo, leading to a discussion and the filing of a bug report on GitHub.

Links mentioned:

coroutine | Modular Docs: Implements classes and methods for coroutines.
Mojo🔥 roadmap & sharp edges | Modular Docs: A summary of our Mojo plans, including upcoming features and things we need to fix.
GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.
GitHub - modularml/mojo: The Mojo Programming Language: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
GitHub - RustPython/RustPython: A Python Interpreter written in Rust: A Python Interpreter written in Rust. Contribute to RustPython/RustPython development by creating an account on GitHub.
GitHub - dorjeduck/llm.mojo: port of Andrjey Karpathy's llm.c to Mojo: port of Andrjey Karpathy's llm.c to Mojo. Contribute to dorjeduck/llm.mojo development by creating an account on GitHub.
[BUG] Compiler bug when typing async function pointer call return type · Issue #2252 · modularml/mojo: Bug description The mojo compiler incorrectly types the result of calling an async function pointer. Expected behavior async fn() -> Int functions return a Coroutine[Int] type when called. This Cor...

Modular (Mojo 🔥) ▷ #community-projects (2 messages): 

Collaborative Enthusiasm for Feature Development: A member expressed interest in contributing to new feature development for an upcoming release after examining the repository. They inquired about how to engage in a more detailed discussion, hinting at raising an issue in the repository for further communication.

Direct Response for Coordination: Another participant directly responded with a supportive message and initiated a private conversation to presumably discuss the collaboration further.

Modular (Mojo 🔥) ▷ #community-blogs-vids (1 messages): 

Python Star Pattern Tutorial with a Twist: A community member shared a YouTube video titled "How to Print Any Star Pattern in Python" that slyly introduces viewers to Mojo by disguising it as a simple Python tutorial. The video reveals that the code is written with the Mojo plugin in VSCode, surprising many who weren't aware of Mojo's Python capabilities. Enjoy the educational yet playful troll right here.

Link mentioned: How to Print Any Star Pattern in Python: If you want to learn more about Python, Mojo, or even modern software development with Scrum, sign up for my newsletter. You won't regret it!https://www.xenn...

Modular (Mojo 🔥) ▷ #performance-and-benchmarks (2 messages): 

Request for SYRK Implementation: A member is inquiring about a SYRK (symmetric rank-k update) implementation in Mojo for performance testing. No further context or details provided.

Modular (Mojo 🔥) ▷ #nightly (20 messages🔥): 

Flames of Progress: The Mojo nightly build has been released, and users can update using modular update nightly/mojo. The changes not yet in stable Mojo are listed on their changelog, and the difference from the last release can be checked here.
Aiming for Nightly Weekday Releases: Currently, there is no fixed release schedule for Mojo Nightly, but the goal is to achieve automatic weekday releases through improved continuous integration (CI).
Unpacking Troubles and Solutions: Several users encountered an error stating "Error opening archive: Unrecognized archive format" when trying to update. The recommended solution included running modular clean, possibly updating modular, and ensuring zstd is installed.
Musical Emojis Becoming a Hit: Amidst technical discussions, members also appreciated the aesthetics of the purple flame emojis, proposing humorous ideas such as incorporating them into song lyrics.
Surprise at Advanced Features: The announcement of heterogeneous variadic generics in the Mojo build sparked excitement and surprise among users.

Links mentioned:

Comparing 1a8f912..1bce1 · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
Comparing 1a8f912..1bce16d · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.
mojo/docs/changelog.md at nightly · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.

LAION ▷ #general (26 messages🔥): 

Stability AI's New Model Release: Stability AI released a model named CosXL, which requires acceptance of a non-commercial research community license agreement, asking users to agree to share their contact information.

Updates on Text-to-Image AI: Members discussed how to create images from text that are not present in the Stable Diffusion database. Links were shared, including one to Diffusers' documentation and a correction was made regarding the version update to v0.27.2.

Freelancing Job Market Analysis: A blog post was shared, analyzing freelancing jobs on Upwork to study the impacts of AI on specific job roles.

Debate on Model Training Techniques: Members discussed the merits and drawbacks of different approaches to model training. Opinions varied on the use of EDM schedules and offset noise in these processes.

Indecision on Deepfloyd Stage 3's Future: There were queries about whether Deepfloyd Stage 3 will be released as promised or not, emphasizing the need for clear communication on project pages regarding its status.

Links mentioned:

stabilityai/cosxl · Hugging Face: no description found
The jobs being replaced by AI - an analysis of 5M freelancing jobs - bloomberry: There’s no question that AI will impact jobs. But which jobs are more likely to be replaced by…

LAION ▷ #research (21 messages🔥): 

Twists in Generative Models: A note was made on the irony of autoregression for image models and diffusion for text models, highlighting a reversal of previous trends.
Vogue Cycles in Model Approaches: A member clarified that autoregressive image models have always been around and are favored for their scalability and intellibility, able to predict both text and images.
Potential Strengths of Autoregressive Models: Autoregressive approaches to image generation may also pave the way for text-to-video models, given proper video tokenization, aligning with insights from the CM3leon paper.
Griffin Triumphs over Transformers: Griffin architecture by Google is reported to outperform transformers, boasting an additional 1 billion parameters and better throughput on long contexts according to discussions on Reddit.
Reevaluating Zero-Shot Generalization: The limitations of zero-shot generalization in multimodal models like CLIP are highlighted where data quality and quantity become critical, as articulated in a recent paper.

Links mentioned:

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance: Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image gener...
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences: This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training L...
Language Modeling by Estimating the Ratios of the Data Distribution | Aaron Lou: no description found
Rick And Morty That Just Sounds Like Slavery With Extra Steps GIF - Rick And Morty That Just Sounds Like Slavery With Extra Steps Slave - Discover & Share GIFs: Click to view the GIF
Reddit - Dive into anything: no description found

Don't miss what's next. Subscribe to AI News (MOVED TO news.smol.ai!):