[AINews] LLaDA: Large Language Diffusion Models
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Chinese AI is all you need?
AI News for 2/14/2025-2/17/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (211 channels, and 11039 messages) for you. Estimated reading time saved (at 200wpm): 1163 minutes. You can now tag @smol_ai for AINews discussions!
There was something today for everyone.
Ahead of the expected Grok 3 release late tonight on a US holiday, today was a notable day of small things. Either:
- the big model releases of the day were the two StepFun models, Step-Video-T2V (30B Text to Video with remarkable coherence, including the famously hard ballerinas that Sora gets wrong) and Step-Audio-Chat, a 132B voice to voice model
- or you could look at Scale AI's EnigmaEval (very hard multimodal puzzles) or Cambridge's ZeroBench (100 manually-curated multi-step visual reasoning questions) evals where current frontier models score 0.
- or you could look at Schulman and Zoph's latest talk on post-training.
But we choose to award today's title story to LLaDA: Large Language Diffusion Models, the first text diffusion model scaled up to be competitive with autoregressive models like Llama 3 8B.
This is a "white whale" alternative LLM architecture that has only been speculated about but never successfully scaled up til now. The main trick is adapting diffusion to predict uniformly masked tokens, producing text in a diffusion process:
For those used to text streaming in from left to right, diffusion doesn't seem very practical... until you try to infill between preset text structures and word choices that you desire, or create coherent long form story structures. LLaDA could be the start of something much bigger here.
Table of Contents
- AI Twitter Recap
- AI Reddit Recap
- AI Discord Recap
- PART 1: High level Discord summaries
- Unsloth AI (Daniel Han) Discord
- Perplexity AI Discord
- HuggingFace Discord
- Codeium (Windsurf) Discord
- LM Studio Discord
- aider (Paul Gauthier) Discord
- OpenAI Discord
- Stability.ai (Stable Diffusion) Discord
- Cursor IDE Discord
- OpenRouter (Alex Atallah) Discord
- Interconnects (Nathan Lambert) Discord
- Eleuther Discord
- Nous Research AI Discord
- Yannick Kilcher Discord
- Cohere Discord
- Latent Space Discord
- MCP (Glama) Discord
- Modular (Mojo 🔥) Discord
- Torchtune Discord
- tinygrad (George Hotz) Discord
- LlamaIndex Discord
- Nomic.ai (GPT4All) Discord
- LLM Agents (Berkeley MOOC) Discord
- DSPy Discord
- MLOps @Chipro Discord
- PART 2: Detailed by-Channel summaries and links
- Unsloth AI (Daniel Han) ▷ #general (1141 messages🔥🔥🔥):
- Unsloth AI (Daniel Han) ▷ #off-topic (114 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #help (305 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #showcase (102 messages🔥🔥):
- Unsloth AI (Daniel Han) ▷ #research (109 messages🔥🔥):
- Perplexity AI ▷ #announcements (1 messages):
- Perplexity AI ▷ #general (756 messages🔥🔥🔥):
- Perplexity AI ▷ #sharing (47 messages🔥):
- Perplexity AI ▷ #pplx-api (18 messages🔥):
- HuggingFace ▷ #announcements (1 messages):
- HuggingFace ▷ #general (327 messages🔥🔥):
- HuggingFace ▷ #today-im-learning (4 messages):
- HuggingFace ▷ #cool-finds (1 messages):
- HuggingFace ▷ #i-made-this (15 messages🔥):
- HuggingFace ▷ #reading-group (1 messages):
- HuggingFace ▷ #computer-vision (13 messages🔥):
- HuggingFace ▷ #NLP (5 messages):
- HuggingFace ▷ #smol-course (18 messages🔥):
- HuggingFace ▷ #agents-course (772 messages🔥🔥🔥):
- HuggingFace ▷ #open-r1 (1 messages):
- Codeium (Windsurf) ▷ #announcements (1 messages):
- Codeium (Windsurf) ▷ #discussion (50 messages🔥):
- Codeium (Windsurf) ▷ #windsurf (580 messages🔥🔥🔥):
- LM Studio ▷ #general (290 messages🔥🔥):
- LM Studio ▷ #hardware-discussion (545 messages🔥🔥🔥):
- aider (Paul Gauthier) ▷ #general (635 messages🔥🔥🔥):
- aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):
- aider (Paul Gauthier) ▷ #links (1 messages):
- OpenAI ▷ #ai-discussions (323 messages🔥🔥):
- OpenAI ▷ #gpt-4-discussions (22 messages🔥):
- OpenAI ▷ #prompt-engineering (51 messages🔥):
- OpenAI ▷ #api-discussions (51 messages🔥):
- Stability.ai (Stable Diffusion) ▷ #general-chat (369 messages🔥🔥):
- Cursor IDE ▷ #general (344 messages🔥🔥):
- OpenRouter (Alex Atallah) ▷ #announcements (2 messages):
- OpenRouter (Alex Atallah) ▷ #app-showcase (5 messages):
- OpenRouter (Alex Atallah) ▷ #general (315 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #news (152 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #ml-questions (2 messages):
- Interconnects (Nathan Lambert) ▷ #ml-drama (11 messages🔥):
- Interconnects (Nathan Lambert) ▷ #random (58 messages🔥🔥):
- Interconnects (Nathan Lambert) ▷ #memes (21 messages🔥):
- Interconnects (Nathan Lambert) ▷ #rl (17 messages🔥):
- Interconnects (Nathan Lambert) ▷ #reads (13 messages🔥):
- Interconnects (Nathan Lambert) ▷ #policy (1 messages):
- Interconnects (Nathan Lambert) ▷ #expensive-queries (6 messages):
- Eleuther ▷ #general (20 messages🔥):
- Eleuther ▷ #research (235 messages🔥🔥):
- Eleuther ▷ #interpretability-general (3 messages):
- Eleuther ▷ #gpt-neox-dev (1 messages):
- Nous Research AI ▷ #general (174 messages🔥🔥):
- Nous Research AI ▷ #ask-about-llms (23 messages🔥):
- Nous Research AI ▷ #research-papers (2 messages):
- Nous Research AI ▷ #interesting-links (1 messages):
- Nous Research AI ▷ #research-papers (2 messages):
- Yannick Kilcher ▷ #general (170 messages🔥🔥):
- Yannick Kilcher ▷ #paper-discussion (12 messages🔥):
- Yannick Kilcher ▷ #ml-news (9 messages🔥):
- Cohere ▷ #discussions (57 messages🔥🔥):
- Cohere ▷ #rules (1 messages):
- Cohere ▷ #api-discussions (7 messages):
- Cohere ▷ #cmd-r-bot (96 messages🔥🔥):
- Latent Space ▷ #ai-general-chat (75 messages🔥🔥):
- Latent Space ▷ #ai-in-action-club (68 messages🔥🔥):
- MCP (Glama) ▷ #general (105 messages🔥🔥):
- MCP (Glama) ▷ #showcase (10 messages🔥):
- Modular (Mojo 🔥) ▷ #general (20 messages🔥):
- Modular (Mojo 🔥) ▷ #mojo (76 messages🔥🔥):
- Torchtune ▷ #general (4 messages):
- Torchtune ▷ #dev (44 messages🔥):
- Torchtune ▷ #papers (2 messages):
- tinygrad (George Hotz) ▷ #general (38 messages🔥):
- tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):
- LlamaIndex ▷ #blog (5 messages):
- LlamaIndex ▷ #general (25 messages🔥):
- LlamaIndex ▷ #ai-discussion (4 messages):
- Nomic.ai (GPT4All) ▷ #general (28 messages🔥):
- LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (2 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-questions (5 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (2 messages):
- LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (2 messages):
- DSPy ▷ #general (9 messages🔥):
- MLOps @Chipro ▷ #events (2 messages):
AI Twitter Recap
AI Model & Research Releases
- LLaDA (Large Language Diffusion Model) 8B: @arankomatsuzaki announced the release of LLaDA, an 8B parameter diffusion language model trained from scratch, which rivals LLaMA3 8B in performance but on 7x fewer tokens (2T tokens). @_akhaliq highlighted that LLaDA uses a diffusion model approach for text generation, differing from traditional left-to-right methods. @omarsar0 further detailed LLaDA's characteristics, noting its competitive performance, scalability, breakthrough of the reversal curse, and capability for multi-turn dialogue and instruction-following. @iScienceLuvr emphasized LLaDA-8B surpassing Llama-2 7B and performing on par with Llama-3 8B on standard benchmarks, trained on 2.3 trillion tokens and 0.13 million H800 GPU hours. @gallabytes questioned the "diffusion" aspect, noting the absence of SDE, probability flow, or noise, despite acknowledging its impressiveness as a new approach. @maximelabonne expressed surprise at the comeback of diffusion models in language.
- Step-Video-T2V 30B: @arankomatsuzaki shared the open-source release of Step-Video-T2V, a 30B text-to-video model capable of generating up to 204 frames, highlighting its high quality videos with strong motion dynamics and consistent content. @_akhaliq also discussed the Step-Video-T2V Technical Report, detailing the practice, challenges, and future of video foundation models. @reach_vb announced the release of a 30B text to video model from @StepFun_ai, also mentioning a Turbo model for faster inference and that it's MIT licensed.
- Step-Audio 130B: @_akhaliq announced Step-Audio, a 130 billion parameter multimodal LLM for understanding and generating human speech, and @reach_vb asked "chat, is this for real?" about a 132B parameter end to end Speech LM, describing it as "voice in, voice out 🤯", and noting it is APACHE 2.0 LICENSED.
- Mistral Saba: @sophiamyang announced Mistral Saba, MistralAI's first regional language model with 24B parameters, trained on datasets from the Middle East and South Asia, supporting Arabic and Indian-origin languages, especially Tamil and Malayalam. @sophiamyang noted it is available via API
mistral-saba-2502
on la Plateforme and custom training is offered.
Benchmarks & Performance
- LLaDA Performance: @iScienceLuvr noted LLaDA 8B surpasses Llama-2 7B and performs on par with Llama-3 8B on standard zero/few-shot learning tasks. @arankomatsuzaki highlighted LLaDA's performance rivaling LLaMA3 8B despite less training data.
- ZSEval Benchmark: @lateinteraction introduced ZSEval, a new LLM benchmark using multiplayer games to pit models against each other on knowledge, reasoning, and planning, also testing self-improvement capability using DSPy optimization.
- GPQA Score Improvement: @casper_hansen_ questioned why SFT on a dataset with 99.9% math boosts GPQA score from 45.30% to 62.02%, asking if the base model is weak in math or if GPQA Diamond is math-adjacent.
- Arabic Benchmark Performance of Mistral Saba: @sophiamyang mentioned strong performance of Mistral Saba on Arabic benchmarks.
Tools & Libraries
- OmniParser Update: @reach_vb announced Microsoft's silent update to OmniParser, a screen parsing tool, noting it is 60% faster than v1 with sub-second latency on a 4090. @mervenoyann highlighted the improved and faster OmniParser as a groundbreaking screenshot parser for web automation, open-source and pluggable with models like Qwen2.5VL, DeepSeek R1, 4o/o1/o3 mini.
- Ollama Model Downloads: @ollama confirmed seeing downloads of models from Ollama going to major cloud hosts and @ollama confirmed Ollama is written in Go.
- LangGraph.js with MongoDB: @LangChainAI announced a webinar on using MongoDB's data platform with LangGraph.js to build JavaScript AI agents, covering Node.js integration and state persistence.
- Arch AI-Native Proxy: @_akhaliq introduced Arch, an AI-native proxy for agents, built by Envoy contributors, with features like edge guardrails, task routing & function calling, and observability.
China & DeepSeek Focus
- DeepSeek's Rise: @teortaxesTex noted Wenfeng from DeepSeek boosting the Chinese stock market by ~$1.3 trillion, suggesting DeepSeek narrative did what no stimulus could. @teortaxesTex described a CCTV report on Hangzhou tech including DeepSeek, Unitree, DeepRobotics, and Game Science, highlighting the contrast between advanced tech and nostalgic transmission. @teortaxesTex listed DeepSeek along with Tencent, Xiaomi, Huawei, Alibaba as crucial technology leaders represented at a Beijing symposium, with Wenfeng as the sole AI wing representative.
- Wenfeng's Influence: @teortaxesTex stated Wenfeng will inspire generations of Asian autists and @teortaxesTex suggested Wenfeng's boyish looks are appropriate for an anime/manga adaptation, better than Dr. Stone. @teortaxesTex confirmed Liang Wenfeng met Xi Jinping, noting his consistent suit and pose in meetings with Party elites.
- Chinese Tech Symposium: @teortaxesTex analyzed the Beijing symposium as deeply symbolic, noting inclusion of Wenfeng and Xingxing, absence of Baidu and ByteDance, reconciliation with Jack Ma, and the presidency of Wang Huning. @teortaxesTex mentioned Wang Huning, author of Xi Jinping Thought, presiding over the tech symposium.
Perplexity Deep Research & Usage
- Deep Research Launch & Usage: @AravSrinivas stated Deep Research usage on Perplexity is 🚀, accounting for a significant fraction of search qps. @AravSrinivas plugged Perplexity as offering practically unlimited deep research agents. @AravSrinivas demoed Deep Research as a small business incorporation legal consultant, highlighting its accessibility compared to expensive human consultants. @AravSrinivas showcased Perplexity Deep Research as a Product Manager for roadmap planning, envisioning AI employees/interns in companies. @AravSrinivas presented Perplexity Deep Research writing an investment memo like Bill Ackman, using $UBER as an example.
- Deep Research Issues & Improvements: @AravSrinivas acknowledged inaccuracy in financial queries on Perplexity Deep Research, citing examples like old bitcoin prices and company market caps, promising fixes and more trustworthy data sources. @AravSrinivas invited feedback on Deep Research, asking for demo requests and pain points, mentioning potential support for longer reports and academic sources. @hrishioa shared tips for using Deep Research, including chatting before searching for context. @hrishioa provided examples of Deep Research with and without prompt crafting, and using o1pro instead of o3minihigh.
AI & Society, Ethics, and Future
- AI Robotics Breakthrough: @adcock_brett declared this is the breakthrough year for AI Robotics we have been waiting for, stating 10 years ago was too early. @adcock_brett expressed a dream for humanoids to build and fly eVTOLs. @Meta announced Meta's AI can now read minds with 80% accuracy using non-invasive brain activity recordings, highlighting its potential for people with brain lesions and the synergy between neuroscience and AI.
- ASI and Existential Risk: @teortaxesTex asserted "my personal ASI will be giving me as many monsters (AND of the flavor I want) as I decide. anything else is extinction event". @sama mentioned "trying GPT-4.5 has been much more of a 'feel the AGI' moment" among high-taste testers. @teortaxesTex quipped about an "ASI Hail Mary" saving a crashing empire.
- AI in Law & Services: @finbarrtimbers suggested if Harvey (AI law firm) is the future of law, the natural conclusion is to launch their own law firm and dominate the vertical. @finbarrtimbers expressed bearishness on service companies like law firms increasing margins like software companies. @andersonbcdefg stated it is "impossible* to make a law firm that is also a tech company that rewards non-lawyers with equity" due to ownership rules.
- Bias and Viewpoints in LLMs: @fabianstelzer argued LLMs are media publishers and will have viewpoints, seeing bias as a feature, not a bug, and predicting emergence of "FOX news & a NYT model".
- AI Safety & Ethics: @mmitchell_ai noted interest in their paper advocating "Fully Autonomous AI Agents Should Not Be Developed". @teortaxesTex identified "Safetyism" as one of three Great Filter events on the road to mature humanity.
- Future of AI Development: @omarsar0 predicted we will stop caring about the specific models powering AI systems, with product or dev experience becoming the winning factor. @ClementDelangue called for a return to a more open, transparent & collaborative AI, reminiscent of 2016-2020.
Humor/Memes
- Grok 3 Hype: @aidan_mclau made a dark humor joke: "a man died to tell us how good grok 3 really is never forget", reaching over a million impressions. @marktenenholtz encouraged hype for Grok 3 release, anticipating what a "cracked team can ship with a significantly bigger cluster than GPT-4". @teortaxesTex commented "small if true with 60k GPUs I can see Grok 3 actually failing to secure primacy even for a day".
- ChatGPT-4o Coding Prowess: @_akhaliq shared impressive code generated by ChatGPT-4o for a bouncing balls simulation, and @_akhaliq showcased another ChatGPT-4o code for a rotating ASCII sphere, contrasting it with an original test by @flavioAd. @mckbrando simply stated "i have no idea what we did but the new chatgpt is like cool now".
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Zonos: Open Weight Voice Cloning Model
- Zonos, the easy to use, 1.6B, open weight, text-to-speech model that creates new speech or clones voices from 10 second clips (Score: 330, Comments: 80): Zonos is a 1.6B parameter open-weight text-to-speech model that can generate new speech or clone voices from clips as short as 10 seconds. It runs efficiently on 8GB VRAM and is easy to set up on Linux using Docker, with a Windows-friendly fork available, although not vouched for by the author. The model and its hybrid version are available on Hugging Face, and the author recommends using Ocenaudio for editing voice samples.
- Users discuss the technical setup of Zonos, including using Docker on both Linux and Windows, and the necessity of WSL2 for Windows users to enable Docker with the Nvidia Container Toolkit. Some users report issues with docker-compose.yml and needing to adjust network_mode and ports for proper operation.
- There is debate over the performance and quality of Zonos compared to ElevenLabs, with some users finding Zonos less natural and expressive, while others appreciate its cost-effectiveness and potential for voice cloning. Users note that espeak is used for phonemization, which may affect performance in non-English languages.
- Discussions highlight the limitations and potential improvements of Zonos, such as the 30-second output cap and the need for better emphasis and emotion control in generated speech. The community is anticipating future updates and improvements, with some users expressing interest in using Zonos for applications like sillytavern and comparing it to other models like Kokoro and Fairseq.
Theme 2. OpenArc Python API Enhances Intel Inference
- Today I am launching OpenArc, a python serving API for faster inference on Intel CPUs, GPUs and NPUs. Low level, minimal dependencies and comes with the first GUI tools for model conversion. (Score: 279, Comments: 44): OpenArc is a newly launched lightweight inference engine designed to leverage Intel hardware acceleration using Optimum-Intel from Transformers. It features a strongly typed API with four endpoints, native chat templates, and a pydantic model for maintaining exposed parameters. Aimed at owners of Intel accelerators and edge devices, OpenArc offers a low-level approach to context management and promotes API call design patterns for interfacing with LLMs locally. Future updates include an OpenAI proxy, docker compose examples, and support for multi-GPU execution.
- Vendor Lock-In Concerns: Users expressed concerns about vendor lock-in, emphasizing the importance of being able to run models on diverse hardware such as Apple, Intel, Nvidia, and AMD. The ability to operate LLMs without GPU dependency would facilitate local execution, breaking the Nvidia dominance in the market.
- Performance and Compatibility: There is interest in how OpenVINO performs compared to llama.cpp, with discussions highlighting that OpenVINO uses a unique graph representation for quantization, which is not directly comparable to llama.cpp. Anecdotal evidence suggests OpenVINO is faster at higher precisions, and efforts are underway to benchmark its performance with newer models.
- Community and Collaboration: Several users praised the project and expressed interest in future updates, particularly the Docker compose examples for testing on low-cost Intel VPS. There is also a call for collaboration, especially from those working on similar projects targeting Intel platforms, with one user specifically mentioning work on multi-system model hosting and distributed fine-tuning.
Theme 3. DeepSeek-R1: MoE Model CPU Performances
- DeepSeek-R1 CPU-only performances (671B , Unsloth 2.51bit, UD-Q2_K_XL) (Score: 120, Comments: 59): DeepSeek-R1, a 671B model, shows promising CPU-only inference performance due to its MoE nature, with tests conducted on various CPUs like Xeon w5-3435X and TR pro 5955wx. Using Unsloth's 2.51-bit quantization yields better quality and speed compared to 1.58-bit, achieving 4.86 tok/s on the Xeon w5-3435X. kTransformer offers a nearly 2x performance gain by using a mix of CPU and GPU, although it limits context length by VRAM. The author is seeking additional CPU performance data and notes that STREAM benchmark results were lower than expected, with updates planned for kTransformer v0.3 and prompt processing rates.
- Performance Comparisons: Users are discussing the performance of various CPUs, including Xeon w5-3435X, Epyc Rome, and Threadripper 5965wx, with a focus on memory bandwidth and the impact of CCDs. Xeon Sapphire Rapids shows lower than expected real bandwidth, while Epyc Rome offers good value with DDR4 memory, achieving ~160GB/s on STREAM.
- Optimization Techniques: Suggestions for optimizing speed include experimenting with thread counts, avoiding KV cache quantization, and using flags like
--no-mmap --mlock
to improve prompt ingestion speeds. CheatCodesOfLife provides detailed performance comparisons for different cache types and quantization settings. - Quantization and Speed: The 2.51-bit quantization method by Unsloth is highlighted for its superior performance over 1.58-bit, and users report varying speeds with different setups. Notably, thereisonlythedance mentions achieving 6 tokens per second with a partial offload, while others note the effects of hardware upgrades on performance improvements.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. Nvidia GPUs: Compute Doubling in 10 Months Raises Questions
- Nvidia compute is doubling every 10 months (Score: 481, Comments: 37): NVIDIA's compute power is projected to double every 10 months, as illustrated by a graph showing exponential growth in FLOP/s from 2020 to 2024 across GPU generations: Hopper, Ampere, Volta, and Pascal. The data indicates a projected growth rate of 2.3x per year, with a confidence interval between 2.2 and 2.5x.
- Users questioned the accuracy of NVIDIA's compute power projections, suggesting that the figures may involve comparing different precision levels (e.g., FP16 to FP8), which can exaggerate perceived performance gains. There was curiosity about the specific types of FLOPS being referenced, with assumptions leaning towards Tensor FLOPS likely in FP16.
- Discussions compared the growth rate to Moore's Law, noting that while Moore's Law pertains to transistor counts, NVIDIA's projections encompass both performance and production scale. Users highlighted that the chart reflects total installed power, not just the power of individual GPUs, which can be misleading if production quantities increase.
- There was skepticism about whether the growth rate would outstrip demand, with comments indicating that demand for compute is growing even faster than supply. Users inquired about the status of Blackwell GPUs and their potential to sustain this growth, reflecting on the need for more compute in areas like video processing.
Theme 2. Advances in Video-to-Video AI: Hunyuan's Harry Potter Anime
- Harry Potter Anime 2024 - Hunyuan Video to Video (Score: 683, Comments: 69): The post mentions the Harry Potter Anime 2024 transformation utilizing the Hunyuan Video-to-Video model. However, no further details or context are provided in the post body.
- Users expressed interest in the workflow and rendering details, with requests for information on GPU usage and the time taken for rendering. Inner-Reflections shared a link to a Lora model and mentioned the potential of controlnets to improve results.
- Concerns were raised about consistency and character design, noting issues like Ron’s hair color and lack of facial expressions. DaddyKiwwi and FourtyMichaelMichael discussed the challenge of maintaining style consistency, possibly due to rendering limitations.
- Neither_Sir5514 and others highlighted a preference for 2D art styles with lower frame rates, contrasting with the less appealing 2.5D Pixar-like styles. Liquidphantom added that anime typically runs at 12fps for action and 8fps for dialogue, emphasizing the suitability of this technology for animation.
Theme 3. Open-Source Video Model Step-Video-T2V: High Demand, High Innovation
- New Open-Source Video Model: Step-Video-T2V (Score: 330, Comments: 63): The post introduces Step-Video-T2V, a new open-source video model. No additional details or context were provided in the post body.
- VRAM Requirements and Optimization: The Step-Video-T2V model requires 80GB of VRAM for optimal performance, but discussions suggest that with quantization and optimizations, it might run on 24GB of VRAM. Integration into the Diffusers library is anticipated to enhance optimization capabilities.
- Technical Details and Resources: The model is described as having 30 billion parameters with a deep compression VAE achieving 16x16 spatial and 8x temporal compression ratios. Resources such as the code and weights are available on GitHub and Hugging Face.
- VRAM Innovations: A linked article discusses SanDisk's new HBF memory that could enable up to 4TB of VRAM on GPUs, which might be beneficial for read-heavy tasks but less so for training due to higher latency compared to HBM. There's speculation about a mixed HBF/HBM architecture for GPUs to balance computation and storage needs.
Theme 4. AI Agent Apply Hero: Mass Job Applications and its Impact
- AI Agent Apply Hero has done over 1.6M Job Applications (Score: 301, Comments: 30): AI Agent Apply Hero has submitted over 1.6 million job applications using Claude to power their models, demonstrating significant advancements in AI capabilities. The Reddit post highlights the rapid progress and potential of AI technologies in the coming years.
- Commenters expressed skepticism about the effectiveness of AI Agent Apply Hero, suggesting it creates unnecessary noise and questioning its success rate in securing jobs. Deep_Area_3790 and DaShibaDoge highlighted concerns about the impact on the job application process and the actual outcomes of the 1.6 million applications.
- Some users, like EngineeringSmooth398 and Halbaras, criticized companies for the cumbersome job application processes, arguing that AI tools might highlight the redundancy and inefficiency of current systems. literum suggested a potential future where AI could manage job applications more efficiently, similar to how companies use AI for resume reviews.
- Funny_Ad_3472 humorously pointed out the disconnect between automated applications and the human element, noting the oddity of being called for interviews for jobs one doesn't remember applying to due to AI automation.
Theme 5. AI Image Restoration: Upscaling the Windows XP Bliss Wallpaper
- Iconic Windows XP Bliss Wallpaper from 800x600 to 8K (7680x4320) - Outpaint and Upscale with Flux Dev (Score: 170, Comments: 22): Iconic Windows XP 'Bliss' Wallpaper has been upscaled from 800x600 resolution to 8K (7680x4320) using AI tools such as Flux Dev. This showcases the application of AI in enhancing classic digital imagery.
- Discussions highlight the original location of the 'Bliss' image and note that many high-quality images of the site already exist, questioning the necessity of AI upscaling. External_Waltz_8927 and others critique the AI-generated 8K image for its lack of detail, suggesting that improper settings, such as excessive denoising, may degrade image quality.
- LatentSpacer shares a detailed workflow for image upscaling using AI, including a link to the process and emphasizes the importance of adjusting settings like denoising and tile size. The final image is shared here, with minor retouching done in Photopea to remove artifacts.
- The conversation also references an archived high-resolution original of the 'Bliss' image available in PNG format at archive.org. There is a discussion on the original photo's camera, the Mamiya RB67, and its large negative format, which offers considerable detail, suggesting its use in AI prompts for better quality outputs.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. Grok 3 Ignites AI Debate: Musk's Claims Meet Community Skepticism
- Musk Hypes Grok 3 as "Smartest AI on Earth": Elon Musk teased the upcoming release of Grok 3, proclaiming it 'the smartest AI on Earth' and scheduling a live demo, creating a buzz but also skepticism among users about its practical utility and potential biases. Concerns arose about Grok 3 being overly tailored for cultural preferences rather than genuine intelligence, as the OpenAI board firmly rejected Musk's $97.4 billion bid to control the company, stating “OpenAI is not for sale.” (Bloomberg Report)
- Users Speculate Grok 3's Capabilities: Enthusiasts are eagerly awaiting Grok 3, with one user optimistically stating, 'if Grok 3 is 10x better, it's game over', while referencing a tweet from Sam Altman teasing improvements to ChatGPT-4o. However, some express concern about Grok potentially becoming a propaganda tool depending on its programming, and anticipate evaluating its performance against existing AI models like O1.
- Grok's Potential Political Leanings Spark Concern: Users voiced apprehension that Grok, backed by Elon Musk, might exhibit biases reflecting his political views, emphasizing the need for balanced censorship in AI and the avoidance of political narratives. Discussions centered on whether AI should influence political opinions, with hopes for Grok 3 to perform well technically, yet worries it could become a propaganda instrument.
Theme 2. DeepSeek Models Drive Performance and Ethical Discussions
- DeepSeek R1 Races to Reasoning Crown: Users are comparing DeepSeek R1 favorably against O3 Mini High, finding R1 superior for reasoning tasks, noting a tweet from Nebius AI Studio touting DeepSeek R1's performance endpoint reaching 60+ tokens/second. While R1 excels in reasoning, O3 Mini High is noted for consistency across scenarios.
- DeepSeek R1's Reasoning Tokens Raise Questions on OpenRouter: Users on OpenRouter observed irregularities with the free DeepSeek R1 model, where reasoning tokens appeared unexpectedly in the response content instead of separate fields. The OpenRouter team is addressing these issues to refine the behavior of open-source reasoning models, referencing DeepSeek's GitHub for usage guidelines.
- DeepSeek Debated as Potential CCP Project: The LM Studio community engaged in light debate about whether DeepSeek is a CCP project, discussing its funding and rapid AI advancements. Speculation arose that the fast pace of AI development might lead to unforeseen breakthroughs, and some users expressed concerns about potential censorship in DeepSeek influenced by training data and web interface settings, as reported by Aider users.
Theme 3. Llama 3.2 Challenges VRAM Limits and Sparks Quantization Solutions
- Llama 3.2 Gobbles VRAM: Hugging Face users are grappling with Llama 3.2 models requiring 20GB VRAM for local use, prompting exploration of quantization techniques like EasyQZ to run larger models on GPUs with limited VRAM. Performance benchmarks are shared, with a 9B model surprisingly outperforming much larger models.
- Unsloth Optimizes Llama 3.2 Fine-tuning: Unsloth AI highlights its ability to fine-tune Llama 3.2 with extended context lengths up to 32k tokens, demonstrating a significant advancement in handling longer sequences. They also fixed bugs in Phi-4, and offer tutorials for fine-tuning Llama 3.2 and Phi-4 (Unsloth Blog, Unsloth Blog Phi-4, YouTube Tutorial).
- Torchtune Community Enhances Llama 3.3 Tokenizer: The Torchtune community reached consensus on adding a 'tool' option to the Llama3 tokenizer to ensure backward compatibility with 'ipython', showcasing collaborative improvements. This update reflects ongoing efforts to refine and adapt models for diverse applications and maintain compatibility.
Theme 4. RAG vs Fine-tuning Debate Intensifies: Efficiency and Application Focus Emerge
- Fine-tuning Dominates RAG for Performance: Unsloth AI Discord members found fine-tuning significantly outperforms RAG when sufficient training data is available, leading to superior performance in LLM applications. One member suggested that RAG may become niche, with fine-tuning becoming the dominant approach for most use cases.
- RAG Faces Repomap Roadblocks in Large Codebases: Aider users discussed the inefficiencies of RAG and Repomap in large repositories, observing diminishing returns as codebase size increases, with suggestions for hybrid scoring using BM25 and RAG to improve search. A blog post about RAG for codebases with 10k repos was referenced, outlining methods to bridge LLMs' context gaps.
- Perplexity Deep Research Challenges Traditional Search: Perplexity AI launched Deep Research, a feature for in-depth research reports via multiple searches and source analysis, offering 5 queries per day for free users and 500 for Pro users. User critique highlights limitations and occasional hallucinations compared to OpenAI offerings, yet hopes remain for future improvements (Perplexity Blog).
Theme 5. Community-Driven Tooling and Optimization Efforts Advance AI Development
- Unsloth's Hiring Challenges Seek Deep Learning Problem Solvers: Unsloth AI launched five challenges offering salaries up to $500K/year for experts in deep learning optimizations, seeking community contributions in areas like Griffin and FSDP2. This initiative aims to accelerate advancements in critical technical domains through open community engagement.
- OpenRouter Charts Real-Time Data and Tooltip Extension Debuts: OpenRouter enhanced its throughput and latency charts with real-time updates via Google AI's Vertex (OpenRouter Announcement), and introduced the Versatile AI Tooltip Chrome extension for quick text processing using OpenRouter models. The extension focuses on efficient summarization and translation (Chrome Web Store).
- Cursor IDE Users Innovate with MCP Server Setups: Cursor IDE users are actively sharing innovative MCP server setups, including Brave Search and Sequential Thinking, detailed in this YouTube video, to enhance coding workflows and documentation. These setups are reported to improve coding process reliability significantly.
PART 1: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- Unsloth Announces Hiring Challenges: Unsloth has announced five challenges offering salaries up to $500K/year for problem-solvers in deep learning optimizations.
- The company seeks community contributions for advancements in Griffin, FSDP2, and other technical areas.
- Users Skeptical about AI Agents: Discussions arose about the definition of AI agents, with some suggesting they are merely well-crafted prompts rather than groundbreaking tech.
- Members expressed concerns about the hype surrounding AI agents, stressing the importance of practicality over marketing language.
- CUDA OOM Errors Plague Users: Members discussed experiencing CUDA out-of-memory (OOM) errors mid-training, potentially due to data imbalance.
- Despite using similar settings, even data variance can lead to spikes in VRAM load, complicating resolution efforts.
- Fine Tuning Beats RAG in Performance: Members found that fine tuning can significantly outperform RAG with sufficient training data, improving overall performance in LLM applications.
- One member suggested that RAG may be delegated to niche uses, fine tuning will continue to predominate.
- SCION Optimizer Proposed: A new optimizer, SCION, was suggested, which could potentially train large neural networks without using Adam, requiring less memory while improving results.
- This sparked discussions about the practical applications of SCION in model training while efficiently managing computational resources.
Perplexity AI Discord
- Perplexity's Deep Research Unveiled: Perplexity AI launched Deep Research, a feature enabling in-depth research reports via multiple searches and source analysis, capped at 5 queries per day for non-subscribers, and 500 for Pro users.
- Deep Research is accessible on the Web and will soon be available on iOS, Android, and Mac, receiving high marks on expert-level tasks like Humanity’s Last Exam, as outlined in the official blog.
- Users Critique Deep Research Limitations: Users are drawing comparisons between Perplexity Pro and You.com, noting that Deep Research is limited and occasionally hallucinates responses, leading some to consider alternatives with more accessible pricing.
- Despite concerns and unfavorable comparisons to OpenAI's offerings as expressed on the subreddit, members remain hopeful about future improvements to Perplexity’s features.
- Sonar API's Missing Citations Exposed: Members report that the Sonar API fails to return citations as expected, despite references appearing in the output, sparking confusion about its functionality.
- A user clarified that citations are separate and require correct access, as explained in the Perplexity documentation.
- DeepSeek's Impact Sparks Debate: Members are exploring the potential impact of DeepSeek on AI methodologies, recognizing its ability to reshape current practices, as described on the Perplexity AI page.
- While some emphasize benefits in AI depth, others are wary of integration difficulties.
- Ethereum's Spectra Upgrade Highlighted: The community shared and discussed the recent Ethereum Spectra upgrade, noting its key changes and enhancements to the network's architecture.
- Conversations centered around the potential effects on decentralized applications (dApps) and the broader Ethereum ecosystem, which were summarized on Perplexity AI's page.
HuggingFace Discord
- HF Users Grapple With Llama 3.2's VRAM Demand: Hugging Face users are discussing local model usage, noting that the Llama 3.2 model requires 20GB VRAM, with exploration into quantization options like EasyQZ to fit larger models on GPUs with lower VRAM.
- Performance metrics for various models are being exchanged, showcasing the impressive capabilities of a 9B model that's outperforming models much larger in size.
- Science News Bot Joins the Fray: The Science News Bot launched on GitHub, ready to deliver the latest in scientific advancements and research.
- The team behind Mini PyTorch has released a lightweight alternative for PyTorch enthusiasts and developers, available on GitHub.
- MultiLingual NER Model Shows Promise: The GLiNER-MoE-MultiLingual model, a zero-shot Named Entity Recognition (NER) model, showcases advanced NER capabilities, trained for one epoch using the NOMIC-MOE architecture and is accessible here.
- Optimized SwarmFormer now features local windowed attention and token-to-cluster gating, enhancing computational efficiency for mobile and edge environments.
- Differential Privacy Gains Traction!: A member launched a blog series dedicated to Differential Privacy, starting with an introductory post titled Differential Privacy!! But Why? available here.
- The series emphasizes the critical balance between extracting insights from data and protecting individual privacy in various contexts.
- HF Course Participants Await New Units: Participants in the Hugging Face Agents Course celebrated completing Unit 1 and receiving certificates, while new learners sought guidance on accessing course materials via the official Hugging Face page.
- Community members are proposing the formation of study groups, and bonus materials plus subsequent units are expected to be released periodically.
Codeium (Windsurf) Discord
- Windsurf Surfs Up with Patch 1.3.4: The Windsurf 1.3.4 patch is out, squashing bugs in Cascade write tools and fixing message cancellation issues. Details can be found in the changelog.
- Key fixes include better Cascade Base credit handling and clearer auth login error messages.
- Model Context Protocol Gets Cascade Support: Cascade now supports the Model Context Protocol (MCP), enabling tool calls to configured MCP servers which should get setup using the hammer icon.
- This feature is available to all individual plans, with each MCP tool call costing one flow act.
- JetBrains Extension Users Holding Breath: Users want updates to the JetBrains extension, especially to refresh the model list for tools like DeepSeek and Gemini 2.0 Flash.
- A dev responded by saying updates are incoming, highlighting ongoing maintenance for Codeium extensions due to enterprise user needs.
- Windsurf Waves Frustration on Codeium Extension: Users are salty that Windsurf has overshadowed the original Codeium extension, lamenting the lack of feature parity.
- Some users are so frustrated they are looking at alternatives like Qodo.
- Cascade's Auto-Save: Accidental Savior or Sinister Saboteur?: Reports show Cascade auto-applies changes without asking, creating confusion about code edits.
- The suggested workaround is starting new conversation histories to dodge auto-suggestions.
LM Studio Discord
- Qwen MLX Model Causes Headaches: Users reported a
ModuleNotFoundError
when loading the Qwen MLX model on Macbook Ultra, suggesting updates tomlx-engine/0.6.0
might help.- Despite attempts to fix the problem, including removing specific directories, the issue persisted even with the latest version.
- RX580 Performance Falters: The RX580 GPU faces limitations when running LLMs, with recommendations leaning towards newer models boasting ROCm support for improved performance.
- Users are encouraged to use cloud services such as vast.ai to evaluate the performance of GPU models without significant upfront investments.
- DeepSeek labeled as CCP project: The community lightly debated whether DeepSeek is a CCP project, and its funding amid rapid advancements in AI tech.
- The group speculated that the pace of AI advancements may lead to unexpected breakthroughs.
- API Key Woes Plague Local Server Users: Users expressed frustration with the local server API requiring an OpenAI API key in LM Studio.
- Chat participants suggested code modifications to efficiently bypass API key requirements for local setups.
- 4060 Ti Crushes It: The RTX 4060 Ti stands out for its efficient handling of AI models, particularly regarding tokens per second.
- Comparisons reveal that the RTX 3090 also performs admirably, especially for running larger models.
aider (Paul Gauthier) Discord
- Grok 3 Hype Builds: Enthusiasts anticipate the upcoming release of Grok 3, with speculation about its capabilities and performance improvements; there are reports of early access to Grok 3, including a voice mode called Ara.
- One user speculated about the potential improvements, stating, 'if Grok 3 is 10x better, it's game over', while referencing a tweet from Sam Altman teasing improvements to ChatGPT-4o.
- R1 Battles O3 Mini High for Reasoning Crown: Users are weighing DeepSeek R1 and O3 Mini High, finding R1 preferable for reasoning capabilities; the discussion mentioned a tweet from Nebius AI Studio touting DeepSeek R1's performance endpoint reaching 60+ tokens/second.
- Feedback indicates that while R1 excels in certain reasoning tasks, O3 Mini High offers more consistency across various scenarios.
- RAG Faces Repomap Roadblocks: Participants are grappling with inefficiencies of RAG and Repomap in expansive repositories, observing diminishing returns as the codebase swells; one suggestion involved hybrid scoring using BM25 and RAG to boost search results.
- Discussion touched on a blog post about RAG for codebases with 10k repos, outlining methods to bridge LLMs' context gaps.
- Gemini Flash 2.0 Surprises with Speed: Anecdotal evidence indicates that Gemini Flash 2.0 delivers faster, cost-effective solutions with surprisingly robust reasoning and math skills compared to Sonnet.
- This revelation ignited comparisons and discussions among users evaluating different AI models for performance and efficiency.
- Aider Hit with Pesky Bugs: Following recent updates, Aider users are reporting bugs, specifically command parsing snags and
.dockerfile
handling hiccups.- The community also voiced concerns about potential censorship in DeepSeek, influenced by training data and web interface settings.
OpenAI Discord
- Grok's Potential Political Slant Sparks Debate: Users voiced concerns that Grok, backed by Elon Musk, might exhibit biases reflecting his political views and actions on Twitter, emphasizing the necessity of reasonable censorship in AI and the avoidance of political narratives.
- The discussions touched on whether AI should influence political opinions, with some hoping Grok 3 would perform well against existing AI while others worried it could become a propaganda tool depending on its programming.
- AI-Generated Code Raises Security Alarms: Users highlighted the risks of relying on AI-generated code due to potential security vulnerabilities like unsafe string concatenation, stressing the importance of critical code review by developers.
- The discussion emphasized that poor coding practices from AI could introduce vulnerabilities into applications if not thoroughly vetted by human experts.
- GPT Store Action Requires Privacy Policy: A member reported encountering an error stating 'Public actions require valid privacy policy URLs' when attempting to publish in the GPT Store.
- Another member suggested that filling in the privacy policy field in actions could resolve this issue.
- Multi-Agent Systems Circumvent Token Limits: Members discussed the limitations of token output streams in AI interactions, sharing frustration and experiences about how it affects their projects.
- It was suggested that using multi-agent setups could help sidestep token limitations when generating longer outputs, while noting the potential of MCP (Model Context Protocol) to streamline API interactions.
Stability.ai (Stable Diffusion) Discord
- SD 1.7.0 Outshines 1.10: Users reported that Stable Diffusion 1.7.0 is more efficient than 1.10, featuring faster loading times and improved prompt adherence.
- In contrast, users found version 1.10 would often generate unsatisfactory images, citing longer load times and erratic results.
- Optimal LORA Training Disclosed: For better LORA training, aim for approximately 50-150 images of your subject, as discussed among users.
- The community also recommends tools like Koyha LORA to streamline the training process for specific styles.
- Local AI > Online AI?: Users advocated for running AI models locally, highlighting greater control and customization compared to online services.
- They advised that an NVIDIA GPU with at least 8GB VRAM and 32GB of RAM is necessary for optimal performance.
- Mastering Image Generation: Discussions covered effective image creation methods, including employing regional prompting and optimizing word order in prompts for better outcomes.
- Clarity in prompt structuring was emphasized as a key factor in achieving superior image results.
- ComfyUI: Power with Complexity: While acknowledging that ComfyUI can be intimidating for new users, the community recognizes its flexibility and extensive capabilities in image generation workflows.
- Convenient tools like drag-and-drop image uploading simplify AI generation tasks.
Cursor IDE Discord
- Claude's Performance Nosedives: Users reported that Claude's performance has slowed down significantly and its context awareness has decreased over the past few days, leading some to believe it has been 'nerfed'. One user suggested to try Anthropic's new Thinking model and turn on Thinking in model settings to see Claude's thought process, described in this tweet.
- Members expressed frustration and began exploring alternative models and noted it may be experiencing internal changes without a formal model update, according to this reddit thread.
- MCP Server Setups Spark Innovation: Users shared various MCP server setups, highlighting the use of Brave Search and Sequential Thinking to improve performance, detailed in this YouTube video. These setups reportedly enhance the reliability and documentation of the coding process.
- A user also sought a basic 'hello world' example for MCP, indicating a demand for more accessible resources, and another shared a link about Tavily MCP.
- Grok 3's Release Generates Curiosity: The launch of Grok 3 has sparked interest, with users curious about its performance versus existing models and there is speculation about backend updates that might align it with newer Llama versions.
- Users expressed eagerness to test Grok 3 to validate its abilities after previous disappointing launches. News of the roll out discussed in this tweet.
- Cursor's Task Handling Faces Scrutiny: Concerns have been raised about Cursor's ability to manage extensive tasks and process larger files efficiently, especially in agent mode, even for users with unlimited tokens. Users may need to consider alternatives due to these limitations. You can view models via this link.
- One user shared PearAI, an open source AI-powered code editor as one possible alternative. Another user shared a tweet testing multiple AI tools to convert a simple Figma UI design into code.
- Free Trial Exploits Alarm Cursor Users: Users are concerned about potential exploits that allow abuse of Cursor's free trial, leading to conversations about security and oversight. There is a tutorial and script to use Cloudflare to renew Cursor indefinitely as shown in this tweet.
- The community is frustrated by these issues, urging the Cursor team to be vigilant against misuse and one user shared a GitHub repo for auto signing into Cursor.
OpenRouter (Alex Atallah) Discord
- OpenRouter's Real-Time Charts Get a Jolt: OpenRouter's throughput and latency charts now update in real time, thanks to Google AI's Vertex enhancements, as detailed in their announcement.
- They also released llms.txt, a comprehensive resource to chat with the docs (llms.txt and full version).
- AI Tooltip Extension is Born: The Versatile AI Tooltip Chrome extension lets users quickly process text using OpenRouter models, requiring just an API key for setup.
- It focuses on summarizing articles and translating text snippets affordably, with details available on the Chrome Web Store.
- Toledo1 Offers AI on Demand: The Toledo1 platform provides private AI assistant chats on a pay-per-question basis, combining multiple AIs for accuracy.
- It features client-side search with easy installation, promising enterprise-grade security without subscription fees via Toledo1.
- DeepSeek R1's Reasoning Raises Eyebrows: Users noticed inconsistencies with the DeepSeek R1 free model where reasoning tokens appeared unexpectedly in the response content rather than as separate fields.
- The OpenRouter team is tracking solutions to manage the behavior of open-source reasoning models, with usage recommendations found on DeepSeek's GitHub.
- Data Privacy Becomes a Hot Potato: Concerns were raised about forced routing to specific countries potentially violating data protection laws, advocating for region-based routing options for compliance.
- OpenRouter acknowledged the need and is exploring options to support EU-specific routing for better legal compliance, especially with current discussions around OpenAI SDK.
Interconnects (Nathan Lambert) Discord
- Musk's Grok 3 Sparks Debate: Elon Musk announced the release of Grok 3 with a live demo scheduled, touting it as the 'smartest AI on Earth', spurring mixed reactions regarding its alignment with practical user needs and 'based'-ness. The live demo is scheduled for Monday night at 8 PM PT.
- Some concerns were raised about the model being overly tweaked for cultural biases, and the OpenAI board rejected Musk's $97.4 billion bid to control the company, stating, “OpenAI is not for sale.” The board emphasized that they unanimously turned down Musk’s attempts to disrupt competition, as reported by Bloomberg.
- Mistral Saba aims for Regional AI: Mistral Saba, a 24B parameter model, is designed for the Middle East and South Asia, emphasizing custom-trained capabilities to improve language representation, nuance, and cultural context. Mistral AI introduced it as One of the many custom-trained models to serve specific geographies, markets, and customers.
- This move highlights a trend towards tailoring AI models for specific regional markets instead of depending on general-purpose models, showcasing how regional nuance is becoming more important for specialized applications.
- Tencent's Hunyuan LLM Turbo-S to Debut: Tencent is planning to publicly release their Hunyuan LLM Turbo-S and a video generation model, HunyuanVideo I2V, expected in Q1 2025, as tweeted by 青龍聖者.
- This announcement reflects Tencent's ambition to strengthen its position in the competitive AI field, particularly in video technologies, potentially reshaping AI applications in media and entertainment.
- GRPO pairs with SFT!: Members noted that GRPO isn’t meant to replace SFT; rather, both should be utilized together for optimal results. Trelis Research tweeted a teaser for a video on GRPO vs SFT.
- This collaborative approach to model training is favored among users seeking to maximize model performance by leveraging the strengths of both GRPO and SFT techniques.
- LLaDA Challenges Autoregressive Models: The paper introduces LLaDA, a diffusion model that challenges traditional autoregressive models (ARMs) by showcasing its capabilities in scalability and instruction-following abilities, detailed in Large Language Diffusion Models.
- The introduction of LLaDA prompts discussion and scrutiny within the AI community regarding its classification as a diffusion model, with observations noting the absence of traditional diffusion characteristics.
Eleuther Discord
- Data URIs Spark AI Image Generation: A member explored training AI to generate images using Data URIs or Base64 encoding, allowing images to be stored without external linking, but noted that this method typically shows up as a blob in user interfaces.
- The discussion highlights a trade-off between ease of storage and display quality when using Data URIs for AI-generated images.
- Child Prodigies Inform AI Advancement: A conversation arose regarding the commonality of child prodigies in areas like Math and Chess, suggesting these fields might be promising for AI advancements.
- Another member speculated that music might also fit this mold, questioning why it isn't classified as 'easy'.
- NovelAI Unlocks Sampler Secrets: NovelAI's understanding of sampling methods has evolved due to their decision to provide a wide variety of sampler algorithms, pushing developers to figure out the intricacies involved.
- This contrasts with other organizations that initially did not offer complex samplers and thus lacked a reason to invest significant efforts into understanding them.
- Repetition Penalty Remains Tricky: The repetition penalty in text generation is a statistical method to mitigate common issues encountered with less skilled authors, who tend to overuse repetition in their writing.
- While skilled authors can successfully utilize repetition for dramatic effect, distinguishing good repetition from bad remains a challenge in developing sampler policies.
- DEQs Depth Defies RNN Time: Discussion around DEQs highlights their relationship with recurrent structures, focusing on the implications of weight tying and the potential for hidden layers to converge to fixed points.
- Participants noted that, unlike RNNs which handle time variation, DEQs emphasize depth and can utilize implicit differentiation methods for backpropagation.
Nous Research AI Discord
- DeepHermes Delivers on Reasoning: Users are reporting that the DeepHermes model follows instructions better than current R1 distillates, and has been reported as the first usable general-purpose reasoning model. One DeepHermes-3 user reported a processing speed of 28.98 tokens per second on a MacBook Pro M4 Max consumer hardware, according to VentureBeat.
- It was suggested that reasoning tasks and datasets could further enhance this performance.
- Call to Open Source Older Models: Concerns were voiced over Anthropic's decision to potentially delete Claude 3 without plans to open-source it, despite its age. It was argued that releasing older models could foster goodwill and benefit the company's reputation in the open-source community.
- Members pointed to previous work from Clement Delangue, CEO of Hugging Face, who discusses Open AI, DeepSeek and innovation at his company in this Youtube video.
- Reinforcement Learning (RL) Boosts LLMs: Insights were shared about how Reinforcement Learning (RL) and rewards effectively enhance LLMs, particularly regarding alignment tools. Historical references indicated that researchers were aware of these techniques long before other companies adopted similar strategies.
- Google Chief Scientist Jeff Dean discussed his 25 years at Google, from PageRank to AGI in this video.
- Debate on Model Training Costs: A discussion surfaced regarding the costs involved in training a 1B model, with estimates ranging from thousands to tens of thousands of dollars for a complete setup. While it's feasible to train on consumer GPUs, advancements in data and architectures may affect the overall cost efficiency over time.
- One participant mentioned the possibility of achieving a 1.5B model for around $200 to $1,000 through careful data selection and training strategies, linking to 1.5-Pints Technical Report.
- LLaDA Diffusion Model Challenges LLMs: The LLaDA paper introduces a diffusion model that redefines the landscape of large language models (LLMs) by providing a principled generative approach for probabilistic inference. LLaDA shows strong scalability and competes with leading LLMs.
- LLaDA's remarkable performance in in-context learning and multi-turn dialogue, positions it as a competitive alternative to established models like GPT-4o.
Yannick Kilcher Discord
- Splines Show Promise in AI: Discussions highlighted the potential of using NURBS and splines in AI models for enhanced function approximation and reduced overfitting, contrasting with traditional polynomials, further research can be found in NeuralSVG: An Implicit Representation for Text-to-Vector Generation.
- Participants explored the importance of smoothness and topology in developing new AI methodologies, aiming for better generalization capabilities.
- UltraMem Boosts Inference Speed: The UltraMem architecture, detailed in the paper Ultra-Sparse Memory Network, utilizes large-scale, ultra-sparse memory layers to improve inference speed and model performance.
- This architecture potentially offers advantages over traditional MoE approaches while maintaining computational efficiency.
- Community Seeks Paper Discussion Hosts: Members are actively encouraging increased participation in hosting paper discussions to enrich perspectives within the group, the logistics are discussed in #paper-discussion.
- Suggestions included developing a hierarchical tree of seminal papers with dependency links to streamline understanding and knowledge advancement.
- Mistral's Saba Speaks Many Languages: MistralAI introduced Mistral Saba, a 24 billion parameter regional language model, trained on datasets from the Middle East and South Asia.
- The model excels in Arabic and South Indian languages like Tamil and Malayalam, focusing on regional linguistic nuances.
- OpenAI Drafts Robot Ethics Guide: OpenAI has released its 50 laws of robotics, defining safe AI behaviors and modern AI ethics guidelines, expanding Asimov's original three rules.
- The document aims to align AI models with human values and safety, contributing to ongoing discussions on AI responsibility.
Cohere Discord
- Cohere API Might Get OpenAI Compatibility: A member inquired about Cohere API compatibility with OpenAI which received feedback and referral to the product teams, with no definitive timeline given.
- A separate member is building a deep research clone compatible with various AI models, requiring only the base URL and model specification for LLM provider initialization.
- Community Debates Moderation Models: Members debated the effectiveness of moderation models, expressing dissatisfaction with Llamaguard and seeking better alternatives for their specific use cases.
- Some are using OpenAI's omni moderation but hope that Cohere releases new models soon.
- Troubleshooting Technical Glitches: One member reported a ModuleNotFoundError for the cohere package after a fresh install, which was suggested to be an environment issue.
- Another user reported a log in error and shared a screenshot, anticipating community assistance given the unclear error context.
- Aya-8b Integration Stumbles Upon Issues: A member integrating the Cohere Aya-8b model into their application encountered API support problems for the specific model version.
- A sample request using the c4ai-aya-expanse-8b model was shared, with reference to relevant resources for troubleshooting, while other members experienced timeout issues with the Rerank API.
- Embed API Rate Limits: The Embed API production rate limit is 2,000 calls per minute, regardless of the number of tokens embedded, effectively limiting it to 2000 documents total per minute.
- Users humorously tested the AI for color preferences, but the AI stated that it has no personal preferences.
Latent Space Discord
- Perplexity Debuts Deep Research Agent: The new Perplexity Deep Research Agent provides a free tier and a $20/month expert researcher option, generating reports in about three minutes.
- User feedback highlights decent output but suggests follow-on capabilities could better 'leverage deep research', with more potential improvements to be made.
- Musk Teases Grok 3 Release: Elon Musk announced the upcoming Grok 3 launch with a live demo, boldly claiming it to be the 'smartest AI on Earth', but skepticism remains.
- Community members expressed mixed reactions, balancing enthusiasm with caution given the competitive landscape and Grok's previous performance issues.
- StepFun Opens Multimodal Models: StepFun has open-sourced Step-Video-T2V, a 30B text-to-video model needing 80G VRAM, and their Step-Audio-Chat model which is available on Hugging Face.
- Evaluations show that the audio chat model strongly outperforms competitors, highlighting enhanced capabilities in multimodal AI.
- Zed Predicts Next Edit: Zed introduced a new edit prediction model called Zeta that is designed to boost developer productivity by anticipating code edits.
- Although the community appreciates open-source initiatives, some question if Zeta will rival established tools like Cursor and Copilot in terms of performance.
- Eliza Inspires Nostalgia: Members recalled Eliza's impact as an early AI therapist and discussed a related podcast episode.
- Participants speculated on its potential impact if released today, referencing $100B in funding as a hypothetical, inspired by Gary Marcus's remark.
MCP (Glama) Discord
- MCP Servers Enhance Workflows: Members discussed several MCP servers, including those for sequential thinking, filesystem access, web search, and GitHub repository evaluation.
- These tools provide enhanced functionalities to streamline various tasks and improve efficiency.
- Glama Website Experiences Outage: Users reported issues accessing the Glama website due to a network outage in the IAD region, where it is hosted.
- The issues, including slow loading times and timeouts, were eventually resolved.
- Prompt Tool Recommendations Shared: A user sought recommendations for prompt management tools in MCP environments; suggestions included LLMling, MCP-llms-txt for documentation, and Langfuse.
- These tools aim to assist with better prompting and workflow maintenance.
- SSE Wrappers Development Challenges: Developers discussed the challenges of creating SSE wrappers for MCP servers, particularly concerning communication without modifying existing server code.
- Neon's MCP Server Simplifies DB Management: Daniel from Neon introduced Neon's MCP Server that allows users to manage databases using natural language via the Neon API.
- This simplifies database workflows by using Large Language Models (LLMs) for interaction.
Modular (Mojo 🔥) Discord
- MAX Could Simplify NumPy: A member speculated that a NumPy written in Mojo would be simpler than the C version by outsourcing the work to MAX.
- Another member agreed that this approach could lead to significant improvements in efficiency and maintainability.
- Community Courts Polars: A user requested the Discord link for Polars, a DataFrame library powered by Rust.
- The prompt response confirms Polars' relevance and existing awareness within the Mojo community.
- Mojo Eyes GPU, Including Metal: Discussion arose around Mojo's plans to compile code for GPUs, with potential support for Apple's Metal API.
- The ultimate strategy involves implementing MAX drivers for a wide range of hardware, including Apple Silicon, opening doors for broader compatibility.
- AI Assistants Join Mojo Refactoring: A user explored using AI to refactor a large Python project into Mojo, particularly for smaller classes and for its reasonable Mojo knowledge.
- Another member recommended experimenting with
Gemini 2.0 Flash
variants, but advised that fully automating the process would be difficult.
- Another member recommended experimenting with
- Mojo Needs Error Clarity: Users emphasized the need to improve the clarity of Mojo's error messages to aid in navigating the language.
- One member pointed out that Mojo's borrow checker adds extra complexity when transitioning code from other languages, often necessitating a complete re-architecture.
Torchtune Discord
- FSDP steps in for Deepspeed Zero: A member inquired about using Deepspeed Zero with Torchtune, but ebsmothers clarified they use FSDP which is equivalent to ZeRO-3.
- Another member suggested using distributed recipes for capabilities associated with Zero without extra setup.
- GRPO PR Needs Some TLC: Participants discussed the progress of the GRPO PR, noting the need for unit tests and an experimental components folder to house various datasets (see PR #2398).
- Emphasis was placed on maintaining backward compatibility while making generation changes.
- Tool Role Enters Llama3.3: A consensus formed around adding a 'tool' option to the Llama3 tokenizer, providing backward compatibility with 'ipython' (see tokenizer.py).
- Contributors actively debated the implications of introducing a new tokenizer builder and meticulously handling model version checks.
- Dependency Concerns Inflate Dev Environment: Members expressed concerns about the growing list of optional dependencies in the dev environment, particularly those related to logging frameworks.
- A proposal was made to categorize dependencies, enabling users to install only the logging framework they need, thereby minimizing unnecessary bloat.
- Clipping/Winsorization Saves RLHF: It was suggested to add clipping or winsorization options to the
rlhf.get_batch_log_probs
function to fix issues with dropping end-of-sequence tokens due to logarithmic behavior.- This was identified as a potential improvement in handling log probabilities.
tinygrad (George Hotz) Discord
- Tensor Operations Flexibilized: Members proposed that
Tensor.tril
andtriu
should accept aTensor
for the diagonal instead of anint
, allowing greater flexibility in custom kernels for blockwise attention in KV-cache-space.- psychofauna advocated for an ideal signature of
int | Tensor
to streamline the management of multiple diagonals across dimensions.
- psychofauna advocated for an ideal signature of
- Tinygrad Bugfix Bounties Kickoff: A user announced a PR for a bugfix bounty, noting the task's simplicity based on their analysis of test failures.
- They suggested that the test in question may represent a fix or a typo, pending further review of recent commits.
- HEVC Decoder Utility Hunt: Following up on the HEVC decoder, a user inquired about utilities for command queue management, referencing the helpfulness of a previously used ioctl sniffer.
- nimlgen mentioned the existence of code for dumping queues, though it may require adjustments for current use.
- Attention Implementation Almost There: A member reported implementing the full
com.microsoft.Attention
by adapting existing code fromextra/models/llama.py
and evaluating against top Hugging Face repositories.- The implementation passed 201 out of 250 tests, with failures primarily due to numerical inaccuracies across various quantization formats.
- Tinychat Fortified for Mobile: A user announced enhancements to Tinychat, specifically targeting stability on WASM for mobile devices, with changes documented in their PR.
- They are currently cleaning up the code in preparation for merging and have invited inquiries about the updates.
LlamaIndex Discord
- LlamaParse Reveals All!: @mesudarshan showcased LlamaParse's capabilities, emphasizing its multiple parsing modes and parsing audio/images, further enhanced by a JSON mode, as detailed here.
- He gave a full breakdown of its features at this link.
- LlamaIndex.TS Slims Down!: LlamaIndex.TS received an update that slims it down to make it easier to ship, greatly enhancing its usability, with improvements detailed here.
- The release makes it faster for developers to integrate LlamaIndex into Typescript based LLM Apps.
- Roll out Sales Outreach Agent with LlamaIndex: A tutorial for an Automated Sales Outreach Agent, which crafts outreach emails and manages meeting schedules based on responses, was shared leveraging @llama_index workflows with demos here and here.
- The agent helps generate outreach emails and schedule meetings.
- LLM Consortium Answers!: Massimiliano Pippi introduced an implementation of @karpathy's concept of an LLM Consortium, allowing multiple LLMs to respond to questions and compare answers, explained more fully here and here.
- The project explores collaborative LLM responses.
- Mistral Saba Small Model Debuts: Mistral AI launched Mistral Saba, a new Arabic-focused model, which LlamaIndex immediately supported, which users can install using
pip install llama-index-llms-mistralai
, more info.- It provides a small model that specializes in Arabic.
Nomic.ai (GPT4All) Discord
- Hunting Nomic Embed Text V2 Source: A member sought the source code and training data for Nomic Embed Text V2 without immediate success, until another member shared the GitHub repository and Hugging Face page.
- The discussion highlighted the importance of clear documentation and discoverability for open-source projects.
- DeepSeek Model Binding Binds Up: A user reported errors loading the DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf model, suggesting issues with potentially outdated python bindings**.
- Another member proposed updating the llama.cpp library to resolve the conflict.
- LocalDoc Data Dumps?: A user questioned if enabling LocalDoc** in GPT4All sends data to the cloud, and how to disable it after initial use.
- The solution involves deselecting the Enable Data Lake setting to prevent data sharing.
- Code Llama Template Tangles: A user requested assistance in configuring a chat template for the Code Llama** model.
- A basic example for a message template was shared, emphasizing the potential need for the model's specific repository or full name for improved support.
- Tooling Around with Models: A member inquired about verifying a model's effective use of tools during execution.
- The advice given was to consult the model developer's documentation to confirm if the model was specifically trained on tool calling.
LLM Agents (Berkeley MOOC) Discord
- Berkeley RDI Internships Seek Talent: The Berkeley Center for Responsible, Decentralized Intelligence (RDI) has openings for internships for UC Berkeley students, with focuses on Marketing and Web Development and applications reviewed on a rolling basis via this link and
samanthaguo@berkeley.edu
.- The roles involve developing marketing strategies, managing social media, enhancing their website, and producing multimedia content for their YouTube channel; skills in design tools, GitHub, and multimedia production are preferred.
- Navigating CS294/194-280 Course Enrollment: Students can enroll in CS194-280 (class number 33840) and CS294-280 (class number 33841) through CalCentral, with a petition form available for those waitlisted.
- While late enrollment is accepted, queries about the course should be directed to Edstem instead of emailing staff.
- DeepSeek Reasoning Discussion is Scheduled: A discussion on DeepSeek's reasoning method and GRPO took place to discuss integrating GRPO style reasoning into smaller models.
- The study group also covered Lecture 3 by Prof Yu Su in the weekly forum.
- FinRL-DeepSeek Webinar Premieres CVaR-PPO Extension: The FinRL-DeepSeek webinar on February 25 will present an extension of CVaR-PPO that incorporates trading recommendations and risk assessments generated by LLMs, with open-source code, data, and AI trading agents provided.
- Backtesting results on the Nasdaq-100 will be shared.
- Quiz 3 Expected Soon: Members inquired about the release date of Quiz 3, which has been delayed due to scheduling issues.
- It is now expected to be available early next week.
DSPy Discord
- DSPy Code Golf Competition Kicks Off: Members discussed engaging in DSPy code golf, suggesting a competitive game around creating concise DSPy solutions and sharing quick hacks.
- A member noted it's a fun excuse to spend some time on neat coding challenges and linked to a tweet by Omar Khattab (@lateinteraction) showcasing DSPy usage.
- Qwen 0.5B Wrangling JSON Output: A member sought guidance on securing JSON outputs for a multi-label classification task using Qwen 0.5B, focusing on the necessary configurations.
- Another member recommended using DSPy's JSONAdapter for proper JSON output formatting and provided example code.
- Custom DSPy Implementation Emerges: A member shared their forked and customized implementation of Parlant with DSPy support, ensuring working JSON outputs.
- They offered an example run that can be integrated into the Parlant server for optimal performance.
- Constant Parameter Placement Pondered: Clarification was requested on whether constant parameters should be included in the signature docstring or as an InputField in the pipeline setup.
- The discussion included considerations for the impact on mipro and bootstrapfewshotrandomsearch optimization processes.
- MIpro Feedback Loop Strategies: A question arose on passing feedback to MIpro regarding incorrect answers, beyond just treating the number as a minor metric.
- The inquiry sought methods for effectively communicating the reasons for incorrect outputs to improve the model's performance.
MLOps @Chipro Discord
- FinRL-DeepSeek Webinar Announced: A webinar on FinRL-DeepSeek Algorithmic Trading has been announced, showcasing an extension of CVaR-PPO that integrates trading recommendations and risk evaluations from LLMs, scheduled for February 25 at 8PM CST, at this registration link.
- The webinar promises access to open-source code, data, trading agents, and backtesting results on the Nasdaq-100, emphasizing the availability of AI agents for deployment.
- Inquiry about Webinar Recording: A member inquired about accessing recordings of the FinRL-DeepSeek webinar.
- This suggests significant interest in the webinar's content, particularly for individuals unable to attend the live session.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ▷ #general (1141 messages🔥🔥🔥):
Unsloth Hiring Challenges, Learning Resources for AI, Fine-tuning Models, Model Uploads on Hugging Face, Personal Experiences in Tech
- Unsloth Hiring Challenges Announced: Unsloth has announced five challenges with potential salaries reaching up to $500K/year for suitable candidates, emphasizing problem-solving in various technical areas.
- Participants are encouraged to solve problems related to Griffin, FSDP2, and other deep learning optimizations as the company seeks to advance community contributions.
- Effective Learning Resources for AI: Users suggested various methods for learning AI, such as reading source code, academic papers, and utilizing tools like ChatGPT for explanations.
- Various courses, including those by Karpathy, have been recommended, along with hands-on approaches to tackling complex topics in AI.
- Understanding Model Uploads on Hugging Face: There was a discussion about the rationale behind uploading both quantized and original model weights on Hugging Face, with some expressing confusion over the resource implications.
- Users noted instances where models were fixed or enhanced, emphasizing that optimizing original models is a priority for Unsloth.
- Experiences and Perspectives in Tech: Several users shared their personal and academic journeys in tech, with insights on how passion and interest can influence learning and career paths.
- The value of hard work versus natural talent was discussed, highlighting individual experiences in industry and academia.
- Kernel Development and Learning CUDA: Users expressed interest in low-level kernel development, particularly CUDA and Triton, shared their educational background, and discussed internships and job prospects.
- Attention was drawn to ongoing efforts in RL and GPU optimization, with some users seeking to transition into industry roles after graduation.
- Finetune Phi-4 with Unsloth: Fine-tune Microsoft's new Phi-4 model with Unsloth!We've also found & fixed 4 bugs in the model.
- Finetune Llama 3 with Unsloth: Fine-tune Meta's new model Llama 3 easily with 6x longer context lengths via Unsloth!
- Reasoning - GRPO & RL | Unsloth Documentation: Train your own DeepSeek-R1 reasoning model with Unsloth using GRPO which is a part of Reinforcement Learning (RL) fine-tuning.
- Tweet from Daniel Han (@danielhanchen): We made 5 challenges and if you score 47 points we'll offer you $500K/year + equity to join us at 🦥@UnslothAI!No experience or PhD needed.$400K - $500K/yr: Founding Engineer (47 points)$250K - $3...
- archit11/smollm350m-grpo · Hugging Face: no description found
- Beginner? Start here! | Unsloth Documentation: no description found
- Llama 3.2 Fine Tuning for Dummies (with 16k, 32k,... Context): Learn how to fine-tune Meta's Llama 3.2 model using Unsloth in Google Colab. This step-by-step tutorial covers:• Selecting the right Llama 3.2 model from Hug...
- Tweet from Alexander Doria (@Dorialexander): In case it interests anyone, I managed to set up a workable version of @willccbb script on Colab. Replace llama 1B with qwen 0.5b and inference with vllm. Full training in about 2 hours. https://colab...
- Fine-tuning Guide | Unsloth Documentation: Learn all the basics of fine-tuning.
- docs: add training.py usage instructions to README by sardorb3k · Pull Request #102 · Zyphra/Zonos: Add detailed section on training prerequisites and setupDocument configuration options and parametersExplain checkpoint saving and training processInclude example commands for custom training
- GitHub - sardorb3k/Zonos: Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.: Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS ...
- Unsloth Documentation: no description found
- unsloth/DeepSeek-R1 · Hugging Face: no description found
- unsloth/DeepSeek-R1 at main: no description found
- GRPO Environments for custom multi-step rollouts (vLLM-only) by willccbb · Pull Request #2810 · huggingface/trl: What does this PR do?Adds a protocol under trl/environments for an Environment object which wraps vLLM's .generate(...) to allow for custom rollout logic, and an optional env field to the Trai...
Unsloth AI (Daniel Han) ▷ #off-topic (114 messages🔥🔥):
RAG Implementation, Open-Source Tools, AI Agents, Dual CPU Server Builds, Project Digits and Memory
- Exploring RAG Implementation Tools: Members discussed options for implementing RAG, with suggestions for Llama Index, Haystack, and AnythingLLM to streamline processes.
- One noted that getting started might involve significant effort depending on business verticals and expectations.
- AnythingLLM Promotes Unsloth: Users shared positive experiences with AnythingLLM, highlighting its role as a seamless desktop tool for RAG applications.
- Its founder is praised for being an excellent educator and for promoting Unsloth effectively within the community.
- Skepticism Surrounds AI Agents: Discussion arose about the true definition of AI agents, with opinions suggesting they might just be effective prompts rather than groundbreaking technology.
- Members expressed concerns about the hype surrounding AI agents, emphasizing the importance of practicality over marketing language.
- Dual CPU Server Configuration Insights: A member sought advice on building a dual CPU server with 1TB of RAM, focusing on using one CPU for GPU tasks and the other for data processing.
- Thoughts emerged on the challenges of maintaining such a setup and the hidden costs linked to running specialized LLMs.
- Future of Memory with Project Digits: Participants discussed the potential of Project Digits, which will focus on memory capacity over speed, comparing it to Apple's chip designs.
- Members anticipate future advancements in memory technology will enable more effective fine-tuning of LLMs and broader applications.
- llama.cpp/docs/build.md at master · ggml-org/llama.cpp: LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.
- GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.: Python tool for converting files and office documents to Markdown. - microsoft/markitdown
Unsloth AI (Daniel Han) ▷ #help (305 messages🔥🔥):
CUDA OOM Issues, Fine-tuning Large Models, Continued Pretraining, Unsloth Updates, Model Format for Q&A
- Troubleshooting CUDA Out of Memory Errors: Members discussed experiencing CUDA out-of-memory (OOM) errors mid-training, particularly after a week off, highlighting possible data imbalance as a cause.
- Some noted the difficulty of resolving OOM issues despite using similar settings and noted that even data variance could lead to a spike in VRAM load.
- Fine-tuning Large Models on New Data: Participants shared experiences while adapting and fine-tuning models like Llama and Mistral, sometimes encountering unexpected inference behaviors.
- The conversation highlighted the importance of using well-formatted datasets for training, specifically emphasizing the necessity for question-answer formats.
- Continued Pretraining Considerations: Discussions around the process of continued pretraining for adapting models to new languages or tasks were emphasized, with suggestions for checking the model's learned data.
- Members directed others towards relevant resources, such as a specific Unsloth documentation page, for guidance on best practices.
- Unsloth Repository Updates: Concerns about breaking changes after recent updates to the Unsloth repository led to discussions about updating both Unsloth and Torch, with members wary of potential impacts on performance.
- Suggestions were made for implementing flags to prevent unnecessary model downloads if already cached, to enhance user experience.
- Formatting Datasets for Model Training: A user inquired about proper formatting of conversational datasets for Mistral 3 small, indicating confusion about creating the right templates.
- This sparked a dialogue about the complexities of preparing datasets for fine-tuning, with community support offered to address template-related questions.
- Download Ollama on macOS: Download Ollama for macOS
- 🐋 Run DeepSeek R1 Dynamic 1.58-bit with Llama.cpp | Open WebUI: A huge shoutout to UnslothAI for their incredible efforts! Thanks to their hard work, we can now run the full DeepSeek-R1 671B parameter model in its dynamic 1.58-bit quantized form (compressed to jus...
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- no title found: no description found
- Google Colab: no description found
- Google Colab: no description found
- Train your own R1 reasoning model locally (GRPO): You can now reproduce your own DeepSeek-R1 reasoning model with Unsloth 100% locally. Using GRPO.Open-source, free and beginner friendly.
- Unsloth Requirements | Unsloth Documentation: Here are Unsloth's requirements including system and GPU VRAM requirements.
- Qwen2.5-VL/cookbooks/ocr.ipynb at main · QwenLM/Qwen2.5-VL: Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. - QwenLM/Qwen2.5-VL
- Saving to VLLM | Unsloth Documentation: Saving models to 16bit for VLLM
- Reasoning - GRPO & RL | Unsloth Documentation: Train your own DeepSeek-R1 reasoning model with Unsloth using GRPO which is a part of Reinforcement Learning (RL) fine-tuning.
- GitHub - huggingface/smol-course: A course on aligning smol models.: A course on aligning smol models. Contribute to huggingface/smol-course development by creating an account on GitHub.
- Installing + Updating | Unsloth Documentation: Learn to install Unsloth locally or online.
- Reasoning - GRPO & RL | Unsloth Documentation: Train your own DeepSeek-R1 reasoning model with Unsloth using GRPO which is a part of Reinforcement Learning (RL) fine-tuning.
- GitHub - shannonhochkins/ha-component-kit: Designed with developers in mind, this powerful package is built on React to create seamless, highly customizable interfaces for your Home Assistant smart home dashboards.: Designed with developers in mind, this powerful package is built on React to create seamless, highly customizable interfaces for your Home Assistant smart home dashboards. - GitHub - shannonhochki...
- Windows Installation | Unsloth Documentation: See how to install Unsloth on Windows with or without WSL.
- unsloth/DeepSeek-V3-GGUF · Hugging Face: no description found
- GitHub - woct0rdho/triton-windows: Fork of the Triton language and compiler for Windows support: Fork of the Triton language and compiler for Windows support - woct0rdho/triton-windows
- Unsloth overwrites the forward call function of a model loaded by huggingface library · Issue #1713 · unslothai/unsloth: I am trying out the GRPO notebook with a pretrained model as my reward model. Basically, I followed the notebook from this link https://colab.research.google.com/github/unslothai/notebooks/blob/mai...
- Continued Pretraining | Unsloth Documentation: AKA as Continued Finetuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- Welcome | Unsloth Documentation: New to Unsloth?
- Beginner? Start here! | Unsloth Documentation: no description found
Unsloth AI (Daniel Han) ▷ #showcase (102 messages🔥🔥):
Fine Tuning vs RAG, Synthetic Data Generation, LLM Training Strategies, Dynamic Data Challenges, Optimized Reasoning Dataset
- Fine Tuning Takes the Lead over RAG: Members discussed how fine tuning can significantly outperform RAG when done correctly, highlighting an improvement in performance with sufficient training data.
- One member expressed confidence in the future of fine tuning, suggesting it could become the predominant approach in LLM applications, relegating RAG to niche uses.
- Synthetic Data Generation Insights: A member shared their success with synthetic data generation by asking an LLM to generate questions based on given documents, which were then used to create a training dataset.
- This approach was noted to help maintain relevance in the questions while addressing the structure and coverage of the training data.
- Challenges with Dynamic Data: Concerns were raised about how frequently changing data could affect the performance of models when relying solely on fine tuning, underpinning the importance of RAG in those scenarios.
- Members noted that while fine tuning excels, RAG may still be necessary for data that evolves rapidly, requiring a blend of both methods for optimal outcomes.
- Exploration of LLM Training Strategies: Discussions touched on the importance of data quality in fine tuning, with emphasis on gathering diverse and redundant questions for better model understanding.
- It was suggested that adjusting training parameters like learning rate and epochs can influence how well a model learns nuanced data, particularly from smaller datasets.
- New Optimized Reasoning Dataset Released: A new dataset named Optimized_Reasoning was announced, aimed at improving model reasoning abilities while reducing token usage.
- This dataset addresses the performance gaps in modern LLMs when handling reasoning tasks, offering a resource for better training outcomes.
- Lost in the Middle: How Language Models Use Long Contexts: While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two ta...
- Reddit - Dive into anything: no description found
Unsloth AI (Daniel Han) ▷ #research (109 messages🔥🔥):
Coconut LLM, Chess Move Reasoning, DeepSeek Performance, SCION Optimizer, Paper Review Request
- Exploring Coconut LLM's Capabilities: Members discussed the Coconut LLM and its use of GPT-2, considering how newer models might improve performance in reasoning tasks.
- One member noted that results have yet to reflect significant advancements and speculated this might tie into Unsloth's future plans.
- Challenges in Chess Move Reasoning: A member shared their experience training a model for chess move reasoning, highlighting the struggle of LLMs to adequately plan ahead in the complex game.
- Another member pointed out the necessity of crafting a robust reward function to properly guide the model's learning in practical applications.
- DeepSeek's Impressive Progress: The community highlighted the productive work from the DeepSeek team, with one member reporting a successful run yielding 337 correct out of 2000 attempts in handling GitHub PRs.
- This focus on performance indicates ongoing interest in leveraging such models for complex coding tasks and aligns with the group's mission.
- SCION Optimizer Proposal: A new optimizer, SCION, was suggested, which could potentially train large neural networks without using Adam, requiring less memory while improving results.
- This sparked discussions about the practical applications of SCION in model training while managing computational resources efficiently.
- Seeking Feedback on AI Research Paper: A member requested feedback on their paper about scaling Mixture of Experts (MoE) models, reaching out to local professors for assistance.
- They are actively engaging with several academics for reviews, demonstrating the community's collaborative approach to improving research outcomes.
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking: We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achie...
- The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation: This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models (LLMs) on very small datasets. In the setting of open-ended text generation, it is w...
- Tweet from Volkan Cevher (@CevherLIONS): 🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constra...
- Tweet from Volkan Cevher (@CevherLIONS): 🔥 Want to train large neural networks WITHOUT Adam while using less memory and getting better results? ⚡Check out SCION: a new optimizer that adapts to the geometry of your problem using norm-constra...
- hemingway-GRPO/s1_grpo_trainer.py at main · NoahBPeterson/hemingway-GRPO: Contribute to NoahBPeterson/hemingway-GRPO development by creating an account on GitHub.
- GitHub - facebookresearch/coconut: Training Large Language Model to Reason in a Continuous Latent Space: Training Large Language Model to Reason in a Continuous Latent Space - facebookresearch/coconut
- GitHub - hkust-nlp/CodeIO: CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction: CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction - hkust-nlp/CodeIO
- [Project] Training Agents with GRPO · Issue #2723 · huggingface/trl: Let's discuss how to train agents using GRPO. Here, I will link sub-issues related to various problems, features, or questions that need resolution for implementing this idea.
- Brave Search: Search the Web. Privately. Truly useful results, AI-powered answers, & more. All from an independent index. No profiling, no bias, no Big Tech.
- GRPO Environments for custom multi-step rollouts (vLLM-only) by willccbb · Pull Request #2810 · huggingface/trl: What does this PR do?Adds a protocol under trl/environments for an Environment object which wraps vLLM's .generate(...) to allow for custom rollout logic, and an optional env field to the Trai...
Perplexity AI ▷ #announcements (1 messages):
Deep Research, Query Limits, Platform Availability
- Perplexity launches Deep Research: Deep Research allows users to generate in-depth research reports autonomously by performing multiple searches, reading sources, and delivering comprehensive results for free—with limits of 5 queries per day for non-subscribers.
- It excels in expert-level tasks and has been benchmarked high on Humanity’s Last Exam.
- Deep Research now available on Web: Deep Research is available to users on the Web starting today and will also roll out to iOS, Android, and Mac soon.
- Users are encouraged to update their apps to the latest version to access this functionality.
- Pro users enjoy enhanced query limits: Pro users can enjoy a significantly greater limit of 500 queries per day, compared to the 5 queries per day available to non-subscribers.
- This tiered access offers more in-depth research opportunities for those who opt for the Pro subscription.
- How to utilize Deep Research effectively: To try Deep Research, users should select “Deep Research” from the mode selector in the search box before submitting their query.
- Detailed instructions on its features can be found here.
Link mentioned: no title found: no description found
Perplexity AI ▷ #general (756 messages🔥🔥🔥):
Perplexity Pro Features, Deep Research vs OpenAI Models, Use Cases for Perplexity, Billing and Subscription Queries, Technical Issues and Feedback
- Comparison of Perplexity Pro and You.com: Users highlighted their experiences with Perplexity Pro and You.com, particularly focusing on the pricing and access to various AI models.
- While Perplexity offers deep research capabilities, some users expressed interest in the more affordable and potentially unlimited access model provided by You.com.
- Deep Research's Limitations: Many users noted the limitations of Perplexity's Deep Research, often comparing it unfavorably to OpenAI's offerings, with reports of frequent hallucinations in responses.
- Despite these issues, there is optimism that improvements will come as Perplexity continues to develop its features.
- Use Cases for Perplexity: Users mentioned using Perplexity for a variety of tasks such as research, coding assistance, and managing mental health conversations, showing its versatility.
- The inclusion of features like deep research and R1 made it a strong tool for numerous applications, particularly for students and professionals.
- Subscription and Billing Problems: Some users raised concerns regarding subscription issues, such as not being able to access their Pro roles or encountering bugs when switching models.
- Inquiries about promotional codes and verification of pro features sparked discussions about service accessibility and customer support.
- Technical Glitches and User Feedback: Users reported technical glitches in the Perplexity app, including issues with voice mode and Deep Research functionality, and expressed frustration over poor user experiences.
- Feedback was collected on the overall experience and capabilities of the app, indicating a desire for improvements and better integration of features.
- Picker | Apple Developer Documentation: A control for selecting from a set of mutually exclusive values.
- Toggle | Apple Developer Documentation: A control that toggles between on and off states.
- Reddit | Markdown Guide: Reddit is a popular online community that supports posting in Markdown.
- College students used Meta’s smart glasses to dox people in real time: Privacy will always be a major concern with smart glasses.
- Access the most powerful AI models in one place on you.com: You.com now offers a first-of-its-kind Custom Model Selector, letting users access, test, and compare large language models (LLMs), such as GPT-4, Claude Instant, Gemini Pro, and more, all in one plac...
- College students used Meta’s smart glasses to dox people in real time: Privacy will always be a major concern with smart glasses.
- Complexity: An enhanced version of Perplexity.ai that everyone has ever wanted.
- Perplexity Mermaid Viewer - Chrome Web Store: Perplexity上のMermaid図を検出し、ポップアップでプレビュー表示をする拡張機能
- Izuku Midoriya GIF - Izuku Midoriya Smash - Discover & Share GIFs: Click to view the GIF
- Google Testing Blog: Introducing "Testing on the Toilet" : no description found
- Tweet from Derya Unutmaz, MD (@DeryaTR_): To compare Perplexity Deep Research (PDR) with OpenAI Deep Research (ODR), I uploaded dozens of genes that we had identified as differentially expressed in certain immune cell subtypes & then used the...
- Soviet Union Edit, but Its much BETTER !!!: Edited by OvertrackerMusic is Slowed downI don't own the right for the Video or MusicCredits:ONIMXRU x SMITHMANE "SHADOW" :https://www.youtube.com/watch?v=jq...
- Perplexity lets you try DeepSeek R1 - without the security risk: Here are two ways to try R1 without exposing your data to foreign servers.
- no title found: no description found
Perplexity AI ▷ #sharing (47 messages🔥):
Perplexity AI Updates, Cultural and Sonic Legacy, DeepSeek Impact on AI, Substack Explanation, Ethereum Spectra Upgrade
- Perplexity AI announces updates: Perplexity AI has rolled out several updates, including in-car ads by Jeep and a secret UFO office by the FBI, revealed in a YouTube video.
- Cultural and sonic legacies were also discussed, highlighting the interplay between technology and artistic expression.
- DeepSeek's influence on AI discussed: Members ponder the implications of DeepSeek on AI, exploring how it could reshape current methodologies.
- Opinions varied, with some emphasizing potential benefits in AI depth and others raising concerns about integration challenges.
- Substack's functionality explained: A member sought clarity on how Substack operates, aiming to leverage its platform for content creation.
- The discussion emphasized the unique features of Substack that differentiate it from traditional blogging.
- Ethereum's Spectra upgrade highlighted: The recent Ethereum Spectra upgrade was shared, showcasing key changes and improvements to the network.
- Participants discussed potential implications for dApps and the broader Ethereum ecosystem.
- Cultural and sonic legacy explored: Members discussed a recent analysis on the cultural and sonic legacy, with insights featured in Perplexity AI's page.
- The conversation centered on the effects of technological advances on cultural expression and sonic innovations.
Link mentioned: YouTube: no description found
Perplexity AI ▷ #pplx-api (18 messages🔥):
Perplexity API invoices, Deep Research availability, Sonar API citations issue, Image handling in Perplexity API, Using Deep Research model
- Request for Perplexity API Invoices: Several members expressed concerns about not receiving responses from customer service regarding invoices for API credits after purchase.
- One member shared a manual to find invoices by clicking on 'View Dashboard' and locating 'Invoice History' after a suggestion from another user.
- Inquiry on Deep Research in API: Members questioned whether the Deep Research feature would be accessible through the API, raising interest in its potential enhancements.
- Others confirmed its current unavailability via the API and suggested checking the documentation for further clarification.
- Citations Missing in API Responses: Concerns were raised regarding the Sonar API failing to return citations as expected, despite including references in the output.
- One user clarified that citations exist separately and provided an example of the output structure, highlighting the need to access them correctly.
- Image Handling in the API: A member inquired about image support within the Perplexity API, only to discover that text input is currently the only supported format.
- This led to discussions about enhancing the documentation to reflect the API's capabilities and limitations.
- Setting Deep Research as Default in API: A user sought assistance on how to set the Deep Research model as the default in their API usage.
- Responses indicated the model isn't available for API use, referencing the documentation for user guidance.
- no title found: no description found
- Perplexity: Learn how to use Perplexity's Sonar API with the AI SDK.
- Perplexity Sonar Pro API as an AI search engine in external applications and tools – for smart apps and tailored search: 🚀 Integrate the Perplexity Sonar Pro API into your tools for intelligent search solutions. 🔎 Optimize your applications with AI-based search for precise results. 🛠️ Crea...
- *NEW* Perplexity.AI Document Upload: Unleashing the Power of Document Upload through perplexity.ai Latest Feature.perplexity.Ai has added the ability to upload a document to use in answering que...
HuggingFace ▷ #announcements (1 messages):
Goedel theorem prover, GLiNER-MoE-MultiLingual, Kyro-1, Science News Bot, Mini PyTorch
- Goedel theorem prover launch: The Goedel theorem prover has been introduced on Hugging Face, showcasing its unique capabilities in mathematical reasoning. You can explore it here.
- GLiNER-MoE-MultiLingual NER model: GLiNER-MoE-MultiLingual is a zero-shot Named Entity Recognition model trained for one epoch using NOMIC-MOE architecture, highlighting advanced NER capabilities. More details can be found here.
- Kyro-1: Open-Neo's reasoning model: Kyro-1 marks Open-Neo's first reasoning model, demonstrating strong performance in light to medium reasoning tasks. For more information, check it out here.
- Science News Bot is here!: The Science News Bot has launched, ready to deliver the latest in scientific advancements and research. You can find the project on GitHub.
- Mini PyTorch brings simplicity: Mini PyTorch has been released, providing a lightweight alternative for PyTorch enthusiasts and developers. Explore its functionalities here.
HuggingFace ▷ #general (327 messages🔥🔥):
Hugging Face Model Usage, Model Performance, Training and Fine-tuning LLMs, Technical Issues and Errors, AI Agent Development
- Hugging Face Model Usage Queries: Users discussed which models to use locally and the requirements for specific Hugging Face models, such as needing 20GB VRAM for the Llama 3.2 model.
- Some users explored quantization options to fit larger models on GPUs with lower VRAM.
- Model Performance Insights: Discussion arose around the performance of a 9B model outperforming larger models, highlighting impressive results for its size.
- Metrics for model performance were exchanged, showcasing the capabilities of various models.
- Training and Fine-tuning LLMs: There were inquiries regarding training the Google Flan-t5-base model using AutoTrain, with questions about expected training time.
- Users shared insights on how fine-tuning and quantization contrast with model adapters in terms of implementation and efficiency.
- Technical Issues and Errors: Several users reported receiving 500 internal errors on the Hugging Face website when accessing certain resources.
- Advice was shared about troubleshooting these issues by creating new Spaces and re-uploading source code.
- AI Agent Development Discussion: Users discussed the implementation of an agentic code editor into a Discord bot, looking for features like CLI integration and scriptable interfaces.
- Recommendations were made for tools satisfying these requirements while emphasizing ease of use without complex interfaces.
- AutoTrain – Hugging Face: no description found
- EasyQZ: no description found
- tomg-group-umd/huginn-0125 · Hugging Face: no description found
- YuE - a Hugging Face Space by innova-ai: no description found
- Tasks - Hugging Face: no description found
- Hmmm Thinking GIF - Hmmm Thinking Batman - Discover & Share GIFs: Click to view the GIF
- leafspark/Llama-3.2-11B-Vision-Instruct-GGUF at main: no description found
- GitHub - instructor-ai/instructor: structured outputs for llms: structured outputs for llms . Contribute to instructor-ai/instructor development by creating an account on GitHub.
- meta-llama/Llama-3.2-11B-Vision-Instruct · Hugging Face: no description found
- Building / preparing space takes forever: My space either builds forever or prepares forever: There is no warning/error logging or other notification. Several delete and create attempts result in the same issue. Moving it to paid plan also...
- google/owlv2-base-patch16-ensemble · Hugging Face: no description found
- GitHub - deepbeepmeep/YuEGP: YuE: Open Full-song Generation Foundation for the GPU Poor: YuE: Open Full-song Generation Foundation for the GPU Poor - deepbeepmeep/YuEGP
- LLM Chronicles #6.4: LLM Agents with ReAct (Reason + Act): In this episode we’ll cover LLM agents, focusing on the core research that helped to improve LLMs’ reasoning while allowing them to interact with the externa...
- GitHub - SadilKhan/Text2CAD: [NeurIPS'24 Spotlight] Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts: [NeurIPS'24 Spotlight] Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts - SadilKhan/Text2CAD
- transcribe.py: GitHub Gist: instantly share code, notes, and snippets.
- Easy_training/Galore+Qlora_With_Multi_GPU_Support at main · rombodawg/Easy_training: Contribute to rombodawg/Easy_training development by creating an account on GitHub.
- YuEのインストール方法と基本的な使い方: YuEは、音楽を自動生成するために開発された最先端のオープンソースAIモデルです。特に歌詞からフル楽曲を生成することに特化しており、ボーカルと伴奏を含む本格的な楽曲を数分間にわたって作り出すことができます。
HuggingFace ▷ #today-im-learning (4 messages):
NVIDIA Course, Generative AI Tutorials, Differential Privacy
- Inquiry about NVIDIA Course: A member queried whether a particular program is an NVIDIA course and if it is available for free.
- No additional specific information was provided regarding the course itself.
- Searching for Generative AI Resources: A new member, a 3D graphics programmer, is seeking recommendations for courses or tutorials focused on generative AI.
- They specifically mentioned a preference for resources that include solid coding sessions for model training and deployment.
- Differential Privacy Blog Series Launch: A member started a series of blogs dedicated to Differential Privacy, kicking off with an introductory post titled Differential Privacy!! But Why? available here.
- The author emphasizes the critical balance between extracting insights from data and protecting individual privacy in various contexts.
Link mentioned: Differential Privacy!! But Why?: Blog #1 in the series of Inception of Differential Privacy
HuggingFace ▷ #cool-finds (1 messages):
pablo.ce: https://www.youtube.com/watch?v=03jYz0ijbUU
HuggingFace ▷ #i-made-this (15 messages🔥):
AI-Powered Quizzes, Illustration Text to Video, GLiNER-MoE-MultiLingual, SwarmFormer Optimization, DeepNeo Model
- AI Quizzes Transform Learning: An article discusses how EasyQZ utilizes AI to create engaging quizzes, revolutionizing learning experiences.
- The platform promises to enhance educational assessments through interactive quiz formats.
- Anime-Themed Video Creation Simplified: A new project called Illustration Text to Video enables users to create short anime-themed videos using HunyuanVideo and a popular 2D-art LORA.
- A sample output video was shared showcasing the tool's capabilities in generating animated content.
- Zero-Shot NER with GLiNER-MoE-MultiLingual: An exploration into the GLiNER-MoE-MultiLingual model, which successfully performs zero-shot NER tasks in several languages after just one training epoch.
- Results demonstrate competitive performance compared to English-only models, with a detailed overview of optimization strategies shared.
- Optimizing SwarmFormer for Efficiency: Improvements made to SwarmFormer include local windowed attention and token-to-cluster gating, enhancing computational efficiency.
- The ongoing experiments showcase significant reductions in FLOPs, making the model suitable for mobile and edge environments.
- DeepNeo Model for Reasoning Tasks: The DeepNeo model is introduced as a hybrid model facilitating both normal and agentic reasoning applications.
- It draws inspiration from DeepHermes, providing flexibility in reasoning tasks.
- Illustration Text To Video - a Hugging Face Space by Sergidev: no description found
- DeepNeo-1 - a open-neo Collection: no description found
- SDXL To Diffusers - a Hugging Face Space by EarthnDusk: no description found
- Mayank6255/GLiNER-MoE-MultiLingual · Hugging Face: no description found
- Reddit - Dive into anything: no description found
- minchyeom/Starlette-1-3B · Hugging Face: no description found
- Dynamic SwarmFormer: Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources
- Mayank6255/SwarmFormer-small-ef · Hugging Face: no description found
- GitHub - louisbrulenaudet/agentic-market-tool: Barebones tool built upon Hugging Face smolagents and Alpaca for financial analysis automation 🤗: Barebones tool built upon Hugging Face smolagents and Alpaca for financial analysis automation 🤗 - louisbrulenaudet/agentic-market-tool
HuggingFace ▷ #reading-group (1 messages):
arpitbansal.: definitely not the right channel or even right server
HuggingFace ▷ #computer-vision (13 messages🔥):
Alzheimer's detection with MRI, AI controlling applications, Computer Vision Automation Framework, Performance improvements in vision systems, M.A.R.K. AI Assistant
- Alzheimer's Detection Using OASIS Dataset: A member is working on Alzheimer's detection using MRI scans and utilizing the OASIS dataset for the task.
- They are seeking others with experience in this area for guidance.
- AI Mimics Human Actions on App: A demo video showcases an AI utilizing computer vision to interact with the Galculator app, including mapping button locations.
- The approach uses a test sequence in Python to automate human-like actions, demonstrating its capabilities.
- Enhancements in Computer Vision Framework: Performance enhancements are being implemented in a vision system to transition from 1 prediction every 5 seconds to 3 predictions per second.
- Improvements include reducing image size by 1.2GB, making the application spawn nearly instantaneously.
- Collaboration on AI-M.A.R.K.: Members discussed building the 10 Minute M.A.R.K. (Management Assistance and Reporting Kinetics) AI that performs tasks traditionally done by humans.
- There is an interest in sharing tests and code contributions via pull requests to enhance the framework.
- Sharing Tests and Contributions: A member expressed willingness to contribute their tests to the CVAF repository, suggesting possibly a separate branch for their changes.
- They provided a link to their GitHub repository showcasing their work on the project.
Link mentioned: GitHub - OIEIEIO/CVAF at oieieio: Computer-vision based automation framework (name is a WIP) - GitHub - OIEIEIO/CVAF at oieieio
HuggingFace ▷ #NLP (5 messages):
Scaling up Test-Time Compute, Multi-Task Text Generation Model Training, Fine-Tuning Qwen Models, Function Calling Model Tuning
- Exploration of Latent Reasoning Approach: A member asked if anyone has experimented with the model from the paper Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. This approach focuses on enhancing computational efficiency during testing phases.
- Has anyone tried this model? A call for shared experiences and techniques ensued.
- Guidance Needed for Multi-Task Text Generation: A member is seeking guidance on training a multi-task text generation model for tasks like translation and writing assistance. They question the best way to structure their dataset and seek effective finetuning techniques beyond LoRA.
- Suggestions for dataset structuring and data sources were invited, with an emphasis on preventing catastrophic forgetting during training.
- Review Requested for Qwen 3B Model: A member prompted for a check on the re-uploaded Qwen model and inquired about testing the original Qwen 3B model. They shared a link for their Qwen 2.5 free Google Colab.
- They also provided links to additional Qwen 2.5 notebooks for conversational style fine-tuning.
- Fine-Tuning for Function Calling Model: A member raised a concern about fine-tuning a function calling model to provide general answers when a function isn't available. They wish to maintain the model's ability to call functions while adapting its responses.
- The conversation revolves around ensuring that the model can still effectively serve user inquiries despite potential function limitations.
Link mentioned: unsloth/Qwen2.5-3B-bnb-4bit · Hugging Face: no description found
HuggingFace ▷ #smol-course (18 messages🔥):
Token Creation Confusion, Need for Detailed Video Explanation, Issues with Meta's LLama 3.2 Access, Looping in 'Think -> Action -> Observation', Integration of Finetuning Chapters into NLP Course
- Confusion Over Token Creation: A member expressed frustration about creating a token due to the number of boxes to tick, feeling overwhelmed and seeking assistance.
- Another member mentioned that they didn't need to tick any boxes for it to work on Google Colab, suggesting simplification.
- Request for Detailed Video on Dummy Agent Code: A member requested a 10-15 minute video explanation of the code in the 'Dummy Agent library' session, stating that the course explanation was insufficient.
- In response, another member advised running the provided codes in the notebook where further explanations were already included.
- Access Denied for Meta's LLama 3.2 Models: One user reported that their request for access to Meta's LLama 3.2 models was rejected, likely due to their country.
- Another user confirmed rejection in the US, suggesting a potential update to courses using alternative models.
- Benefits of 'Think -> Action -> Observation' Loop: A member queried the effectiveness of the 'Think -> Action -> Observation' loop, noting their agent merely repeats actions without achieving tasks.
- Another user replied that the cycle should allow for self-correction if utilized correctly, prompting for examples.
- New Finetuning Chapters Available in NLP Course: An announcement was made that finetuning chapters have been integrated into the Hugging Face NLP course, introducing quizzes and interactive exercises.
- A link to the new chapters was shared, inviting participants to explore enhanced learning resources.
- nlp-course (Hugging Face NLP Course): no description found
- SmolVLM Grows Smaller – Introducing the 256M & 500M Models!: no description found
HuggingFace ▷ #agents-course (772 messages🔥🔥🔥):
Course Completion Certificates, FAQs on Course Access, Troubleshooting Submission Issues, Upcoming Units and Content, Community Engagement
- Certificates Granted After Unit Completion: Many participants celebrated completing Unit 1 and successfully obtaining their course certificates, sharing their achievements with the community.
- However, a few users expressed confusion over accessing their certificates and the corresponding credentials link.
- Accessing the Course Materials: New learners inquired about starting the course and accessing Unit 1, with many being directed to the official Hugging Face page for instructions.
- The course was emphasized as self-paced, encouraging users to engage with the materials at their own convenience.
- Technical Issues with Exams and Tools: Users reported issues with submitting quizzes and using customized tools within the course, indicating a need for additional support.
- Discussions arose regarding troubleshooting steps such as re-logging and managing settings in Hugging Face accounts to resolve these issues.
- Community Networking and Events: Participants expressed eagerness to collaborate and network, with some proposing the formation of study groups for collective learning.
- Announcements were made regarding an upcoming hackathon event aimed at fostering community engagement and providing hands-on opportunities.
- Course Schedule and Expectations: Inquiries about the timeline for the next unit revealed that bonus materials would be released soon, with subsequent units expected to follow periodically.
- Participants expressed enthusiasm for learning more about AI agents and were encouraged to remain patient and engaged as the course progresses.
- Introduction to Agents - Hugging Face Agents Course: no description found
- Cisco U.: no description found
- Welcome to the 🤗 AI Agents Course - Hugging Face Agents Course: no description found
- Welcome to the 🤗 AI Agents Course - Hugging Face Agents Course: no description found
- When will the next units be published? - Hugging Face Agents Course: no description found
- Introduction to Agents - Hugging Face Agents Course: no description found
- Let’s Create Our First Agent Using smolagents - Hugging Face Agents Course: no description found
- Hugging Face – The AI community building the future.: no description found
- no title found: no description found
- Live 1: How the Course Works and First Q&A - Hugging Face Agents Course: no description found
- Ai Academy : Deep Learning: 80 Days of AI Mastery: Learn Deep Learning Day by Day:Master artificial intelligence and machine learning at your own pace with Day-by-Day Deep Learning—a dynamic and interactive learning app designe...
- Twitch: Live Streaming: Twitch is where thousands of communities come together for our favorite streamers, for the games we love, for the lulz, for each other, for whatever. Download Twitch and join millions enjoying live g...
- How Scaling Laws Drive Smarter, More Powerful AI: AI scaling laws describe how model performance improves as the size of training data, model parameters or computational resources increases.
- dummy_agent_library.ipynb · agents-course/notebooks at main: no description found
- Steam Gift Activation: no description found
- meta-llama/Llama-3.3-70B-Instruct · Hugging Face: no description found
- theRealProHacker - Overview: Up for anything UI/UX | AI | NLP | Games | QC - theRealProHacker
- fix: add missing variable declarations by rhanb · Pull Request #154 · huggingface/agents-course: when following this part of the course in a local environment, it wasn't clear what what the variables SYSTEM_PROMPT and prompt were refering to
- Data ingestion with Hugging Face datasets: Data ingestion with Hugging Face datasets. GitHub Gist: instantly share code, notes, and snippets.
HuggingFace ▷ #open-r1 (1 messages):
Distilabel Error, Data Generation with Distilled R1 Model
- AssertionError with Cache in Distilabel: A user encountered an AssertionError stating 'Cannot have cache with result_handler not alive' while trying to generate data from the distilled R1 model using Distilabel.
- Stuck for 20 minutes, they are seeking help with their implementation and troubleshooting the issue.
- Code Snippet Shared for Debugging: The user shared a code snippet where they load a dataset and set up a Pipeline for text generation with a model ID of DeepSeek-R1-Distill-Qwen-1.5B.
- The provided code includes configurations for the vLLM model and a TextGeneration step, which aims to generate data using the specified parameters.
Codeium (Windsurf) ▷ #announcements (1 messages):
Windsurf Patch Release 1.3.4, Model Context Protocol Introduction, Custom App Icons Feature, Message Cancellation Fixes, Cascade Base Credit Handling
- Windsurf 1.3.4 Patch Released: The Windsurf 1.3.4 patch has been released, addressing bugs related to Cascade write tools, message cancellation, and more. You can read the full changelog here (February 14, 2025).
- Bug fixes include enhancements to Cascade Base credit handling and improved auth login error messaging.
- Model Context Protocol Now Supported: Cascade now supports the Model Context Protocol (MCP), allowing users to make tool calls to configured MCP servers. Users can easily set up MCP by clicking the hammer icon in the Cascade input toolbar.
- This feature is available to all individual plans, with each MCP tool call costing one flow act.
- Message Cancellation and Auth Error Fixes: The patch includes various fixes around user message cancellation and better error messaging for authentication entry. These improvements ensure smoother user experiences when handling errors.
- Cascade Base Credit Handling Improvements: Improvements to Cascade Base trigger when users are low on credits, enhancing overall usability. The system catches error cases more effectively, reducing user confusion.
Link mentioned: Windsurf Editor Changelogs | Windsurf Editor and Codeium extensions: Latest updates and changes for the Windsurf Editor.
Codeium (Windsurf) ▷ #discussion (50 messages🔥):
Extension Updates for JetBrains, Windsurf vs Codeium Extension, Login Issues with Windsurf, Codeium Chrome Extension Queries, VS Code Extension Crashes
- Expectations for JetBrains Extension Updates: Members expressed desires for updates to the JetBrains extension, particularly to refresh the model list for tools like DeepSeek and Gemini 2.0 Flash, which could deliver significant value with minimal effort.
- A response indicated that updates are indeed planned for JetBrains, highlighting ongoing maintenance for Codeium extensions due to enterprise user needs.
- Debate on Windsurf vs Codeium Extension: Users noted frustration regarding the lack of feature parity between the Windsurf and Codeium extensions, with some feeling that Windsurf has overshadowed the extension.
- Critiques emerged regarding the perceived abandonment of the extension, with alternatives like Qodo being mentioned favorably in comparison.
- User Challenges with Windsurf Login: Several users reported issues accessing Windsurf, including problems after attempting to use an mcp file and the login system going down.
- Assistance was directed towards appropriate channels for support, emphasizing the distinction between Windsurf and Codeium extension queries.
- Questions about Chrome Extension Usage: A user inquired whether the Codeium Chrome extension could assist in passing assessments on CodeSignal, indicating a lack of support from the extension in current assessments.
- Others suggested checking the official documentation for installation and usage guidance, prompting further inquiries on its performance.
- Frequent Crashes of VS Code Extension: Numerous users reported issues with the VS Code extension crashing frequently, requiring reloads to restore functionality, which has been a consistent problem.
- Variations of similar issues included error messages related to execution deadlines, prompting discussions on possible fixes.
Link mentioned: Chrome Tutorial | Windsurf Editor and Codeium extensions: Codeium is the AI code assistant platform that developers love and enterprises trust. Also the builders of Windsurf, the first agentic IDE.
Codeium (Windsurf) ▷ #windsurf (580 messages🔥🔥🔥):
Windsurf Performance Issues, Cascade and Debugging, MCP Server Functionality, AI Code Suggestion Behavior, Windsurf Features and Subscriptions
- Windsurf Performance and Functionality Concerns: Users are experiencing significant performance issues and hallucinations in AI responses following recent updates, causing frustration and difficulties in code editing.
- Many users expressed desires to roll back to previous versions and shared concerns about how effectively Windsurf utilizes certain models.
- Cascade's Auto-Saving Behavior: There have been reports of Cascade auto-applying changes without user intervention, leading to confusion about how to manage code edits.
- Recommendations were made to start new conversation histories to mitigate some of the issues with auto-suggestions.
- MCP Server Issues: Several users have encountered errors related to MCP servers, particularly focusing on configurations and compatibility with Windsurf.
- Suggestions included removing incompatible code from these servers to restore functionality.
- AI Behavior in Code Suggestions: Users expressed frustrations with AI like Claude and Cascade taking restrictive approaches to code modifications, often overlooking context or existing logic.
- There was a consensus that the AI's tendency to suggest debugging prints leads to clutter and misunderstandings in code.
- User Engagement with Subscription Models: Discussions arose regarding subscription plans, such as the usefulness of switching between Pro and Team plans based on available features like Turbo Mode.
- Users highlighted their experiences with credit limits, prompting concerns about subscription value compared to the performance of the AI.
- Alin Lupascu: no description found
- Windsurf - Advanced: no description found
- Terminal - Codeium Docs: no description found
- Ask Cascade about using Cascade | Feature Requests | Codeium: There should be a special chat or mode to ask Cascade about Cascade. for example I want to turn off autocomplete for a bit while I'm writing something,
- Open Source Claude Data, 1000+ MCP Servers, Jailbreaking Results | PulseMCP: New this week ending Feb 15, 2025: Open Source Claude Data, 1000+ MCP Servers, Jailbreaking Results
- Monkey Shower GIF - Monkey Shower Monkey Shower - Discover & Share GIFs: Click to view the GIF
- CleanShot 2025-02-16 at 22.59.25: Screenshot uploaded to CleanShot Cloud
- GitHub - AiCodingBattle/mdCrawler: Contribute to AiCodingBattle/mdCrawler development by creating an account on GitHub.
- Quantum Machine Learning For Fun and Profit: no description found
LM Studio ▷ #general (290 messages🔥🔥):
Model Loading Issues, LLM Resources and Recommendations, Hardware Compatibility, DeepSeek Discussions, API and Server Usage
- Issues with Loading Qwen MLX Model: Users reported encountering a
ModuleNotFoundError
when loading the Qwen MLX model on a Macbook Ultra, with suggestions to update tomlx-engine/0.6.0
for fixes.- One user also attempted removing specific directories to solve the issue but confirmed it persists even with the latest version.
- Recommendations for Running LLMs on GPUs: Discussions highlighted the RX580 GPU's limitations for running LLMs, suggesting users opt for newer models with ROCm support for better performance.
- Users were encouraged to test GPU models through cloud services like vast.ai to evaluate performance without upfront hardware investment.
- DeepSeek and Community Projects: The community had lighthearted discussions concerning DeepSeek being labeled a CCP project and potential funding sources amid ongoing technological developments.
- The conversation reflected on the rapid advancements in AI and how they might lead to breakthroughs in the near future.
- API and Local Server Usability: A user expressed frustration over the local server API requiring an OpenAI API key, leading to discussions on quick fixes and workarounds for usability.
- Chat participants suggested simple modifications in code to bypass API key requirements for local setups efficiently.
- Integration of Web Search Features: One participant inquired about the availability of web search extensions for LM Studio, highlighting its potential utility for powering the application.
- The community expressed interest in such features to enhance functionality and explore more integrated uses of LM Studio.
- lmstudio-community/Qwen2-VL-7B-Instruct-GGUF · Hugging Face: no description found
- Welcome | Unsloth Documentation: New to Unsloth?
- Easy_training/Galore+Qlora_With_Multi_GPU_Support at main · rombodawg/Easy_training: Contribute to rombodawg/Easy_training development by creating an account on GitHub.
- Tweet from Quinn Nelson (@SnazzyLabs): Open source software is so cool. This is my Rivian R1S being autonomously driven by the Comma 3X—a dashboard mounted device that hijacks the signal to my car’s autonomy computer and inserts its own (m...
- GitHub · Build and ship software on a single, collaborative platform: Join the world's most widely adopted, AI-powered developer platform where millions of developers, businesses, and the largest open source community build software that advances humanity.
- GitHub - deepseek-ai/DeepSeek-R1: Contribute to deepseek-ai/DeepSeek-R1 development by creating an account on GitHub.
- DeepSeek R1: $5000 vs $1000 Computer | M4 Max 128GB vs M1 Air 16GB | LM Studio Tutorial: 💻 Thinking about upgrading your MacBook for AI? I tested DeepSeek R1’s open-source models on a $5000 M4 Max (128GB) vs a $1000 M1 Air (16GB) – the results w...
- LM Studio - Beta Releases: Beta and Release Candidate versions of LM Studio
- Chat with Documents | LM Studio Docs: How to provide local documents to an LLM as additional context
LM Studio ▷ #hardware-discussion (545 messages🔥🔥🔥):
LM Studio compatibility, GPU performance benchmarks, Quantization techniques, RAM and VRAM sharing, Choosing GPUs for AI workflows
- LM Studio compatibility issues: Users reported issues with LM Studio indicating 'incompatible' hardware, often due to CPUs lacking AVX2 instructions.
- Ensuring hardware meets requirements is essential for models to be searchable within LM Studio.
- Benchmarks and GPU comparisons: Discussion about the performance of GPUs like the RTX 4060 Ti, which is noted for its value in handling models efficiently, especially in terms of tokens per second.
- Comparison showed the RTX 3090 also performs well, particularly when it comes to running larger models.
- Understanding quantization: Quantization was explained in terms of its impact on model performance, with sizes like Q4 and Q8 discussed in relation to memory requirements and inference speeds.
- Users highlighted that different models have varying sensitivities to quantization, affecting their effectiveness.
- RAM and VRAM sharing capabilities: Users inquired about the potential to share system RAM and VRAM for larger model inference, acknowledging that such setups may result in slower speeds.
- Best results are obtained using models specifically indicating full GPU offload capabilities.
- AI workflow considerations with GPUs: The conversation emphasized the importance of selecting the right GPU for AI tasks, mentioning that the 4060 Ti provided the best value based on token processing speed.
- Models like Deepseek R1 and others were recommended for users looking to enhance their coding or reasoning tasks.
- Cooler Master introduces colored ‘AI Thermal Paste’ — CryoFuze 5 comes with nano-diamond technology: Claims its can 'maintain stable performance' from -50C to 240C.
- NVIDIA RTX 2000 Ada Generation Specs: NVIDIA AD107, 2130 MHz, 2816 Cores, 88 TMUs, 48 ROPs, 16384 MB GDDR6, 2000 MHz, 128 bit
- Radeon™ PRO W7800 AI TOP 48G Key Features | Graphics Card - GIGABYTE Global: no description found
- RamCalculator: no description found
- Tweet from Tomasz Gawroński (@GawroskiT): 9070 XT hope this prices i found are placeholder... https://www.mlacom.si/komponente/graficne-kartice/i_3026151_acer-predator-bifrost-radeon-rx-9070-xt-oc-16g-gddr6-graficna-kartica
- GPU AI Comparison: no description found
- Reddit - Dive into anything: no description found
- Now Old GIF - Now Old Man - Discover & Share GIFs: Click to view the GIF
- Mcgonagall Clapping GIF - Mcgonagall Clapping 10Points For Gryffindor - Discover & Share GIFs: Click to view the GIF
- Bykski 24-Pin Power Supply Jumper Connector (B-24PWSTR) | eBay: no description found
- GitHub - Zhen-Dong/Awesome-Quantization-Papers: List of papers related to neural network quantization in recent AI conferences and journals.: List of papers related to neural network quantization in recent AI conferences and journals. - Zhen-Dong/Awesome-Quantization-Papers
- DeepSeek’s networking assembly code issues on SLOW switches #deepseek #openai #nvidia #technews #ai: How did DeepSeek R1 stand out in the AI Market when it's built on old Nvidia chips? The answer is smart load balancing, and assembly code. Alex explains Deep...
- HP ZBook Ultra G1a 14” Mobile Workstation PC – AI Laptop: Unleash groundbreaking performance to take on complex AI workflows with the HP ZBook Ultra G1a. A sleek, thin, battery-efficient AI laptop and mobile workstation.
- HP Z2 Mini G1a Workstation Desktop PC – AI PC : Unlock workflows previously not possible with the Z2 Mini G1a Workstation desktop. Transformative performance packed into a compact AI computer.
- New AMD Graphics Cards… Worth?: no description found
- NVIDIA RTX 2000E Ada Generation | Professional GPU | pny.com: no description found
aider (Paul Gauthier) ▷ #general (635 messages🔥🔥🔥):
Grok 3 Release, DeepSeek R1 vs O3 Mini High, RAG and Repomap Efficiency, Gemini Models Performance, Aider Issues
- Upcoming Grok 3 Release: Grok 3 is expected to be released soon, with some users speculating about its capabilities and performance improvements over previous models.
- There are reports of early access to Grok 3, with features such as a voice mode called 'Ara' being mentioned.
- DeepSeek R1 vs O3 Mini High: Users are comparing DeepSeek R1 and O3 Mini High for their functionality, with some preferring R1 for its reasoning capabilities.
- Feedback indicates that while R1 may perform better in certain scenarios, O3 Mini High is more consistent in others.
- Challenges with RAG and Repomap: Participants discussed the inefficiencies of RAG and Repomap in large repositories, noting the diminishing returns as the codebase size increases.
- Additional strategies such as hybrid scoring between BM25 and RAG were suggested to improve search results.
- Performance of Gemini Models: Several users shared their experiences with Gemini models, specifically noting that Gemini Flash 2.0 provided faster and cost-effective solutions.
- Some found it to have better reasoning and math capabilities compared to Sonnet, sparking comparisons between different AI models.
- Issues with Aider Functionality: Aider users reported recent bugs following updates, particularly with command parsing and
.dockerfile
handling issues.- Discussions also included possible censorship in DeepSeek based on training data and web interface settings, which users found concerning.
- GPT-4 32k - API, Providers, Stats: GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial fo...
- RAG For a Codebase with 10k Repos: Discover how qodo used RAG to bridge LLMs' context gaps, ensuring quality and integrity in generative AI coding.
- Tweet from Nebius AI Studio (@nebiusaistudio): DeepSeek R1 just got faster 👀 Introducing our new high-performance endpoint:- Up to 60+ tokens/second- Advanced reasoning- Starting at $2/$6 per 1M tokensTry it now on Nebius AI Studio✨
- Tweet from Penny2x (@imPenny2x): Holy crap.
- GitHub - richardanaya/luckyshot: A CLI tool for finding the files that count 🤠🔫: A CLI tool for finding the files that count 🤠🔫. Contribute to richardanaya/luckyshot development by creating an account on GitHub.
- Nacho Libre GIF - Nacho Libre Why - Discover & Share GIFs: Click to view the GIF
- Math Teddy GIF - Math Teddy Writing - Discover & Share GIFs: Click to view the GIF
- Cline Memory Bank | Cline: no description found
- X Pum Boom GIF - X Pum Boom Rest My Case - Discover & Share GIFs: Click to view the GIF
- Tweet from Sam Altman (@sama): we put out an update to chatgpt (4o).it is pretty good.it is soon going to get much better, team is cooking.
- How to use multiple, separate open-ai compatible API endpoints at the same time · Issue #3125 · Aider-AI/aider: Issue I have access to multiple openai-compatible endpoints. The each use a different URL and API key. I'd like to use a mix of these, for example use one API endpoint for weak-model, one for arch...
- GitHub - Abraxas-365/langchain-rust: 🦜️🔗LangChain for Rust, the easiest way to write LLM-based programs in Rust: 🦜️🔗LangChain for Rust, the easiest way to write LLM-based programs in Rust - Abraxas-365/langchain-rust
- GitHub - vespa-engine/vespa: AI + Data, online. https://vespa.ai: AI + Data, online. https://vespa.ai. Contribute to vespa-engine/vespa development by creating an account on GitHub.
- GitHub - ai-christianson/RA.Aid: Develop software autonomously.: Develop software autonomously. Contribute to ai-christianson/RA.Aid development by creating an account on GitHub.
- Reddit - Dive into anything: no description found
- Tweet from Sam Altman (@sama): we put out an update to chatgpt (4o).it is pretty good.it is soon going to get much better, team is cooking.
- v0 by Vercel: Chat with v0. Generate UI with simple text prompts. Copy, paste, ship.
- GitHub - HKUDS/LightRAG: "LightRAG: Simple and Fast Retrieval-Augmented Generation": "LightRAG: Simple and Fast Retrieval-Augmented Generation" - HKUDS/LightRAG
- Moa by gembancud · Pull Request #2628 · Aider-AI/aider: Add Mixture of Architects (MOA) FeatureWhy choose between r1, o3, and sonnet, when you can have 'em all!OverviewThis PR introduces a powerful new feature called "Mixture of Architects...
aider (Paul Gauthier) ▷ #questions-and-tips (53 messages🔥):
Aider configuration issues, OpenRouter model performance, Repo maps and context handling, Adding external search engines, Benchmarking with custom files
- Aider struggles with context suggestions: Users are reporting that Aider no longer prompts them to add relevant files, particularly after switching to different models like o3-mini.
- One user mentioned that adding the project directory structure to a read-only file hasn't improved suggestions.
- OpenRouter models can be slow: Several members shared their experiences with OpenRouter models being slow, particularly while using DeepSeek.
- Users discussed adjusting settings to exclude slower providers for improved performance.
- Creating effective repo maps for Aider: Users are experimenting with repo maps to help Aider better understand their project context and suggest relevant files.
- Some users found adding their own structured guides helped improve Aider’s performance in suggesting files.
- Integrating external search tools with Aider: A user inquired about adding a web search capability to Aider, with options like Perplexity being considered.
- The community suggested looking into ra.aid, which integrates with various APIs for task planning and execution alongside Aider.
- Aider's file path handling confusion: A user expressed frustration as Aider creates new files starting from the git working directory rather than respecting the current working directory.
- This behavior is contrary to the user's use of
--subtree-only
for their repo.
- This behavior is contrary to the user's use of
- Models: 'deepseek' | OpenRouter: Browse models on OpenRouter
- Aider LLM Leaderboards: Quantitative benchmarks of LLM code editing skill.
- Options reference: Details about all of aider’s settings.
- André Simon / ansifilter · GitLab: ANSI sequence filter
- GitHub - ai-christianson/RA.Aid: Develop software autonomously.: Develop software autonomously. Contribute to ai-christianson/RA.Aid development by creating an account on GitHub.
aider (Paul Gauthier) ▷ #links (1 messages):
Qwen2.5-Coder-7B, Edit Prediction Model, Zeta Dataset, Fine-Tuning Techniques
- Qwen2.5-Coder-7B fine-tuned for Zed: The repository features a fine-tuned version of Qwen2.5-Coder-7B specifically designed to enhance edit prediction in Zed applications.
- This fine-tuned model enables more efficient coding assistance and improved interaction for users leveraging the Zed platform.
- Training insights shared: The model has been fine-tuned using the zeta dataset, which is pivotal for its performance enhancements.
- For those interested in self-fine-tuning, detailed scripts for both DPO Fine-Tuning View Notebook and SFT Fine-Tuning View Notebook are available.
Link mentioned: zed-industries/zeta · Hugging Face: no description found
OpenAI ▷ #ai-discussions (323 messages🔥🔥):
AI Bias and Safety, Grok 3 Expectations, AI Generated Code Concerns, Virtual Companions and Usage, LLMs and Reasoning Abilities
- Concerns on AI Bias and Safety: Participants expressed skepticism about Elon Musk's intentions behind Grok, fearing potential biases in its design due to his political affiliations and prior actions on Twitter.
- Discussions included the importance of reasonable censorship in AI and whether AI should influence political opinions or narratives.
- Expectations for Grok 3: Speculation regarding Grok 3's capabilities revealed a cautious optimism among users, with some hoping it would perform well against existing AIs like O1.
- Concerns were raised about the possibility of Grok being used as a propaganda tool, contingent on its programming and AI safety measures.
- AI Generated Code Vulnerabilities: Users noted the risks of relying on AI-generated code, emphasizing that such code may contain security issues, such as unsafe string concatenation.
- Discussions highlighted the need for developers to review AI code critically, as poor practices could lead to vulnerabilities in applications.
- Usage of AI as Virtual Companions: Some users shared their experiences using AI as virtual companions, highlighting both the humorous and nuanced aspects of such interactions.
- One user humorously admitted to using ChatGPT as a 'virtual girlfriend,' showcasing the varying degrees to which people integrate AI into their personal lives.
- The Uncertainty of LLMs' Reasoning Abilities: Discussions centered around Yann Lecun's skepticism regarding the reasoning capabilities of large language models (LLMs), with participants agreeing on the need for ongoing exploration.
- The conversation acknowledged that while advancements have been made, it remains uncertain whether LLMs can genuinely understand or reason about complex topics.
- Pollinations.DIY: no description found
- The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding: In a systematic way, we investigate a widely asked question: Do LLMs really understand what they say?, which relates to the more familiar term Stochastic Parrot. To this end, we propose a summative as...
- The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding : no description found
OpenAI ▷ #gpt-4-discussions (22 messages🔥):
Publishing to GPT Store, Removing files from knowledge base, Scholar GPT for research and story crafting, System prompt guide for O-series models, Opinions on Poe
- Publishing issue on GPT Store: A member encountered a message stating, 'Public actions require valid privacy policy URLs' when trying to publish to the GPT Store.
- Another member advised filling in the privacy policy field in actions, which resolved the issue.
- Challenges with Knowledge Base Files: There was a discussion about a lack of an option to remove files from the knowledge base in custom GPTs.
- Eventually, the original poster found the option available, which was possibly linked to a recent bug fix.
- Scholar GPT for Research vs Story Crafting: A user posed questions regarding the effectiveness of Scholar GPT for research compared to crafting story plots.
- This sparked interest with others sharing their experiences, particularly focusing on the tool's capabilities.
- O-Series AI Models System Prompt Changes: A member inquired about the new system prompt guide for O-series AI models, leading to a clarification regarding the shift to developer messages.
- It's noted that this change aligns with the reasoning models' specifications as outlined in OpenAI's documentation.
- Opinions on Poe: A member expressed a critical opinion on Poe, specifically its knowledge base feature being ineffective.
- They noted issues where models would respond in other languages despite queries being in English.
OpenAI ▷ #prompt-engineering (51 messages🔥):
Legislative Writing Prompts, Prompt Learning Resources, Multi-Agent Thinking, MCP Implementation
- Caution in Legislative AI Usage: A member emphasized using AI tools cautiously for creating legislative materials, suggesting that a skilled human should review every word of the output to ensure safety.
- They noted that they often discard about 90% of AI suggestions, using it mainly to improve and strengthen their own ideas.
- Best Sites for Prompt Learning: A member suggested that the best resource for learning about prompts is ChatGPT itself, recommending users ask it to explain specific prompts they've encountered.
- One effective prompt shared was asking ChatGPT to analyze the meaning and clarity of the user's input.
- Challenges with Token Limitations: A discussion centered on the frustrations of limited token output streams in AI interactions, with members sharing experiences about how it affects their projects.
- It was mentioned that using multi-agent setups could help bypass token limitations, especially for generating longer outputs.
- Multi-Agent Thinking and MCP: A member described using a multi-agent system that integrates Google's model and GPT, highlighting the need for an API listening script which can be challenging to maintain.
- MCP (Model Context Protocol) was introduced as a potential solution, with claims that it is gaining traction among clients for its ability to streamline API interactions.
- Chain of Thought Limitations: The limitations of Chain of Thought prompts in AI were discussed, with views that they do not enable models to think like older models such as o1 and o3.
- Members acknowledged that expectations for newer models to completely replicate the thinking style of older ones may require a reality check.
OpenAI ▷ #api-discussions (51 messages🔥):
Legislative Writing Prompts, Prompt Learning Resources, Annual Report Analysis, MCP (Model Context Protocol), Token Limitations in Models
- Legislative Writing Prompts Discussed: A member expressed interest in prompts designed for legislative writing materials, highlighting the importance of a skilled human reviewing any AI-generated content for accuracy.
- Careful consideration is emphasized for using AI in important contexts, ensuring every output is thoroughly checked.
- Best Resources for Prompt Learning: One member suggested that the ChatGPT web interface is an excellent resource for prompt learning, allowing users to understand how AI interprets prompts.
- A popular prompt example shared was about instructing ChatGPT to explain the meanings of provided prompts, enhancing the user's understanding of effective prompting.
- Analyzing Annual Reports with ChatGPT: A member sought suggestions on using ChatGPT to analyze and compare annual reports.
- This topic encouraged discussion about utilizing AI for detailed analysis while ensuring outputs are coherent and relevant.
- Exploring Model Context Protocol (MCP): Discussion arose about using MCP to connect multiple LLMs efficiently, enhancing the capability of projects without running into token limitations.
- One member shared their current setup utilizing Google’s model for logic processing and ChatGPT for scripting, declaring it a smart solution.
- Challenges with Token Limitations: Concerns were raised about the limited output and token stream of GPT models, affecting their efficacy in various tasks.
- Members noted that there are time limits and token constraints, which complicate interactions, while others shared strategies to work around these challenges.
Stability.ai (Stable Diffusion) ▷ #general-chat (369 messages🔥🔥):
Stable Diffusion Versions, LORA Training, Running Local AI, Image Generation Techniques, Using ComfyUI
- Stable Diffusion 1.7.0 Performs Well: Users noted that Stable Diffusion 1.7.0 is more efficient than 1.10, with faster loading times and better adherence to prompts.
- One user mentioned that 1.10 would often generate unsatisfactory images, citing longer load times and erratic results.
- Tips for LORA Training: For training a LORA, it is suggested to use around 50-150 images of your concept for better outcomes.
- Users discussed using tools like Koyha LORA to facilitate the training process based on specific styles.
- Local AI Setup Recommendations: Several users advocated for running AI models locally, emphasizing the control and customization it offers over online services.
- For optimal performance, a NVIDIA GPU with 8GB VRAM and 32GB of RAM is recommended.
- Image Generation Techniques: Users shared methods for creating images including utilizing regional prompting and efficiently arranging words in prompts.
- One user emphasized the importance of clear prompt structuring to yield better image results.
- ComfyUI Usage and Features: Users discussed the intimidation of ComfyUI but acknowledged its flexibility and capabilities in image generation workflows.
- Features like drag-and-drop uploading of images were highlighted as convenient tools for AI generation.
- Lightricks/LTX-Video at main: no description found
- Door Hide GIF - Door Hide Close - Discover & Share GIFs: Click to view the GIF
- The Rise And Fall Of Midjourney: In 2022, MidJourney was unstoppable. It was the go-to AI image generator, leveraging network effects, data-driven learning, and viral marketing to dominate t...
- GitHub - stepfun-ai/Step-Video-T2V: Contribute to stepfun-ai/Step-Video-T2V development by creating an account on GitHub.
- Stable Diffusion Online: no description found
- Trump and Putin Jamming: AI Music Video With World Leaders Epic Jam!: #politics #donaldtrump #kamalaharris #vladimirputin #macron #trudeau #politician #celebrity #ai #viralvideo #trending #trendingvideo #shortvideo #worldleader...
- GitHub - AUTOMATIC1111/stable-diffusion-webui at cf2772fab0af5573da775e7437e6acdca424f26e: Stable Diffusion web UI. Contribute to AUTOMATIC1111/stable-diffusion-webui development by creating an account on GitHub.
- Evil Comes Back Biting #motivation #lifelessons #motivationalspeech #wisdom #mindset: What goes around, comes around in unexpected ways! 🌟 In this inspiring short story, a man faces a tough decision that ultimately shows us the power of doing...
- Releases · AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI. Contribute to AUTOMATIC1111/stable-diffusion-webui development by creating an account on GitHub.
Cursor IDE ▷ #general (344 messages🔥🔥):
Claude model performance, MCP server setups, Cursor updates, Grok 3 rollout, Model comparison
- Claude is showing signs of decreased performance: Users have noted that Claude's responses have become slower and less context-aware in the past days, with some labeling it as 'nerfed'. Several members expressed frustration with these changes and discussed alternative models.
- It has been suggested that Claude may be experiencing internal changes that affect its performance without any visible model update.
- Discussion around MCP server setups: Users shared various setups for MCP servers, notably the use of Brave Search and Sequential Thinking to enhance performance. It was noted that these setups can significantly improve the reliability and documentation of the coding process.
- A user was also seeking a simple 'hello world' example for MCP, indicating a desire for more accessible resources.
- Grok 3 rolling out amidst curiosity: The release of Grok 3 sparked curiosity among users, with some wondering about its performance compared to existing models. Speculation arose regarding the potential updates made on the backend that might align it closer to newer versions of Llama.
- Users expressed their willingness to test Grok 3, particularly to validate its abilities after various underwhelming launch experiences in the past.
- Cursor's handling of user requests: Concerns were raised about Cursor's ability to handle extensive tasks and process larger files effectively, especially in agent mode. Users discussed their frustrations with the limitations imposed on the platform despite having unlimited tokens available.
- There were suggestions that Cursor users might need to explore other models and features due to these limitations.
- Emerging services and exploitation concerns: Concerns were raised about potential exploits allowing free trial abuse of Cursor services, leading to discussions about security and oversight. Users referenced automated scripts that might exploit the system rollouts and discussed how to handle such issues.
- The community expressed frustration regarding updates and service reliability, urging the Cursor team to remain vigilant against misuse.
- Cursor – Models: no description found
- PearAI - The Open Source AI Code Editor: PearAI is an open source AI-powered code editor with powerful features like AI chat, PearAI Creator, and AI debugging to help you make what excites.
- Tweet from Tibor Blaho (@btibor91): Try Anthropic's new Thinking modelTurn on **Thinking** in model settings to see Claude's thought processGet step-by-step reasoning for complex tasks like coding and math. Perfect for understan...
- Tweet from Lukas Troup (@lukas_troup): I tested multiple AI tools to convert a simple Figma UI design into code to see which one delivers the best result.The results you see in the screenshots are generated from the first prompt (with at m...
- Time GIF - Aint Nobody Got Time For That Kimberly Wilkins Interview - Discover & Share GIFs: Click to view the GIF
- GitHub - daniel-lxs/mcp-perplexity: MCP Server for the Perplexity API.: MCP Server for the Perplexity API. Contribute to daniel-lxs/mcp-perplexity development by creating an account on GitHub.
- Tweet from Lovable (@lovable_dev): Introducing Visual EditsYou can now visually edit any styles in Lovable, for faster and more precise edits
- Tweet from BigQiang (@iBigQiang): 在 B 站看到个使用Cloudflare无限续杯Cursor教程和配套脚本实在太牛了,条件是拥有域名并且需要交给cloudflare托管,不过域名也就几块钱而已,相比每个月20刀的Cursor还是很香了。Cursor能写代码、写文章,大部分ai能干的事他都可以,可白嫖Claude 3.5 sonnet,deepseek v3&r1,gpt4o和gpt4o mini等
- O3-mini is LIVE! What version are we getting?: Confirming that o3-mini has been updated to use high reasoning level in Cursor. Please let us know what you think 🙂
- Tweet from Chubby♨️ (@kimmonismus): Grok-3 is starting to roll out. Check your Grok selectionQuoting Penny2x (@imPenny2x) Holy crap.
- GitHub - bgstaal/multipleWindow3dScene: A quick example of how one can "synchronize" a 3d scene across multiple windows using three.js and localStorage: A quick example of how one can "synchronize" a 3d scene across multiple windows using three.js and localStorage - bgstaal/multipleWindow3dScene
- Tavily MCP: Attempting to Replicate Deep Research (Setup & Demo): Tavily just released their official MCP server, bringing powerful search capabilities to Claude. At the same time, OpenAI's Deep Research has been making wav...
- Reddit - Dive into anything: no description found
- Cursor + MCP Servers: Complete Setup Guide (Sequential Thinking, Brave Search, & More): Cursor just added MCP support! In this complete setup guide, I'll show you how to integrate and use MCP servers (Sequential Thinking, Brave Search, and Puppe...
- GitHub - chengazhen/cursor-auto-free: auto sign cursor: auto sign cursor. Contribute to chengazhen/cursor-auto-free development by creating an account on GitHub.
OpenRouter (Alex Atallah) ▷ #announcements (2 messages):
OpenRouter real-time charts, llms.txt documentation
- OpenRouter Enhances Real-Time Analysis: OpenRouter's throughput and latency charts now update in real time, showcasing a significant improvement attributed to Google AI's Vertex enhancements. You can see the changes detailed in their announcement here.
- Kudos were given for the recent speedup, which has improved user experience considerably.
- Introducing llms.txt for Documentation Access: For those building on OpenRouter, the newly published llms.txt serves as a comprehensive resource to chat with the docs. You can access it here or download the full version here.
- The publication has been celebrated for its extensive content, inviting users to engage with the documentation creatively, as noted in their tweet.
- Tweet from OpenRouter (@OpenRouterAI): TIP: throughput and latency charts on OpenRouter update in real timeHere are Sonnet's. Kudos to @GoogleAI Vertex for the recent speedup!
- Tweet from OpenRouter (@OpenRouterAI): Just published a big long beautiful llms.txt containing all of our docs.You know what to do! ✨
OpenRouter (Alex Atallah) ▷ #app-showcase (5 messages):
Versatile AI Tooltip Extension, Toledo1 AI Browser, Online Party Game, Roo-Code URL Update, Favicon Issues
- Launch of Versatile AI Tooltip Extension: A new Chrome extension, Versatile AI Tooltip, lets users quickly process text and customize instructions using OpenRouter models with a simple setup involving their API key.
- It aims at summarizing articles and translating text snippets without any cost, as shared by its developer.
- Toledo1 Offers Pay-Per-Question AI Access: The Toledo1 platform allows private chatting with AI assistants, utilizing a pay-per-question model and the ability to combine multiple AIs for accurate answers.
- It features client-side search with easy installation, boasting enterprise-grade security and avoiding subscription fees.
- Exciting Online Party Game Released: An online party game has been launched where players can fight against friends or bots using any action they can think of, accessible for free without sign-up.
- Players can try the game at Wits and Wands and engage in unique battles.
- Request to Update Roo-Code's URL: A suggestion was made to update the app showcase with Roo-Code's new URL, which would also address the correct favicon display issue.
- The new URL provided is Roo-Code Documentation, and an image link for the favicon was shared but highlighted as automatic on their end.
- Automatic Favicon Issue Raised: A user pointed out that the app showcase favicon issue is tied to Roo-Code not updating their headers, making it an automatic concern on the server’s side.
- This illustrates the communication gap between app maintainers and URL content providers impacting display elements.
- Wits and Wands - AI-Powered Magical Dueling Game: An AI-powered magical dueling game where you cast spells, outwit your friends, and win!
- Toledo1: no description found
- Chrome Web Store: Add new features to your browser and personalize your browsing experience.
OpenRouter (Alex Atallah) ▷ #general (315 messages🔥🔥):
DeepSeek R1, OpenRouter model performance, Rate limits and API usage, Multimodal models, Reasoning tokens in responses
- DeepSeek R1's Variable Reasoning Inclusion: Users reported inconsistencies with the DeepSeek R1 free model where reasoning tokens sometimes appeared in the response content rather than as separate reasoning fields, indicating potential issues with model behavior.
- The OpenRouter team is aware of the problem and is tracking solutions to better handle the behavior of open-source reasoning models.
- Rate Limits and API Management: Discussion emerged surrounding rate limits for paid models used during a workshop, with assurances from OpenRouter that generally, users won't hit rate limits unless the demand surges significantly.
- OpenRouter manages rate limits globally and suggests using multiple models to share the load if any issues arise.
- Concerns Regarding Data Privacy: A user expressed concerns about forced routing to specific countries which could violate data protection laws, emphasizing the need for region-based routing options for compliance.
- OpenRouter acknowledged the issue and is exploring options to support EU-specific routing for better legal compliance.
- Multimodal Model Evaluation: Users shared feedback on the performance of multiple models including Sonnet 3.5, Flash 2.0, and o3-mini-high for coding tasks, highlighting o3-mini-high's unusual behavior despite achieving zero-shot capability on complex problems.
- The effectiveness of models in tasks like chess was also discussed, with a consensus that current LLMs struggle with such scenarios.
- Benchmarking and Model Testing: Inquiries were made about available benchmarking scripts for testing various models, especially in relation to performance consistency.
- Users report experiencing 'premature close' errors with Claude 3.5 Sonnet, prompting discussions around handling invalid response bodies from API requests.
- Chatroom | OpenRouter: LLM Chatroom is a multimodel chat interface. Add models and start chatting! Chatroom stores data locally in your browser.
- inference.net | OpenRouter: Browse models provided by inference.net
- API Rate Limits - Manage Model Usage and Quotas: Learn about OpenRouter's API rate limits, credit-based quotas, and DDoS protection. Configure and monitor your model usage limits effectively.
- Structured Outputs - Type-Safe JSON Responses from AI Models: Enforce JSON Schema validation on AI model responses. Get consistent, type-safe outputs and avoid parsing errors with OpenRouter's structured output feature.
- ChatGPT-4o - API, Providers, Stats: OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in t...
- gemini flash 2.0 vs o3 mini: no description found
- BYOK - Bring Your Own Keys to OpenRouter: Learn how to use your existing AI provider keys with OpenRouter. Integrate your own API keys while leveraging OpenRouter's unified interface and features.
- Tweet from DeepSeek (@deepseek_ai): 🎉 Excited to see everyone’s enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience:• No system prompt• Temperature: 0.6• Official prompts for search & file up...
- Tweet from Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex): It's stunning to be sure. But you have to admit, the way diffusion video models deal with high-frequency in-plane movements (eg look at the dog's hind legs here) still gives the lie to Sama...
- stepfun-ai/Step-Audio-Chat · Hugging Face: no description found
- Google Search: no description found
- Chrome Web Store: Add new features to your browser and personalize your browsing experience.
- Jonah Hill No GIF - Jonah Hill No Chill Out Bro - Discover & Share GIFs: Click to view the GIF
- Chatbot: Learn how to use the useChat hook.
- GitHub - n4ze3m/page-assist: Use your locally running AI models to assist you in your web browsing: Use your locally running AI models to assist you in your web browsing - n4ze3m/page-assist
- GitHub - deepseek-ai/DeepSeek-R1: Contribute to deepseek-ai/DeepSeek-R1 development by creating an account on GitHub.
- Integration Frameworks - OpenRouter SDK Support: Integrate OpenRouter using popular frameworks and SDKs. Complete guides for OpenAI SDK, LangChain, PydanticAI, and Vercel AI SDK integration.
- Provider Routing - Smart Multi-Provider Request Management: Route AI model requests across multiple providers intelligently. Learn how to optimize for cost, performance, and reliability with OpenRouter's provider routing.
- GitHub - sigoden/aichat: All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.: All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI Tools & Agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more. - sigoden/aichat
- GitHub - BuilderIO/ai-shell: A CLI that converts natural language to shell commands.: A CLI that converts natural language to shell commands. - BuilderIO/ai-shell
- @plastichub/kbot: AI-powered command-line tool for code modifications and project management that supports multiple AI models and routers.. Latest version: 1.1.14, last published: 3 days ago. Start using @plastichub/kb...
- 跃问: no description found
Interconnects (Nathan Lambert) ▷ #news (152 messages🔥🔥):
Grok 3 Release, Mistral Saba Model, Emerging Technologies in Robotics, Tencent's New LLMs, Character Training in AI
- Excitement Grows for Grok 3 Release: Elon Musk announced the release of Grok 3 with a live demo scheduled, claiming it to be the 'smartest AI on Earth', sparking mixed reactions about its alignment with user needs.
- Concerns about whether the model will be effectively trained or overly tweaked for cultural biases were raised, indicating skepticism about its commercialization.
- Mistral Saba's Regional Focus: Mistral Saba, a 24B parameter model, is designed for the Middle East and South Asia, emphasizing custom-trained capabilities to improve language representation, nuance, and cultural context.
- This move reflects a growing trend to tailor AI models for specific regional markets rather than relying solely on general-purpose models.
- Tencent Expands AI Offerings: Tencent announced plans for their Hunyuan LLM Turbo-S and a video generation model, expected in Q1 2025, demonstrating their growing capabilities in AI.
- The announcement highlights Tencent's ambition to solidify its position in the competitive AI landscape, especially in video technologies.
- Character Training in AI Models: Discussion about the impact of recent updates to ChatGPT led to insights on character training, suggesting simplicity in concept but complexity in execution.
- Users noted the increasing personalization within models, raising questions about the implications for trust and reliability in AI interactions.
- Challenges in Video Generation: Concerns over video generation technologies plateaued due to the challenges of moderating generated content and preventing harmful outputs.
- Participants debated the potential for advanced tools to detect and counteract misuse of video generation, emphasizing the need for robust protective measures.
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model: We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoenco...
- Tweet from Anissa Gardizy (@anissagardizy8): new: Elon Musk’s artificial intelligence startup xAI is looking to set up a new data center as it plans to substantially increase the number of Nvidia chips it uses, according to someone involved in t...
- Tweet from Bloomberg Technology (@technology): The humanoid robotics startup Figure AI is said to be in talks to raise $1.5 billion at a valuation of nearly $40 billion. https://trib.al/Fpe172t
- Mistral Saba | Mistral AI: One of the many custom-trained models to serve specific geographies, markets, and customers
- New Podcast Spotlight: The Interconnect: Podcast Episode · The World Next Week · 02/14/2025 · 30m
- Tweet from 青龍聖者 (@bdsqlsz): Tencent Hunyuan team announces the public of their LLM Turbo-S and HunyuanVideo I2V in Q1 2025
- Tweet from Mark Gurman (@markgurman): JUST IN: Apple’s long-promised AI overhaul for the Siri digital assistant is facing last-minute engineering and software bugs, threatening to postpone or limit its release, sources tell me. https://ww...
- Tweet from Elon Musk (@elonmusk): Will be honing product with the team all weekend, so offline until then
- Tweet from Andrej Karpathy (@karpathy): Actually I quite like the new ChatGPT 4o personality, whatever they did.- it's a lot more chill / conversational, feels a bit more like talking to a friend and a lot less like to your HR partner- ...
- Tweet from TestingCatalog News 🗞 (@testingcatalog): Tencent is testing DeepSeek search inside the Weixin app 👀Weixin is a Chinese version of WeChat and has 1.38B users 🤯
- Tweet from Nathan Lambert (@natolambert): I’m reaching out to technical leaders (mostly managers and senior personnel) at frontier labs to understand how they manage personnel, training complexity, and velocity of sharing new models. Purely t...
- Tweet from ʟᴇɢɪᴛ (@legit_rumors): Grok 3 - if reasoning / thinking is enabled by default, then I can see why xAI is super hyped about what they've madeI'd also start to believe the "smartest AI on Earth" claim by Elon ...
- Tweet from Elon Musk (@elonmusk): Grok 3 is so based 😂
- Tweet from Elon Musk (@elonmusk): Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
Interconnects (Nathan Lambert) ▷ #ml-questions (2 messages):
Reinforcement Learning Approaches, Soft Fine-Tuning Initialization
- Reinforcement Learning Reinforces Successful Approaches: In reinforcement learning, multiple approaches are employed to tackle problems, with successful methods being reinforced over time.
- The initialization process remains flexible, allowing adaptation based on the success of various strategies.
- Understanding Soft Fine-Tuning Initialization: The discussion indicated that Soft Fine-Tuning (SFT) acts as an initialization method in the learning process.
- This approach was highlighted as a foundational step in optimizing model performance.
Interconnects (Nathan Lambert) ▷ #ml-drama (11 messages🔥):
Elon Musk's bid for OpenAI, OpenAI board statements, Kylie Robison's communications, Grok 3 release
- OpenAI Board Rejects Musk's Bid: The OpenAI board has formally rejected Elon Musk's latest $97.4 billion bid to control the company, stating, “OpenAI is not for sale.”
- Chairman Bret Taylor emphasized that the board unanimously turned down Musk’s attempts to disrupt competition, as reported by Bloomberg.
- Musk's Comms and Media Coverage: Kylie Robison shared that her inbox is filled with communications urging her to cover ongoing Twitter fights and lawsuits involving Musk, stating “one of us has to pick a struggle!”
- She acknowledged that while the communications are kind, the “cat fights” often lead to orchestrated headlines, which she only covers once legal involvement occurs.
- Grok 3's Anticipated Release: Elon Musk announced that the Grok 3 release will feature a live demo on Monday night at 8 PM PT, promoting it as the “smartest AI on Earth.”
- Musk’s announcement sparked discussions, including a reminder that “a man died to tell us how good Grok 3 really is,” referring to the high stakes involved.
- Tweet from Kylie Robison (@kyliebytes): to be clear his comms are kind and trying to keep me informed but his cat fights are also orchestrated headlines - i dont cover them until lawyers get involved!
- Tweet from Shirin Ghaffary (@shiringhaffary): NEW: OpenAI board formally rejects Elon Musk's investor bid to purchase the nonprofit“OpenAI is not for sale, and the board has unanimously rejected Mr. Musk’s latest attempt to disrupt his compet...
- Tweet from Kylie Robison (@kyliebytes): mind you these are the emails i get from his companyQuoting Kylie Robison (@kyliebytes) @sama your comms literally calls us to cover the twitter fights and lawsuits - one of us has to pick a struggle!...
- Tweet from Aidan McLaughlin (@aidan_mclau): a man died to tell us how good grok 3 really isnever forget
- Tweet from Junyang Lin (@JustinLin610): Oh really?Quoting Elon Musk (@elonmusk) Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
Interconnects (Nathan Lambert) ▷ #random (58 messages🔥🔥):
Writing Advice and Challenges, David Perrell's Writing Insights, Transitioning to Tech Lead Roles, Emergence of World Models in LLMs, Recent Developments in AI Models
- Navigating the Struggles of Writing: Members discussed the challenges of writing, particularly for non-native speakers, and how finding one's voice can be difficult.
- It was suggested to draw inspiration from personal experiences to enhance writing quality.
- David Perrell’s Writing Framework: David Perrell's writing approach emphasizes having a strong title, engaging first sentence, and clear direction in the first paragraph.
- While viewed as opinionated, his content can provide a fun framework for aspiring writers.
- Advice for Aspiring Tech Leads: Members shared insights on transitioning to tech lead roles, emphasizing the importance of experience and interpersonal skills.
- Accepting the shift from an individual contributor to a leadership role is vital, along with advocating for team members.
- Melanie Mitchell and World Models in AI: The discourse included insights from Melanie Mitchell's article on LLMs and their ability to develop 'world models' through game analysis.
- Members noted the Santa Fe Institute's innovative but abstract concepts while appreciating Mitchell's informed takes.
- Feedback on AI Model Versions: User experiences with the new 4o model indicate a preference over previous versions for casual conversations and everyday tasks.
- There are still comparisons being made regarding its coding capabilities against alternatives like Claude.
- ibm-granite/GneissWeb.bloom · Hugging Face: no description found
- Scarlet Ink | Dave Anderson | Substack: Tech industry career and leadership advice from an ex-Amazon GM and Tech Director. Click to read Scarlet Ink, by Dave Anderson, a Substack publication with tens of thousands of subscribers.
- LLMs and World Models, Part 2: Evidence For (and Against) Emergent World Models in LLMs
- Tweet from Xeophon (@TheXeophon): 🤔IBM has repos, blogs and tools for GneissWeb, likely their pre-training dataset. It got deleted, but is still indexed
- Tweet from OedoSoldier (@OedoSoldier): @teortaxesTex
- Tweet from Rosanne Liu (@savvyRL): Tülu 3 is being presented at DLCT right now! Led by @pdasigi The @allen_ai color with 🩷 as author superscript are basically shouting "happy valentine's"
- Tweet from Andrew White 🐦⬛ (@andrewwhite01): Here's a plot of recent reasoning and non-reasoning models' evals on math problems (AIME 2024/Math 500). You can see reasoning models from a variety of groups are showing massive algorithmic g...
- Tweet from ARC Prize (@arcprize): Introducing SnakeBench, an experimental benchmark side questWe made 50 LLMs battle each other in head-to-head snake 🐍2.8K matches showed which models are the best at snake real-time strategy and spat...
Interconnects (Nathan Lambert) ▷ #memes (21 messages🔥):
OpenAI's O1 Pro Mode usage, Deep Research Behavior, AI Cookie Collection Fact, Strawberry Question Controversy, ChatGPT's Elaborate Descriptions
- OpenAI's O1 Pro Mode usage raises eyebrows: Members questioned how often people ask O1 Pro mode to count 'R's in the word strawberry, hinting at humorous interactions with AI models.
- This sparked discussions about the absurdity of such requests and their implications on user behavior.
- Deep Research questioned among peers: A member humorously reflected on being an outlier for reading multiple deepseek papers, suggesting that the group should hire someone to read papers.
- This prompted a chuckle-worthy debate about the seriousness of deep research within the community.
- Charm of Cookie Collection by AI: An Operator reportedly worked for 3 hours collecting cookies, inciting curiosity about the operational costs for OpenAI.
- Members speculated about the financial implications of such tasks, with one humorously guessing $200.01.
- Strawberry Question Stirs Controversy: Dissent arose when a comment suggested the preview model cannot answer simple questions, causing members to question the reliability of AI responses.
- This engaged others in a lighthearted debate about whether to sell their IBM stocks amid the controversy.
- ChatGPT's Elaborate Descriptions Critiqued: ChatGPT's interpretation of simple tasks like scrolling down was jokingly critiqued for being overly elaborate, describing the process as deep research.
- Another user added a humorous take, suggesting that it might just be repeating the same information.
- Tweet from atlas (@creatine_cycle): congrats on the sex lex
- Tweet from Justine Moore (@venturetwins): Wished all the LLMs a happy Valentine's Day and I got friendzoned so hard by Claude 🫠
- Tweet from FleetingBits (@fleetingbits): Operator working for 3 hours collecting cookies for me...
- Tweet from Xeophon (@TheXeophon): bro
- Tweet from gabe (@allgarbled): Sam Altman charged me $200 for thisQuoting OpenAI Developers (@OpenAIDevs) Use o-series models to process unstructured data, find a needle in a haystack, improve code, or handle other complex tasks. F...
Interconnects (Nathan Lambert) ▷ #rl (17 messages🔥):
GRPO, SFT, Training Metrics, Markdown Documentation, CoT vs GRPO Comparison
- GRPO and SFT are Complementary: A member noted that GRPO isn’t meant to replace SFT; rather, both should be utilized together for optimal results.
- This indicates a collaborative approach to model training is favored among users.
- Preference for Notion Over X: One member expressed frustration with using X, stating they preferred Notion for findings as it doesn’t require logging into a less favored website.
- Others shared mixed feelings about the usability of Notion, with comments on its slowness and difficulty in navigating.
- GRPO vs SFT: Determining Factors: A question arose on how to differentiate if a model has been trained via SFT against one using GRPO, highlighting the need for evaluation metrics.
- It was suggested that generalization and the length of the chain-of-thought (CoT) could serve as potential proxies for understanding the training method.
- Discussion on Training Metrics: A member added that training metrics such as entropy should differentiate between in
tags or not, implying a deeper nuance in data evaluation. - This reflects growing interest in how various training paradigms impact model performance metrics.
- Ideas on Document Format: There was a consensus that a markdown file should be considered the minimum requirement for sharing findings, promoting easier accessibility.
- The discussion underlined frustrations with existing platforms and a desire for more straightforward documentation methods.
Link mentioned: Tweet from Trelis Research (@TrelisResearch): + GRPO is Poor and for the GPU-Rich +-------------------------------A specific GRPO vs SFT video will be out next week, but I'm putting initial results hereI trained Llama 3.2 1B on GSM8K with:1...
Interconnects (Nathan Lambert) ▷ #reads (13 messages🔥):
TUM AI Lecture Series featuring Robin Rombach, Elon Musk's wartime mentality, LLaDA: A New Diffusion Model, Post-training Experiences with ChatGPT, Nathan Lambert's Validation
- TUM AI Lecture on FLUX by Robin Rombach: The TUM AI Lecture Series features Robin Rombach discussing 'FLUX: Flow Matching for Content Creation at Scale' on February 17th at 6pm GMT+1. Live streams will be available on YouTube.
- The lecturer will cover fundamental aspects of flow matching and its application in large-scale text-to-image pretraining.
- Understanding Elon Musk's Work Ethos: One perspective on Elon Musk suggests he operates in 'wartime mode', emphasizing rapid movement and high output regardless of collateral damage. This model applies across various ventures, from SpaceX to DOGE, making his mindset more comprehensible.
- The discussion appears to resonate within a thread linked on Twitter, inviting further exploration of related readings.
- Introduction of LLaDA: A New Language Diffusion Model: The paper introduces LLaDA, a diffusion model that challenges traditional autoregressive models (ARMs) by showcasing its capabilities in scalability and instruction-following abilities. It is proposed as a strong competitor to models like LLaMA3, achieved through extensive training on massive data sets.
- Further conversation suggests skepticism regarding the diffusion classification of LLaDA as traditional diffusion characteristics seem absent.
- Post-training Insights on ChatGPT: A recent talk at Stanford by team members highlighted experiences in post-training methods with ChatGPT, although unfortunately, it wasn't recorded. The slides from the talk can be found here.
- Members expressed interest in viewing the recorded presentation if available.
- Nathan Lambert's Emotional Output: Nathan Lambert expressed a feeling of validation amidst a struggle with illness, humorously referencing his recent Twitter victory. He remarked on being in a 'pain cave' as he navigates his current condition.
- His expressed emotion has sparked light-hearted reactions within the conversation.
- Large Language Diffusion Models: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
- Tweet from John Schulman (@johnschulman2): @barret_zoph and I recently gave a talk at Stanford on post-training and our experience working together on ChatGPT. Unfortunately the talk wasn't recorded, but here are the slides: https://docs.g...
- Tweet from Matthias Niessner (@MattNiessner): Tomorrow in our TUM AI - Lecture Series with none other than Robin Rombach (@robrombach), CEO @bfl_ml.He'll talk about "𝐅𝐋𝐔𝐗: Flow Matching for Content Creation at Scale".Live stream: ...
- Tweet from Jason Crawford (@jasoncrawford): One model for Elon is that he is perpetually operating in wartime modeMove as fast as possible, bring overwhelming force, extract superhuman effort, accept collateral damageFits a range of things from...
- TUM AI Lecture Series - FLUX: Flow Matching for Content Creation at Scale (Robin Rombach): Abstract:I will talk about the foundations of flow matching, scaling them for large-scale text-to-image pretraining, preference-tuning approaches and techniq...
- Tweet from theseriousadult (@gallabytes): in what sense is this diffusion? I see no SDE, no probability flow, no noise. not every iterative sampling method is diffusion!this paper is genuinely impressive but it's a new thing I don't s...
Interconnects (Nathan Lambert) ▷ #policy (1 messages):
National Energy Dominance Council, Energy Production Expansion, Natural Resource Management, Economic Growth Strategies, AI Leadership and Manufacturing
- National Energy Dominance Council established: President established the National Energy Dominance Council to enhance American energy leadership by leveraging the nation's rich natural resources.
- The council aims to promote reliable and affordable energy production, boosting economic growth and national security.
- Plan to expand energy production: The President emphasized the need to expand all forms of energy production to address economic challenges such as inflation.
- This expansion is intended to create good-paying jobs and reestablish American leadership across various sectors.
- Utilizing America's natural assets: The policy recognizes the importance of utilizing America's diverse natural resources, including crude oil, natural gas, and critical minerals.
- This strategic focus aims to enhance energy independence while contributing to the country's overall economic prowess.
- Commitment to AI and manufacturing leadership: The President reiterated a commitment to lead in artificial intelligence and manufacturing, key drivers of economic stability.
- This includes using energy resources to bolster advanced manufacturing and technological innovation domestically.
- Diplomacy through energy resources: The strategy outlines the use of commercial and diplomatic levers in energy to achieve global peace.
- By leveraging energy resources, the plan aims to contribute to conflict resolution and strengthen international relations.
Link mentioned: Establishing the National Energy Dominance Council: By the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered: Section 1.
Interconnects (Nathan Lambert) ▷ #expensive-queries (6 messages):
Deep Research Workflow, O1 Pro Prompting, Scraping Techniques
- Deep Research Workflow Simplifies Efforts: A member shared that using a structured workflow for Deep Research significantly lowers their overall effort and makes it more accessible for initial ideas and prompting.
- ‘This actually lowers my overall effort’ reflects their enthusiasm for the new system approach.
- Transforming Naive Prompts into Comprehensive Outputs: TheXeophon demonstrated how a refined prompting system expanded a basic scraping inquiry from two methods to a detailed approach including a third-party tool and tips for avoidance of bans.
- They emphasized that this results in ‘more in-depth and better’ outputs compared to naive attempts.
- Using O1 Pro for Enhanced Prompts: Members are leveraging O1 Pro to build detailed prompts that generate thorough reports in a streamlined manner, producing a range of effective outputs.
- The findings indicate a significant improvement in quality when utilizing structured prompts tailored for specific objectives.
- Realizing the Value of Project Organization: A member shared their experience with organizing a project using the buccocapital prompting system, finding it effective in generating actionable ideas from minimal input.
- This highlights the benefits of project structure and methodical guidelines in enhancing creative outputs.
- Refining Older Prompts for Better Results: TheXeophon plans to re-run old prompts using the improved techniques learned from the new workflow, indicating a proactive approach to enhancing prompt engineering.
- This reflects a growing recognition within the community of the need for continuous refinement in their research methodology.
-
Tweet from Xeophon (@TheXeophon): Task: Find ways to scrape
Naive prompt: Two methods (requests, playwright) which will workBig prompt: Three methods (requests, playwright, 3rd party tool) + how to circumvent common detect... - Tweet from Xeophon (@TheXeophon): this is a game changer, my naive ODR prompt (top) uses less sources, is less in-depth and worse than using the workflow in the QT (result is bottom)will re-run some of my old prompts again...Quoting B...
- Tweet from BuccoCapital Bloke (@buccocapital): People were curious, so here's how I'm using Deep Research. I'll walk through the prompting and then an example:1. First, I used O1 Pro to build me a prompt for Deep Research to do Deep Re...
Eleuther ▷ #general (20 messages🔥):
AI-generated Images using Data URIs, Math, Chess, and Child Prodigies, Markdown and HTML Conversion Issues, Music Generation and Cognitive Tasks, lm_eval Code Generation Tasks
- AI explores images with Data URIs: A member inquired about training AI for image generation using Data URIs or Base64 encoding, which allows images to be stored without external linking.
- They noted that this method typically shows up as a blob in user interfaces.
- Prodigies in Math and Chess spark intrigue: A conversation arose regarding the commonality of child prodigies in areas like Math and Chess, suggesting these fields might be promising for AI advancements.
- Another member speculated that music might also fit this mold, questioning why it isn't classified as 'easy'.
- Troubleshooting Markdown & HTML conversion: Concerns were raised about using an HTML to Markdown converter that retains escaped characters like
>
, surprising one user.- Another member responded that maintaining consistent encodings with markdown and ASCII is important, as XML formatting rises in popularity.
- Cognitive effort in music generation: Members discussed the cognitive aspects of music creation, noting that generating sound involves physical output managed by the brain.
- The conversation highlighted that current music generators operate in the frequency domain, impacting their ability to optimize like in games such as chess.
- lm_eval lacks code generation tasks: A user reported an issue with lm_eval v0.4.7, finding no code generation tasks listed after installation despite following the provided set-up instructions.
- They shared the installation method, seeking insights on the potential cause of this discrepancy.
Link mentioned: GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
Eleuther ▷ #research (235 messages🔥🔥):
Research on sampling techniques, Applications of transformers in time series, Use of repetition in creative writing contexts, NovelAI sampler understanding, Analysis of DEQs vs RNNs
- NovelAI's Unique Sampler Knowledge: NovelAI's understanding of sampling methods has evolved due to their decision to provide a wide variety of sampler algorithms, pushing developers to figure out the intricacies involved.
- This contrasts with other organizations that initially did not offer complex samplers and thus lacked a reason to invest significant efforts into understanding them.
- Challenges with Repetition in Text Generation: The repetition penalty in text generation is a statistical method to mitigate common issues encountered with less skilled authors, who tend to overuse repetition in their writing.
- While skilled authors can successfully utilize repetition for dramatic effect, distinguishing good repetition from bad remains a challenge in developing sampler policies.
- Theory Behind Sampler Quality using Logprobs: The approach described involves utilizing a cubic polynomial of the logprobs to approximate token quality, with some transformation applied to convert the quality back to logprobs.
- The method adjusts logprobs such that middle values are increased while high and low values are penalized, although some distortion due to entropy considerations exists.
- Comparing DEQs to RNNs: Discussion around DEQs highlights their relationship with recurrent structures, focusing on the implications of weight tying and the potential for hidden layers to converge to fixed points.
- Participants noted that, unlike RNNs which handle time variation, DEQs emphasize depth and can utilize implicit differentiation methods for backpropagation.
- Exploration of Transformer's Suitability for Time Series: There is ongoing debate about the effectiveness of transformers in time series contexts, with some criticizing their inability to handle specific structured problems like parity efficiently.
- Research indicates that while transformers can be applied to time series data, traditional models like cubic splines often outperform them in capturing underlying patterns.
- On the Role of Bidirectionality in Language Model Pre-Training: Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Veselin Stoyanov. Findings of the Association for Computational Linguistics: EMNLP 2022. 2022.
- TransMLA: Multi-Head Latent Attention Is All You Need: Modern large language models (LLMs) often encounter communication bottlenecks on current hardware, rather than purely computational constraints. Multi-head Latent Attention (MLA) tackles this challeng...
- DeepCrossAttention: Supercharging Transformer Residual Connections: Transformer networks have achieved remarkable success across diverse domains, leveraging a variety of architectural innovations, including residual connections. However, traditional residual connectio...
- Regularization by Denoising: Clarifications and New Interpretations: Regularization by Denoising (RED), as recently proposed by Romano, Elad, and Milanfar, is powerful image-recovery framework that aims to minimize an explicit regularization objective constructed from ...
- CLIPAG: Towards Generator-Free Text-to-Image Generation: Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meaning...
- Are Transformers Effective for Time Series Forecasting?: Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity...
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolutio...
- Ensemble everything everywhere: Multi-scale aggregation for...: Adversarial examples pose a significant challenge to the robustness, reliability and alignment of deep neural networks. We propose a novel, easy-to-use approach to achieving high-quality...
- Artificial Kuramoto Oscillatory Neurons: It has long been known in both neuroscience and AI that ``binding'' between neurons leads to a form of competitive learning where representations are compressed in order to represent more abst...
- TextGrad: Automatic "Differentiation" via Text: AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and autom...
- DenseAttention: No-Compromise Exact All $N \times N$ Interactions...: The ubiquitous Transformer architecture suffers from two main bottlenecks: 1) low computational and memory efficiency, leading to suboptimal hardware utilization, and 2) quadratic time complexity...
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition: The trade-off between robustness and accuracy has been widely studied in the adversarial literature. Although still controversial, the prevailing view is that this trade-off is inherent, either empiri...
- Large Language Diffusion Models: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
- Text-to-Image Generation Via Energy-Based CLIP: Joint Energy Models (JEMs), while drawing significant research attention, have not been successfully scaled to real-world, high-resolution datasets. We present EB-CLIP, a novel approach extending JEMs...
- Tweet from nev (@neverrixx): Pixel art (256 resolution with nearest neighbor interpolation)
- Emergent properties with repeated examples: We study the performance of transformers as a function of the number of repetitions of training examples with algorithmically generated datasets. On three problems of mathematics: the greatest common ...
- The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation: This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models (LLMs) on very small datasets. In the setting of open-ended text generation, it is w...
- GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?: Long-context large language models (LLMs) have recently shown strong performance in information retrieval and long-document QA. However, to tackle the most challenging intellectual problems, LLMs must...
- Tweet from Infini-AI-Lab (@InfiniAILab): 🐭🐷 GSM-Infinite is generated from the problem generator. 18 strong LLMs were evaluated on the zero noise benchmark and 10 LLMs on the long-context benchmark. 🚀Key Takeaways: 🥕 Recent reasoning mod...
- Tweet from Simo Ryu (@cloneofsimo): Ok so lucas made a point on maybe it helped because it was 12 layer network.So I made it 96 layer network (with 768 hidden dim LOL) and sweeped for fucking 10 hoursTo my surprise, gap has widened comp...
- Tweet from Stella Biderman (@BlancheMinerva): @stanislavfort @johnowhitaker @advadnoun @RiversHaveWings Bumping up for visibility / combining links from multiple comments...In terms of "optimize CLIP to do image editing or generation without ...
- GitHub - DanJbk/Mse-regularized-VQGAN-CLIP: The VQGAN-CLIP method for generating images with additional regularization for more coherent and accurate outputs: The VQGAN-CLIP method for generating images with additional regularization for more coherent and accurate outputs - DanJbk/Mse-regularized-VQGAN-CLIP
- Tweet from Stanislav Fort (@stanislavfort): We discovered a surprising, training-free way to generate images: no GANs or diffusion models, but a ✨secret third thing✨! Standard models like CLIP can already create images directly, with zero train...
- GitHub - eps696/aphantasia: CLIP + FFT/DWT/RGB = text to image/video: CLIP + FFT/DWT/RGB = text to image/video. Contribute to eps696/aphantasia development by creating an account on GitHub.
- deepdream/dream.ipynb at master · kylemcdonald/deepdream: Contribute to kylemcdonald/deepdream development by creating an account on GitHub.
- lucid/lucid/optvis/param/spatial.py at 6dcc927e4ff4e7ef4d9c54d27b0352849dadd1bb · tensorflow/lucid: A collection of infrastructure and tools for research in neural network interpretability. - tensorflow/lucid
- Feature Visualization: How neural networks build up their understanding of images
Eleuther ▷ #interpretability-general (3 messages):
Downstream text clustering, Transcoders
- Inquiring About Downstream Text Clustering Methods: A user asked for the recommended method for downstream text clustering using transcoders.
- Another member noted that they don't believe there is an established method and requested more details on the user's intentions.
- Lack of Established Methods for Transcoder Clustering: A member expressed uncertainty, stating they don't think there is a well-established method for downstream text clustering with transcoders.
- This illustrates an opportunity for further discussion and development in this area of text processing.
Eleuther ▷ #gpt-neox-dev (1 messages):
Generative AI models, Machine learning techniques, NLP applications, Model optimization strategies, AI integration in production
- Expertise in Generative AI Models: An AI developer emphasized their specialization in building and optimizing Generative AI models, specifically in fine-tuning LLMs like GPT, BERT, and Llama.
- They highlighted hands-on experience with tools such as PyTorch, TensorFlow, and LangChain, ensuring robust model performance.
- Strong Foundation in Machine Learning & NLP: The developer detailed a solid foundation in machine learning, deep learning, and NLP, enabling effective solution development for various applications.
- Their background allows them to successfully bridge theoretical advancements with practical implementations.
- Contributions to Text and Image Synthesis: The message noted contributions to projects focused on text generation and image synthesis technologies, including Stable Diffusion and DALL·E.
- This underscores a versatile skill set that extends beyond traditional NLP into the realm of generative media.
- AI-Driven Automation Experience: Experience in AI-driven automation was also shared, showcasing the ability to enhance operational efficiency through intelligent solutions.
- Such expertise positions them as a valuable asset for teams pursuing innovation in AI.
- Commitment to AI Research and Deployment: The developer expressed a passion for AI research, prompt engineering, and scalable deployment, showcasing a desire to push the boundaries of generative technologies.
- Their goals align with contributing to teams focused on the next wave of Generative AI developments.
Nous Research AI ▷ #general (174 messages🔥🔥):
DeepHermes Model Performance, Open Source Models and Datasets, Reinforcement Learning in LLMs, Compatibility of AI Models, Reasoning Models Development
- DeepHermes Model Outperforms Others: Users noted that the DeepHermes model follows instructions better than current R1 distillates, marking it as the first usable general-purpose reasoning model.
- Discussions highlighted the potential for further improvements in reasoning tasks and datasets could enhance this performance.
- Call for Open Sourcing Old Models: Concerns were raised over Anthropic's decision to delete Claude 3 without plans to open-source it, despite its age.
- Participants argued that releasing older models could foster goodwill and benefit the company's reputation in the open-source community.
- Discussion on RL and LLMs: There were insights shared about how Reinforcement Learning (RL) and rewards effectively enhance LLMs, particularly regarding alignment tools.
- Historical references indicated that researchers were aware of these techniques long before other companies adopted similar strategies.
- Compatibility Issues with AI Models: A user expressed frustration about the capabilities of various models, highlighting that most struggle in certain tasks but 8B models work better for data cleaning.
- The community discussed the ongoing development of lower-parameter models to increase accessibility without sacrificing compatibility.
- Need for Effective Reasoning Datasets: A new dataset called Rombo-Org/Optimized_Reasoning was shared to improve reasoning performance and reduce token usage.
- Users are eager to see how this dataset performs in comparison to existing reasoning datasets and its impact on future model training.
- hance-ai/descript-audio-codec-44khz · Hugging Face: no description found
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific ...
- internlm/internlm3-8b-instruct-gguf · Hugging Face: no description found
- ‘Personalized, unrestricted’ AI lab Nous Research launches first toggle-on reasoning model: DeepHermes-3: One DeepHermes-3 user reported a processing speed of 28.98 tokens per second on a MacBook Pro M4 Max consumer hardware.
- Reddit - Dive into anything: no description found
- teknium/DeepHermes-3-Llama-3-3B-Preview-GRPO-1 · Hugging Face: no description found
- Easy_training/Galore+Qlora_With_Multi_GPU_Support at main · rombodawg/Easy_training: Contribute to rombodawg/Easy_training development by creating an account on GitHub.
- Open-source AI will have a massive impact on the world, says Hugging Face CEO: Clement Delangue, CEO of AI platform Hugging Face, discusses Open AI, DeepSeek and innovation at his company.
- Reddit - Dive into anything: no description found
- Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI: This week I welcome two of the most important technologists in any field.Jeff Dean is Google's Chief Scientist, and through 25 years at the company, has work...
Nous Research AI ▷ #ask-about-llms (23 messages🔥):
Cost of Training Models, Test Time Scaling, Custom Model Training, Multi-Headed Self-Attention, Hermes 3 Performance
- Understanding Model Training Costs: A discussion surfaced regarding the costs involved in training a 1B model, with estimates ranging from thousands to tens of thousands of dollars for a complete setup.
- One member implied that while it's feasible to train on consumer GPUs, advancements in data and architectures may affect the overall cost efficiency over time.
- Test Time Scaling Challenges: It was noted that increasing model size does not scale linearly; for instance, scaling from 70m to 100m requires significantly more data, according to scaling laws.
- Another participant mentioned the possibility of achieving a 1.5B model for around $200 to $1,000 through careful data selection and training strategies.
- Custom Models and Architectures: Participants expressed interest in building custom architectures and the challenges associated, particularly around transitioning to a multi-stage training process.
- It was suggested to explore existing resources for configuring adaptations and models, with some pointing toward LoRA as a potential method for enhancing model performance.
- Multi-Headed Self-Attention Explained: A beginner in LLMs questioned how multi-headed self-attention operates with shared weight matrices despite their apparent similarity.
- They indicated that the underlying mathematics and dimensional gymnastics behind such a technique remain difficult to intuitively grasp.
- Hermes 3 Censorship Claims: Concerns were raised regarding the advertised censorship-free nature of Hermes 3, prompting inquiries about inconsistencies during actual questioning.
- One user expressed surprise at the model's refusal to answer certain prompts, despite its claims, leading to speculation about its filtering mechanisms.
Link mentioned: 1.5-Pints Technical Report: Pretraining in Days, Not Months – Your Language Model Thrives on Quality Data: no description found
Nous Research AI ▷ #research-papers (2 messages):
LLaDA diffusion model, Large Language Models (LLMs), Probabilistic inference in medical imaging
- LLaDA challenges traditional ARMs for LLMs: A mention of the LLaDA paper introduces a diffusion model that redefines the landscape of large language models (LLMs) by providing a principled generative approach for probabilistic inference.
- LLaDA's remarkable performance, especially in in-context learning and multi-turn dialogue, positions it as a competitive alternative to established models like GPT-4o.
- Scalability of LLaDA Model: The LLaDA model displays strong scalability, outperforming traditional autoregressive models (ARMs) across extensive benchmarks, demonstrating promising outcomes in various applications.
- It effectively employs a forward data masking process and a reverse process, enhancing its generative capabilities.
Link mentioned: Large Language Diffusion Models: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
Nous Research AI ▷ #interesting-links (1 messages):
teknium: https://x.com/_akhaliq/status/1890546832784208080?s=46
Nous Research AI ▷ #research-papers (2 messages):
Medical Imaging, Computer Vision Applications, LLaDA Model, Large Language Models
- Exploring LLaDA for Medical Imaging Insight: A member recommended reading the paper titled LLaDA: Autoregressive Models and Diffusion which discusses a diffusion model trained under the pre-training and supervised fine-tuning paradigm for generative tasks.
- LLaDA shows strong scalability and competes with leading LLMs while optimizing probabilistic inference and addressing the reversal curse compared to models like GPT-4.
- Applications of Computer Vision in Medical Imaging: A user is seeking recommendations for research papers focused on the intersection of medical imaging and computer vision applications utilizing large language models (LLMs).
- The request underscores a growing interest in leveraging advanced models for innovations in medical diagnostics and imaging technologies.
Link mentioned: Large Language Diffusion Models: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
Yannick Kilcher ▷ #general (170 messages🔥🔥):
NURBS and Splines in AI, Online Learning Approaches, Backpropagation Alternatives, Memory Structures in Learning Algorithms, UltraMem Architecture
- Revolutionizing AI with NURBS and Splines: The discussion centered around using NURBS and splines as foundational elements in AI models to enhance function approximation and reduce overfitting issues compared to traditional polynomials.
- Participants debated the potential of these techniques for developing new AI methodologies, emphasizing the importance of smoothness and topology in achieving better generalization.
- Online Learning vs. Traditional Sampling: The potential of online learning was highlighted, with a focus on training models using single samples in sequence to better simulate human-like memory and recall mechanisms.
- This contrasts with traditional batch learning methods, which often require large amounts of memory and computation, thus limiting their efficiency.
- Exploring Alternatives to Backpropagation: There was a consensus that backpropagation may not always represent the best optimization method for certain AI architectures, prompting ideas for developing models based on spline and NURBS concepts.
- Participants suggested a need to rethink foundational methods to potentially yield more effective learning algorithms that can operate without extensive dependency on gradient descent.
- Memory Structures and Efficiency in Learning: The importance of memory structures in AI was emphasized, with discussions on how newer architectures manage memory more effectively compared to traditional dense networks.
- Specifically, the potential for NURBS-based architectures to utilize memory in a way that is efficient and less reliant on large context windows was considered.
- Introduction to UltraMem Architecture: The UltraMem architecture was mentioned as a recent innovation utilizing large-scale, ultra-sparse memory layers to improve inference speed and model performance.
- It was noted that this architecture could offer advantages over traditional MoE approaches while sustaining computational efficiency.
- Tweet from Elon Musk (@elonmusk): Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
- NeuralSVG: An Implicit Representation for Text-to-Vector Generation: no description found
- Multilinear Operator Networks: Despite the remarkable capabilities of deep neural networks in image recognition, the dependence on activation functions remains a largely unexplored area and has yet to be eliminated. On the other ha...
- Ultra-Sparse Memory Network: It is widely acknowledged that the performance of Transformer models is logarithmically related to their number of parameters and computational complexity. While approaches like Mixture of Experts (Mo...
- Tweet from BlinkDL (@BlinkDL_AI): "The world's hardest sudoku" solved by 12M params RWKV-6 after 4M tokens CoT 🙂 code & model: https://github.com/Jellyfish042/Sudoku-RWKV Note the model was only trained with ctx8192, so i...
- The Most Important Algorithm in Machine Learning: Shortform link: https://shortform.com/artemIn this video we will talk about backpropagation – an algorithm powering the entire field of machine learning and ...
- GitHub - betweentwomidnights/audiocraft-onnx: me tryna convert musicgen weights to onnx and failing: me tryna convert musicgen weights to onnx and failing - betweentwomidnights/audiocraft-onnx
- The Continuity of Splines: why are splines? well my god I have good news for you, here's why splines!if you like my work, please consider supporting me 💖https://www.patreon.com/acegik...
- GitHub - 222464/TeensyAtariPlayingAgent: An agent for playing Atari games running on a Teensy microcontroller: An agent for playing Atari games running on a Teensy microcontroller - 222464/TeensyAtariPlayingAgent
- Thomas Hughes: “Isogeometric Analysis”: 2019 Purdue Engineering Distinguished Lecture Series presenter Thomas J.R. Hughes, the Peter O'Donnell Jr. Chair in Computational and Applied Mathematics and...
- OpenAI: The Age of AI Is Here!: ❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers📝 The paper "Competitive Programming with Large Reasoning Models" is ...
- AGI Architecture: This is a comprehensive layout of an artificial general intelligence architecture that includes the solution to the hard problem of consciousness as well as ...
- A Surprising Way Your Brain Is Wired: Get a 20% discount to my favorite book summary service at shortform.com/artemSocials:X/Twitter: https://x.com/ArtemKRSVPatreon: https://patreon.com/artemki...
- Elmo Hidding GIF - Elmo Hidding Embarrassed - Discover & Share GIFs: Click to view the GIF
Yannick Kilcher ▷ #paper-discussion (12 messages🔥):
Daily paper reading group logistics, Past paper compilation, Hierarchical tree of seminal papers, Hosting paper discussions, Community resource sharing
- Understanding Daily Paper Reading Group Logistics: A newcomer expressed interest in participating in the daily paper reading group, seeking information on topics covered, decision processes, and past discussions.
- Members directed them to the pinned message for relevant details and acknowledged the idea of creating a Google Sheet for past paper compilations.
- Proposal for a Hierarchical Paper Tree: A member suggested developing a hierarchical tree of seminal and new papers related to key labs, incorporating dependency links to facilitate understanding.
- Filtering information is key and members voiced the importance of having community-driven resources that focus on advancing knowledge efficiently.
- Encouragement for Hosting Paper Discussions: Members have expressed enthusiasm towards increasing participation in hosting paper discussions within the group.
- A member noted that more hosts can bring valuable perspectives and encourage broader participation in ongoing conversations.
Yannick Kilcher ▷ #ml-news (9 messages🔥):
OpenAI's 50 laws of robotics, Mistral Saba launch, Jack Ma's return, AI Engineer Summit NYC, Anthropic's new AI model
- OpenAI's 50 laws of robotics released: OpenAI has introduced its 50 laws of robotics as a modern approach to AI ethics, moving beyond Asimov's original three rules, aiming to define safe AI behaviors.
- The document outlines how AI models should behave to align with human values and safety, contributing to ongoing discussions about AI responsibility.
- Mistral Saba makes its debut: MistralAI has announced the launch of Mistral Saba, a 24 billion parameter regional language model trained on datasets from the Middle East and South Asia.
- This model particularly excels in Arabic and languages from South India, such as Tamil and Malayalam, highlighting its focus on regional linguistic diversity.
- Jack Ma invited to top leadership meeting: China's government has invited Jack Ma, co-founder of Alibaba, to meet with top leaders, signaling renewed support for the private sector after challenging years.
- This move marks a notable shift as the nation seeks to stabilize its relationship with major entrepreneurs like Ma, who has been relatively quiet in recent times.
- AI Engineer Summit NYC approaches: The AI Engineer Summit NYC is scheduled for February 20-21, with applications for attending and sponsoring closing soon.
- This event aims to connect AI engineers and enhance discussions around cutting-edge technology and advancements in the field.
- Anthropic prepares reasoning-capable AI: Anthropic is on track to release a new AI model that combines traditional language model abilities with advanced reasoning functions, as reported on The Decoder.
- This initiative aims to enhance AI capabilities, suggesting a shift towards more intelligent and context-aware systems.
- Fan Bingbing: Missing Chinese actress fined for tax fraud: Fan Bingbing had disappeared from public view in July amid reports she was involved in a tax probe.
- The 2025 AI Engineering Reading List: We picked 50 paper/models/blogs across 10 fields in AI Eng: LLMs, Benchmarks, Prompting, RAG, Agents, CodeGen, Vision, Voice, Diffusion, Finetuning. If you're starting from scratch, start here.
- Tweet from Sophia Yang, Ph.D. (@sophiamyang): 🏟️Announcing @MistralAI Saba, our first regional language model. - Mistral Saba is a 24B parameter model trained on meticulously curated datasets from across the Middle East and South Asia.- Mistral...
- China Invites Jack Ma and DeepSeek Founder to Meet Top Leaders: (Bloomberg) -- China has invited prominent entrepreneurs including Alibaba Group Holding Ltd. co-founder Jack Ma to meet the nation’s top leaders, people familiar with the matter said, a potentially m...
- Tweet from Elon Musk (@elonmusk): Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
- Anthropic prepares new Claude hybrid LLMs with reasoning capability: Anthropic is getting ready to release a new AI model that combines traditional language model capabilities with advanced reasoning functions.
- Step aside, Asimov. Here are OpenAI’s 50 Laws of Robotics: OpenAI is letting its AI loosen up: “No topic is off limits.” But it’s also making it anti-“woke.”...
- OpenAI Model Spec: The Model Spec specifies desired behavior for the models underlying OpenAI's products (including our APIs).
Cohere ▷ #discussions (57 messages🔥🔥):
Cohere API compatibility with OpenAI, Deep research clone development, Moderation model discussions, Cohere library installation issues, Log in error troubleshooting
- Cohere API may get OpenAI compatibility: A member inquired whether the Cohere API would ever become compatible with OpenAI.
- Feedback was shared with the teams, but there was no current information available.
- Building a deep research clone: One member shared they are building a deep research clone compatible with various AI models, emphasizing difficulty with model provider implementation.
- They noted that the only requirement for their LLM provider is to specify the base URL and model when initializing.
- Discussions on moderation models: Moderation models were discussed, with members expressing a need for better models than Llamaguard, stating it lacks accuracy for their use cases.
- Some members are currently using OpenAI's omni moderation while hoping for new models from Cohere.
- Cohere library installation issue: A member reported a ModuleNotFoundError for the cohere package, despite having it installed, which seemed to be an environment issue.
- Others offered troubleshooting advice and reminded the user to check their virtual environment with
pip list
.
- Others offered troubleshooting advice and reminded the user to check their virtual environment with
- Troubleshooting log in errors: Another user reported a log in error and shared a screenshot for further assistance.
- Help was anticipated from the community as the error context was unclear at the time.
- Working with Cohere's API and SDK — Cohere: Cohere's NLP platform provides customizable large language models and tools for developers to build AI applications.
- 4LLM - AI Chat Assistant: Powerful AI chat assistant for seamless conversations and task automation.
Cohere ▷ #rules (1 messages):
Server Rules, No Advertising Policy, Recruitment Ban, Language Requirement, Masked Links Prohibition
- Clear Rules Established for Server Conduct: The server rules, updated on 2/14/25, emphasize clear guidelines for community interaction to maintain focus on Cohere’s Community.
- Members are encouraged to share Cohere related projects but must refrain from advertising or recruitment posts.
- No Advertising on This Discord!: A strict No Advertising policy is enforced, ensuring that discussions remain centered on community-related topics.
- Infractions will lead to appropriate actions, including temporary bans for masked link violations.
- Recruitment is Off-Limits: The server prohibits recruitment activities, which includes posting resumes or any job postings.
- This rule aims to create a clutter-free environment focused on community sharing rather than job hunting.
- Only English is Allowed: Communication in this server is restricted to English only, promoting accessibility and understanding among members.
- This rule helps maintain clarity in discussions and interactions within the community.
- Say No to Masked Links: The server strictly prohibits masked links, establishing transparency in shared content to enhance security.
- Violating this rule can result in a temporary ban, ensuring safe engagement across discussions.
Cohere ▷ #api-discussions (7 messages):
Cohere Aya-8b Integration, Rerank Timeout Issues
- Cohere Aya-8b Integration Support: A member is integrating the Cohere Aya-8b model into their application but is facing issues with API support for this specific model version.
- Another member provided a sample request for using the c4ai-aya-expanse-8b model, suggesting relevant resources for troubleshooting.
- Timeout Issues with Rerank: A member reported experiencing timeout issues with calls to the Rerank API, seeking others' input on similar experiences.
- The message was moved for further discussion in a specific channel, indicating community engagement on the topic.
Link mentioned: Login | Cohere: Login for access to advanced Large Language Models and NLP tools through one easy-to-use API.
Cohere ▷ #cmd-r-bot (96 messages🔥🔥):
Cohere Embed API Rate Limits, AI Model Personalization, Color Preferences in AI, Background Colors for Suggestions, Main Colors for Creations
- Cohere Embed API rate limits clarified: The rate limit for the production API for the Embed endpoint is 2,000 calls per minute as confirmed in multiple queries to the documentation.
- This limit applies regardless of the number of tokens embedded, indicating that it is 2000 documents total per minute.
- AI's lack of personal preferences discussed: Multiple users inquired about the AI's favorite color, to which it consistently responded that it has no personal preferences or opinions.
- Despite this, the AI humorously suggested blue (#0000FF) as a favorite when prompted for a color, showing an attempt to engage with the question.
- Background colors for suggestions explored: The discussion included inquiries about what color the model expects as a background when making suggestions, but the AI was unable to find specific documentation.
- It ultimately suggested light beige (#FDF7F0) based on improvisation when asked for a color for creation backgrounds.
- Main color preferences for creations: Users were intrigued about the main color used by the model for suggesting creations, but no documentation was found to support a definitive color choice.
- The AI stated that it would typically use blue as the main color when making creations or suggestions.
Latent Space ▷ #ai-general-chat (75 messages🔥🔥):
Perplexity Deep Research Agent, Grok 3 release, Step-Video-T2V and Step-Audio-Chat models, Zed next-edit prediction, DeepSeek evaluation
- Perplexity Deep Research Agent Launch: The newly introduced Perplexity Deep Research Agent offers a free option for users and a $20/month access to an expert researcher, completing reports within three minutes.
- Feedback from users indicates decent output quality but some noted that follow-on suggestions may not fully leverage deep research capabilities.
- Excitement for Grok 3: Elon Musk announced the release of Grok 3 with a live demo scheduled for Monday night, claiming it to be the 'smartest AI on Earth'.
- Community discussions express a mix of skepticism and hope about its potential impact on the AI landscape, highlighting competitive dynamics with existing models.
- StepFun's New Models: StepFun has open-sourced the Step-Video-T2V, a 30B text-to-video model requiring 80G VRAM, which complements their Step-Audio-Chat model.
- These models expand the capabilities within multimodal AI, with the audio chat model strongly outperforming competitors in evaluations.
- Zed's Next-Edit Prediction Feature: Zed introduced an edit prediction model aimed at enhancing developer productivity by anticipating user edits.
- While the community welcomes open-source initiatives, concerns were expressed regarding whether this new feature will truly compete with established products like Cursor and Copilot.
- DeepSeek Evaluation Insights: Multiple users discussed internal evaluations of Google's DeepSeek, highlighting its strengths and comparing it with Gemini 2.0, which reportedly outperforms it.
- The conversation included reflections on the potential competitiveness of various AI models within the landscape and the surprises that could emerge from such evaluations.
- Tweet from Elon Musk (@elonmusk): Grok 3 release with live demo on Monday night at 8pm PT. Smartest AI on Earth.
- Tweet from Tibor Blaho (@btibor91): Try Anthropic's new Thinking modelTurn on **Thinking** in model settings to see Claude's thought processGet step-by-step reasoning for complex tasks like coding and math. Perfect for understan...
- Self-Verification, The Key to AI: no description found
- Tweet from John Schulman (@johnschulman2): @barret_zoph and I recently gave a talk at Stanford on post-training and our experience working together on ChatGPT. Unfortunately the talk wasn't recorded, but here are the slides: https://docs.g...
- Conviction LP Letters - Redacted (Open Source) - Google Drive: no description found
- SOCIAL MEDIA TITLE TAG: SOCIAL MEDIA DESCRIPTION TAG TAG
- Zed now predicts your next edit with Zeta, our new open model - Zed Blog: From the Zed Blog: A tool that anticipates your next move.
- Tweet from Hieu Pham (@hyhieu226): It's 11:30pm, and many @xai people are in office, hard working at their computers. It's an amazing vibe. Everyone pushes their way to deliver the best experience to you users. Everyone support...
- Tweet from Ryo Lu (@ryolu_): new 4o uncapped is profound:at the very bottom, you don’t actually care about humanity. not even yourself. you’re not searching for meaning, not for impact, not for the illusion of “making the world b...
- Tweet from Alexandr Wang (@alexandr_wang): On the heels of Humanity's Last Exam, @scale_AI & @ai_risks have released a new very-hard reasoning eval:EnigmaEval: 1,184 multimodal puzzles so hard they take groups of humans many hours to days ...
- Tweet from Jukanlosreve (@Jukanlosreve): The following is the information that an anonymous person conveyed to me regarding Google’s evaluation of DeepSeek.Internal Information from Google: 1. Deepseek is the real deal. 2. Their paper doesn...
- Tweet from Alexander Doria (@Dorialexander): Painful to see: the kind of brute alignment that can fry latent space. Even DeepSeek CCP-friendly approach is relatively mild by comparison, mostly deflating sensitive questions.
- Tweet from Jina AI (@JinaAI_): Introducing jina-deepsearch-v1, it search, read, reason, search, read, reason, search, read, reason, search, read, reason, search, read, reason,search, read, reason,search, read, reason,search, read, ...
- ChatGPT + Post-Training: ChatGPT and The Art of Post-Training Barret Zoph & John Schulman
- Tweet from lmarena.ai (formerly lmsys.org) (@lmarena_ai): A new version of @OpenAI's ChatGPT-4o is now live on Arena leaderboard! Currently tied for #1 in categories:💠Overall💠Creative Writing💠Coding💠Instruction Following💠Longer Query💠Multi-TurnThis...
- Tweet from 青龍聖者 (@bdsqlsz): StepFun open-sources Step-Video-T2V, SOTA 30B text-to-video modelStep-Video-T2V544px992px204frameStep-Video-T2V-turbo544px992px136frame80G VRAM needs🤪
- Tweet from Shivon Zilis (@shivon): Woah! That was one of the most unexpectedly rewarding hours of my life.Instead of passively listening to an audiobook about physics while doing errands as usual I had an hour long back-and-forth conve...
- Tweet from Karan Singhal (@thekaransinghal): OpenAI's Health AI team is now hiring backend/fullstack SWEs towards our mission of universalizing access to health information!Please apply if you:- Can write maintainable, high-quality backend /...
- GitHub - jina-ai/node-DeepResearch: Keep searching, reading webpages, reasoning until it finds the answer (or exceeding the token budget): Keep searching, reading webpages, reasoning until it finds the answer (or exceeding the token budget) - jina-ai/node-DeepResearch
- Tweet from Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex): I reiterate, Grok 3 is very likely to be SoTA (until something bigger comes out - next Claude and Orion most likely). Elon is being Elon so that's no signal, but these guys are professionals and I...
- Tweet from Ethan Mollick (@emollick): GPT-4o is now definitely different.Beyond that, hard to say. It seems “smarter” and is much more personable and less likely to refuse requests… but my Innovation GPT (with over 10k uses) now no longer...
- Tweet from Erik Dunteman (@erikdunteman): Introducing Piglet: a computer-use driver for Windows machines, exposing a high-level API for desktop automation tasks.Run it standalone, or as a self-hosted machine on @PigDev_
- Tweet from Paul Gauthier (@paulgauthier): chatgpt-4o-latest scored 27% on the aider polyglot benchmark. This puts it well ahead of the non-chagpt versions of 4o, and close to Haiku.
- Tweet from Aravind Srinivas (@AravSrinivas): Excited to introduce the Perplexity Deep Research Agent: available for free to all users. Paid users only need to pay $20/mo to access an expert level researcher on any topic for 500 daily queries, an...
- Mentra Smart Glasses: The AI Agents Interface - Mentra Smart Glasses: AUGMENTED / EMPOWERED / CONNECTED Open Source Smart Glasses Build smart glasses apps on AugmentOS. Kickstarter Waitlist We were promised smart glasses for years, but the hardware wasn’t ready. N...
- stepfun-ai/Step-Audio-Chat · Hugging Face: no description found
- Tweet from Jonathan Roberts (@JRobertsAI): Is computer vision “solved”?Not yetCurrent models score 0% on ZeroBench🧵1/6
Latent Space ▷ #ai-in-action-club (68 messages🔥🔥):
Eliza Chatbot, AI Engineering NYC, Streaming Issues, Character Files, Plugin Autodiscovery
- Discussing the Legacy of Eliza: Members reminisced about the Eliza chatbot, with one highlighting its impact as an early AI therapist, while expressing interest in a related podcast episode.
- Another noted the potential if Eliza were released today, citing $100B in funding as a hypothetical figure mentioned by Gary Marcus.
- Troubleshooting Streaming Issues: Several users reported various experiences with the streaming quality during the discussions, with messages like 'still loading' and 'works now!' being common.
- There was a lighthearted banter about the reliability of technology, likening the need for solutions to the 'plug and play' approach.
- Upcoming AI Engineering Event in NYC: Members discussed the upcoming AI Engineering NYC, confirming it runs from Wednesday to Sunday and shared a Discord event link for further details.
- Participant engagement revealed excitement about attendance, with inquiries about schedules and activities planned during the event.
- Exploring Character Files for AI Agents: There was a discussion regarding character files, which define an AI character's personality, knowledge, and behaviors, with examples shared in a Google Doc.
- Members expressed curiosity about how these configurations can influence AI interactions and the potential for personalized AI experiences.
- Curiosity About Plugin Autodiscovery: A member asked about plugin autodiscovery, speculating if this feature would allow AI to search for capabilities in a marketplace with silly tavern connections.
- Discussion continued regarding the implications of such capabilities, with thoughts on how emerging technologies may transform AI interaction.
- 📝 Character Files | eliza: Character files are JSON-formatted configurations that define an AI character's personality, knowledge, and behavior patterns. This guide explains how to create effective character files for use ...
- The History and Mystery Of Eliza - CoRecursive Podcast: I recently got an email from Jeff Shrager, who said he'd been working hard to solve a mystery about some famous code.Eliza, the chatbot, was built in 1964, and she didn't answer questions like Alexa o...
- AI In Action: Weekly Jam Sessions: no description found
- GitHub - elizaOS/eliza: Autonomous agents for everyone: Autonomous agents for everyone. Contribute to elizaOS/eliza development by creating an account on GitHub.
MCP (Glama) ▷ #general (105 messages🔥🔥):
MCP Servers, Glama Website Issues, Recommendation for Prompt Management Tools, Research MCP Tools, SSE Communication for MCP
- Various MCP Servers for Enhancement: Members discussed several MCP servers that enhance their work, highlighting tools like sequential thinking and the filesystem server for easy file access.
- Others mentioned the web search server for online research needs and GitHub servers for evaluating MCP server repositories.
- Glama Server Accessibility Problems: Users reported issues accessing the Glama website, attributed to a network outage in the IAD region where it is hosted.
- Some experienced slow loading times or timeouts, which were later resolved.
- Seeking Effective Prompt Management Tools: A user requested recommendations for tools that assist with better prompting and maintaining workflow in MCP environments.
- Suggestions included LLMling, MCP-llms-txt for documentation, and Langfuse for prompt management.
- Building SSE Wrappers for MCP: Developers discussed the challenges of creating SSE wrappers for MCP servers with a focus on communication without modifying existing server code.
- Resources like mcp-proxy and mcp-sse were shared, with members expressing frustrations about connection stability and middleware integration.
- Free Research Tools in Discussion: Members shared their experiences with various research tools, speculating on the impact of free deep research applications on traditional resources like Wikipedia.
- Concerns were raised about reliance on AI due to its potential for hallucination, contrasted with the community-driven nature of Wikipedia.
- What is the meaning of 0.0.0.0 as a gateway?: Can anyone clarify gateway assignment for me? What is the difference between adding a gateway as 0.0.0.0 and assigning a specific IP address as a gateway?
- GitHub - sparfenyuk/mcp-proxy: Connect to MCP servers that run on SSE transport, or expose stdio servers as an SSE server using the MCP Proxy server.: Connect to MCP servers that run on SSE transport, or expose stdio servers as an SSE server using the MCP Proxy server. - sparfenyuk/mcp-proxy
- GitHub - ClickHouse/mcp-clickhouse: Contribute to ClickHouse/mcp-clickhouse development by creating an account on GitHub.
- GitHub - nahmanmate/code-research-mcp-server: Contribute to nahmanmate/code-research-mcp-server development by creating an account on GitHub.
- GitHub - sidharthrajaram/mcp-sse: A working pattern for SSE-based MCP clients and servers: A working pattern for SSE-based MCP clients and servers - sidharthrajaram/mcp-sse
MCP (Glama) ▷ #showcase (10 messages🔥):
MCP Server with VS Code Extension, Neon MCP Server Guide, Markdown for TTS Control, SSML Tags for Language Models, AI-Assisted Database Management
- MCP Server Allows Debugging with LLMs: A member showcased their newly built MCP server + VS Code extension that enables an LLM to directly debug code across various programming languages.
- Currently, it does not have a feature to read terminal errors and fix them automatically, but integration with tools like Continue allows for a similar functionality by referencing errors directly.
- Neon Introduces MCP Server for Natural Language Database Management: Daniel from Neon shared a new guide on Neon's MCP Server that allows users to manage databases using natural language commands, significantly simplifying workflows.
- The guide emphasizes the ease of using Large Language Models (LLMs) for interaction with the Neon API.
- Exploring TTS Control with Markdown and SSML: Members discussed the possibilities of using Markdown for intonation control in text-to-speech models, noting experiences with services like ElevenLabs that utilize literal directions.
- One member suggested that SSML tags could also be a viable option for enhancing the speech output of language models.
- Demo of MCP Client for Debugging: Jason shared a demo video showing the MCP client 'Continue,' which operates effectively with Claude Desktop for debugging tasks.
- This setup is stated to work not only with Continue but also across various MCP clients, enhancing debugging capabilities.
- Limitations and Potential Future Features: There were inquiries about whether the MCP server could automatically read terminal errors and offer fixes, with Jason clarifying the current limitations of the tool.
- Although automated fixes are not part of this server's design, he noted that other tools could assist in this area.
- Getting started with Neon MCP server - Neon Guides: Enable natural language interaction with your Neon Postgres databases using LLMs
- GitHub - jasonjmcghee/claude-debugs-for-you: Enable any LLM (e.g. Claude) to interactively debug any language for you via MCP and a VS Code Extension: Enable any LLM (e.g. Claude) to interactively debug any language for you via MCP and a VS Code Extension - jasonjmcghee/claude-debugs-for-you
- Implementing Your First MCP Server — with Bun and TypeScript: In this video, we implement a MCP server with Bun and Typescript, allowing Claude to perform simple additions.→ Join the Discord server: https://discord.gg/9...
Modular (Mojo 🔥) ▷ #general (20 messages🔥):
NumPy in Mojo, Polars DataFrame Library, Mojo GPU Compilation, MAX Drivers for Hardware, Expanding stdlib Team
- Simpler NumPy in Mojo?: A member speculated that a NumPy written in Mojo would be much simpler than one written in C because a lot of the work could be outsourced to MAX.
- Another member agreed, pointing out that this could lead to significant improvements.
- Request for Polars Discord Link: A member inquired about the Discord link for Polars, a Rust-powered DataFrame library.
- Another member confirmed they could send the link, indicating Polars is known within the community.
- Mojo's GPU Targeting Plans: Discussion sparked about whether Mojo would compile code for GPUs, specifically targeting devices like Apple's via the Metal API in the future.
- Members noted that the eventual plan is to allow implementing MAX drivers for various hardware, including Apple Silicon.
- Cursory Troubleshooting Requests: In response to a user needing assistance, it was suggested to direct more in-depth requests to the forum for better visibility and reference.
- This advice emphasizes the importance of community support channels for detailed discussions.
- Interest in Expanding stdlib Team: A user expressed curiosity about any potential plans for expanding the stdlib team, humorously mentioning their Canadian background.
- This reflects broader community interest in contributing to the project, even if members feel underqualified.
Modular (Mojo 🔥) ▷ #mojo (76 messages🔥🔥):
Mojo Project Refactoring, Mojo Error Handling, WASM Support in Mojo, Mojo Language Features, AI Assistants for Mojo
- Refactoring Python Project to Mojo: A user expressed interest in refactoring a large Python project to Mojo, seeking assistance from AI for the task, especially for small classes.
- Another member suggested using
Gemini 2.0 Flash
variants, noting that they have reasonable Mojo knowledge, but emphasized the difficulty in automating full refactoring due to the intricacies of Mojo.
- Another member suggested using
- Mojo's Error Messaging: Users discussed the clarity of Mojo's error messages, highlighting the need for improvements as they navigate the language.
- One member mentioned that Mojo's borrow checker complicates transitioning code from other languages, often requiring a substantial re-architecting process.
- WASM Compilation Target Likelihood: The likelihood of Mojo supporting WASM as a compilation target was confirmed to be very high, with it being an LLVM backend.
- Members discussed the advantages of integrating such features to expand Mojo's usability.
- Desire for Language Features: There were requests for features like the
let
keyword and improved pattern matching in Mojo, emphasizing the desire for Rust-like capabilities.- Members noted how certain data structure implementations pose challenges due to generality versus safety in Mojo.
- Building Libraries in Mojo: The community expressed interest in building a good parser combinator library for Mojo, which is seen as an essential addition.
- The discussion highlighted the need for tools and libraries that leverage Mojo's architecture for better language feature implementation.
- Surfs Up Surfs Up Arnold GIF - Surfs up Surfs up arnold Help me im drowning - Discover & Share GIFs: Click to view the GIF
- Zed - The editor for what's next: Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
- Zed - The editor for what's next: Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
- GitHub - freespirit/mz: Support for Mojo in Zed: Support for Mojo in Zed. Contribute to freespirit/mz development by creating an account on GitHub.
Torchtune ▷ #general (4 messages):
Deepspeed Zero, FSDP Integration, Distributed Recipes
- Clarification on using Zero with Torchtune: A member inquired about an easy way to use Zero with Torchtune out of the box due to a tight deadline.
- ebsmothers clarified that while they don't support Deepspeed integration directly, they use FSDP which is equivalent to ZeRO-3.
- Distributed Recipes provide built-in support: Another member suggested using distributed recipes to get the capabilities associated with Zero without additional setup.
- This approach allows users to leverage existing tools and configurations seamlessly.
Torchtune ▷ #dev (44 messages🔥):
Dataloader Functionality, GRPO Pull Request Discussion, Tool Role for Llama3.3 Model, Handling Loggers and Dependencies, Clipping/Winsorization in RLHF
- Dataloader's Role in On-Policy Data Generation: There was a discussion on whether the dataloader is suitable for on-policy data generation, with participants suggesting that generations and scoring should explicitly occur in the training loop.
- Concerns were raised regarding resource allocation and model integration within the dataloader context.
- Plan for GRPO Pull Request: Participants discussed the progress of the GRPO PR, noting the need for unit tests and an experimental components folder to house various datasets.
- The importance of maintaining backward compatibility while making generation changes was emphasized.
- Implementing Tool Role in Llama3.3: There was consensus on adding a 'tool' option to the Llama3 tokenizer, allowing for backward compatibility with 'ipython'.
- Contributors debated the implications of introducing a new tokenizer builder and handling model version checks.
- Concerns Over Dependencies in Dev Environment: Members expressed concerns about the growing list of optional dependencies in the dev environment related to logging frameworks and their impact on installation.
- A proposal was made to categorize dependencies, allowing users to install only the logging framework they desire, minimizing unnecessary bloat.
- Adding Clipping/Winsorization to RLHF: A suggestion was made to add clipping or winsorization options to the
rlhf.get_batch_log_probs
function to address issues with dropping end-of-sequence tokens due to logarithmic behavior.- This was highlighted as a potential improvement in the handling of log probabilities.
- pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- Create torchtune/dev folder by ebsmothers · Pull Request #2398 · pytorch/torchtune: Set up folder to house experimental components as discussed in #2326.
- mlflow/pyproject.toml at master · mlflow/mlflow: Open source platform for the machine learning lifecycle - mlflow/mlflow
- torchtune/torchtune/data/_messages.py at f386ff99f8c25f758cc1f3c88a28fac03513f497 · Ankur-singh/torchtune: PyTorch native post-training library. Contribute to Ankur-singh/torchtune development by creating an account on GitHub.
- torchtune/torchtune/models/llama3_3/_tokenizer.py at f386ff99f8c25f758cc1f3c88a28fac03513f497 · Ankur-singh/torchtune: PyTorch native post-training library. Contribute to Ankur-singh/torchtune development by creating an account on GitHub.
Torchtune ▷ #papers (2 messages):
LLaDA model, Diffusion models vs. Autoregressive models, Performance benchmarks of LLaDA, Instruction-following abilities of LLaDA
- LLaDA challenges Autoregressive Models dominance: The paper introduces LLaDA, a diffusion model trained from scratch through pre-training and supervised fine-tuning, as a contender against traditional Autoregressive models in large language models.
- This model optimizes a likelihood bound and employs a forward data masking process to predict masked tokens, challenging the established understanding of model superiority.
- LLaDA shows superior performance across benchmarks: LLaDA showcases strong scalability and outperforms self-constructed Autoregressive model baselines across extensive benchmarks.
- Notably, LLaDA 8B is competitive with LLaMA3 8B in in-context learning, indicating its robust capabilities in current standards.
- Impressive instruction-following capabilities of LLaDA: After supervised fine-tuning, LLaDA exhibits remarkable instruction-following abilities, particularly highlighted in case studies involving multi-turn dialogue.
- The findings suggest that LLaDA could redefine expectations for generative models in handling complex dialogue scenarios.
Link mentioned: Large Language Diffusion Models: Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
tinygrad (George Hotz) ▷ #general (38 messages🔥):
Tensor Operations, Tinygrad Bugfix Bounties, HEVC Decoder Utilities, Attention Implementation, Tinychat Improvements
- Discussion on Tensor Operations' Usability: Members discussed whether
Tensor.tril
andtriu
should accept aTensor
for the diagonal instead of an int, as this would improve flexibility in custom kernels for blockwise attention in KV-cache-space.- psychofauna suggested that an ideal signature would be
int | Tensor
, making it easier to manage multiple diagonals across dimensions.
- psychofauna suggested that an ideal signature would be
- Tinygrad Bugfix Bounty Announcement: A user mentioned opening a PR for a bugfix bounty for a specific issue, citing ease of the task based on their analysis of the test failures.
- They speculated that the test in question might be a fix or a typo based on their review of recent commits.
- HEVC Decoder Updates: In a follow-up on the HEVC decoder, a user inquired about potential utilities for command queue, referencing the utility that sniffs ioctl as previously helpful.
- nimlgen noted that code to dump queues exists, but may require adjustments.
- Progress on Attention Implementation: A member reported implementing the full
com.microsoft.Attention
by adapting existing code fromextra/models/llama.py
and testing against top Hugging Face repositories.- They indicated passing 201 out of 250 tests, with failures mostly attributed to numerical inaccuracies across various quantization formats.
- Tinychat Enhancements for Mobile Stability: A user announced improvements made to Tinychat, particularly for stability on WASM for mobile devices, and noted they have documented the changes in their PR.
- They stated they were cleaning up the code in preparation for merging and invited questions regarding their updates.
- Bounties: no description found
- reorder expand by geohot · Pull Request #9051 · tinygrad/tinygrad: @Qazalin why doesn't this work? I want the bottom graph to save FLOPS
- Fix TestLinearizerFailures.test_failure_53 by hmih · Pull Request #9110 · tinygrad/tinygrad: I took a look at the bounty list and this one seems easy enough.At first I just stared at the test for a while trying to decypher the AST, then I rummaged around in the history of this file. I loo...
tinygrad (George Hotz) ▷ #learn-tinygrad (2 messages):
Tensor indexing, Variable binding issue, Tinygrad GitHub discussions
- Efficient Tensor Indexing with Variables: Code snippets illustrate how to create a Tensor filled with zeros and access its elements using both random indices and a variable binding approach with Variable.
- It was mentioned that using
big_tensor[vi]
allows for direct indexing into tensors, enabling efficient modifications.
- It was mentioned that using
- Issue with Symbolic Indexing in Tinygrad: A user highlighted a problematic scenario where symbolic indexing into a Tensor fails when utilizing a Variable for indexing.
- They noted that while basic tensor indexing works, the symbolic approach does not function correctly, leading to discussions on the issue on GitHub.
LlamaIndex ▷ #blog (5 messages):
LlamaParse video, LlamaIndex.TS update, Automated Sales Outreach Agent, LLM Consortium implementation, Mistral Saba model release
- LlamaParse Video Unveils Key Features: In a recent video, @mesudarshan detailed LlamaParse, highlighting its multiple parsing modes and abilities like parsing audio and images, and JSON mode.
- LlamaIndex.TS Shrinks for Efficiency: An update has made LlamaIndex.TS significantly smaller and easier to ship, enhancing its usability.
- Discover the changes and improvements at this link.
- Create a Sales Outreach Agent with Ease: A tutorial was shared on how to build an Automated Sales Outreach Agent, which generates outreach emails and schedules meetings based on responses.
- Implementation of LLM Consortium Concept: Massimiliano Pippi has developed an implementation of @karpathy's concept of an LLM Consortium, allowing multiple LLMs to respond to questions and compare answers.
- Mistral Saba Model Day 0 Support: The launch of Mistral Saba, a new Arabic-focused small model by @mistralAI has garnered immediate support from LlamaIndex.
- Users can install it using the command
pip install llama-index-llms-mistralai
, more info available here.
- Users can install it using the command
LlamaIndex ▷ #general (25 messages🔥):
Generative AI Hackathon, SimpleDirectoryReader Usage, Double RAG Workflow, RAG for Excel/CSV Data, Metadata Filtering by Dates
- Generative AI Hackathon Announced: CreatorsCorner, in collaboration with Google Deepmind and others, announced a hackathon with over $50k in prizes, encouraging teams to build multimodal AI agents addressing real-world problems.
- Participants will showcase their solutions through interactive UIs, highlighting their ability to apply advanced reasoning and decision-making.
- Using SimpleDirectoryReader Effectively: Discussions arose on whether it's more efficient to remove unwanted files or utilize the
exclude
andrequired_exts
parameters when using the SimpleDirectoryReader.- Opinions varied, with one participant suggesting that setting
required_exts
could be just as effective without needing to rearrange files.
- Opinions varied, with one participant suggesting that setting
- Double RAG Workflow Discussion: A new user introduced a project to create a double RAG workflow that extracts and summarizes services from documents before fetching category codes.
- They shared a JSON structure for outputs and sought advice on how to automate the process of querying and handling multiple summaries.
- RAG Applications for Excel/CSV Data: An inquiry was made on how to implement RAG for querying Excel/CSV data, especially translating user queries about sales data.
- The user is looking for specific features in LlamaIndex or LangChain for effectively handling structured data retrieval and parsing.
- Date Filtering with Metadata: A user queried if date filters are supported in metadata for vector stores, noting that many do not have this capability.
- It was suggested to separate year data in metadata for more manageable filtering, especially when using the PostgreSQL vector store.
- Qdrant Vector Store - Default Qdrant Filters - LlamaIndex: no description found
- Multimodal AI Agents - 2 day Hackathon · Luma: Gen AI AgentsCreatorsCorner, collaborating with Google Deepmind, Weights & Biases, Together.ai, Stytch, Senso, LlamaIndex, Activeloop and others…
- Microsoft Forms: no description found
- Intuitiveobjects: no description found
- Quambase.ai: no description found
- Quambase.ai: no description found
LlamaIndex ▷ #ai-discussion (4 messages):
Complexity of Custom Prompts, AI Community in India, Quantum Computing Education, RAG based on JSON dictionaries
- Navigating Complex Prompts: A member discussed challenges in getting models to respond accurately when working with complex or custom tasks, particularly if the data diverges from what the model trained on.
- Extensive prompt engineering, including few-shot examples and multi-prompt approaches, might not be enough in such situations.
- India's AI Community Blooms: An announcement was made inviting individuals to join India's rapidly growing AI community for collaboration and innovation.
- The community promotes networking and learning, providing an opportunity for members to connect through WhatsApp.
- Quantum Computing Education on the Rise: A call was made to join India's fully bootstrapped quantum education community, emphasizing access to cutting-edge knowledge and support.
- Members are encouraged to embark on their quantum journey through the provided link on WhatsApp.
- RAG with JSON Dictionaries: A query was raised about effective examples of Retrieval-Augmented Generation (RAG) that rely solely on JSON dictionaries to locate documents matching user queries.
- The discussion indicates a search for methodologies that enhance the usability of large JSON datasets in AI applications.
- Ai - ML - QB: WhatsApp Group Invite
- Quantum-QB: WhatsApp Group Invite
Nomic.ai (GPT4All) ▷ #general (28 messages🔥):
Nomic Embed Text V2, DeepSeek Model Bindings, GPT4All LocalDoc, Code Llama Template Issue, Model Tool Usage
- Nomic Embed Text V2 Source Code Location: A member expressed difficulty finding the Nomic Embed Text V2 source code and training dataset on GitHub, the Nomic blog, and other resources.
- Another member provided links to the GitHub repository and the Hugging Face page to help locate the information.
- Issues with DeepSeek Model Bindings: A user reported errors when loading the DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf model, indicating potential issues with the python bindings being outdated.
- Another member suggested that the problem might stem from using an older version of the llama.cpp library, noting that updates could resolve the conflict.
- Configuring LocalDoc in GPT4All: A user inquired about whether using LocalDoc in GPT4All sends data to the cloud and how to disable this after initially enabling it.
- A response pointed out that users should ensure that the setting for Enable Data Lake is unselected to avoid cloud data sharing.
- Setting Template for Code Llama: A user needed assistance configuring a chat template for the Code Llama model they were using.
- Another user provided a basic example for a message template, emphasizing that the model's specific repository or full name might be required for better support.
- Determining Model Tool Usage: A member asked how to verify whether a model can utilize tools effectively during execution.
- The consensus provided was to check the model developer's documentation, noting that tools should be implemented only if the model was specifically trained on tool calling.
- From Script to Screen (and back): How to prototype your films with Blender and genAI-based add-ons? #b3d #genAI #movie #film https://github.com/tin2tin/Blender_Screenwriterhttps://github.com/...
- GitHub - nomic-ai/nomic: Interact, analyze and structure massive text, image, embedding, audio and video datasets: Interact, analyze and structure massive text, image, embedding, audio and video datasets - nomic-ai/nomic
- nomic-ai/nomic-embed-text-v2-moe · Hugging Face: no description found
LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (2 messages):
Berkeley RDI internships, Marketing Assistant role, Web Developer & Video Editor role
- Exciting Internship Opportunities at Berkeley RDI: The Berkeley Center for Responsible, Decentralized Intelligence (RDI) announced openings for exciting internships for UC Berkeley students, focusing on AI innovations.
- Applications are open and reviewed on a rolling basis; interested candidates should apply quickly to secure their spot.
- Marketing Assistant Internship: The Marketing Assistant will develop marketing strategies, manage social media, and analyze metrics to boost engagement, with a focus on creative content.
- Ideal candidates should be proficient in design tools and have a keen interest in AI and decentralization, and can apply through this link.
- Web Developer & Video Editor Internship: The RDI is seeking independent web developers and video editors to enhance their website and produce multimedia content for their YouTube channel.
- Candidates should be skilled in GitHub and multimedia production, and applications can be sent to samanthaguo@berkeley.edu.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (5 messages):
Course Enrollment, Quiz Availability, Course Communication, Class Schedule, Joining Late
- Course Enrollment Questions: Students are encouraged to enroll in the course through CalCentral using class numbers 33840 for CS194-280 and 33841 for CS294-280.
- If the class is full, they should fill out the petition form to get on the waitlist.
- Quizzes Are Limited: Currently, only Quizzes 1 and 2 are available for the course, with Quiz 3 expected to be released soon due to scheduling delays.
- An update was shared to inform of the delays caused by no class being held on Monday.
- Using Edstem for Course Queries: Students are advised not to email course staff or TAs and to instead utilize Edstem for any queries.
- For private matters, they should post a private question on Edstem visible to all teaching staff.
- Late Enrollment Accepted: One student inquired if they could join the course late, and it was confirmed that they absolutely can.
- This indicates a level of flexibility for latecomers interested in the course.
Link mentioned: CS294/194-280 Advanced Large Language Model Agents: Spring 2025
LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (2 messages):
DeepSeek's reasoning method, FinRL-DeepSeek, Algorithmic Trading, GRPO style reasoning, Lecture 3 by Prof Yu Su
- Join the DeepSeek Reasoning Discussion!: A special discussion on DeepSeek's reasoning method and GRPO will take place at 7 PM PT (UTC-8), focusing on integrating GRPO style reasoning into smaller models.
- Participants are invited to join a study group that discusses weekly lectures and an open Q&A session.
- FinRL-DeepSeek Webinar Announcement: The FinRL-DeepSeek webinar will present an extension of CVaR-PPO that incorporates trading recommendations and risk assessments generated by LLMs on February 25.
- Open-source code, data, and AI trading agents will be provided, alongside backtesting results on the Nasdaq-100; however, participation is at your own risk.
- Catch Lecture 3 with Prof Yu Su!: The study group will also cover Lecture 3 by Prof Yu Su tomorrow at 7 PM PT (UTC-8).
- This lecture is part of the ongoing discussions held weekly in the forum.
LLM Agents (Berkeley MOOC) ▷ #mooc-readings-discussion (2 messages):
Quiz 3 availability
- Quiz 3 set for early next week: A member inquired about the availability of Quiz 3.
- Another member confirmed that it will be available early next week.
- User inquiry about Quiz schedule: A member asked for information regarding when the next quiz will be released.
- The question reflects ongoing engagement and anticipation in the course.
DSPy ▷ #general (9 messages🔥):
DSPy code golf, Qwen 0.5B multi-label classification, JSON output configuration, Feedback mechanism for MIpro, Annotation rubric usage
- DSPy Code Golf Sparks Competitive Ideas: Some members discussed engaging in DSPy code golf, suggesting a competitive game around creating concise DSPy solutions and sharing quick hacks.
- A member noted it's a fun excuse to spend some time on neat coding challenges.
- Ensuring JSON Output with Qwen 0.5B: A member inquired about securing JSON outputs for a multi-label classification task using Qwen 0.5B, seeking guidance on necessary configurations.
- Another member recommended using DSPy's JSONAdapter for proper JSON output formatting and provided example code demonstrating its application.
- Custom Implementations with DSPy: A member shared that they have forked and created a custom implementation of Parlant with DSPy support, ensuring working JSON outputs.
- They offered an example run that can be integrated into the Parlant server for optimal performance.
- Deciding Placement for Constant Parameters in Pipelines: Clarifications were sought on whether constant parameters should be included in the signature docstring or as an InputField in the pipeline setup.
- The discussion included considerations for its impact on mipro and bootstrapfewshotrandomsearch optimization processes.
- Feedback Mechanism for MIpro: A member raised a question about passing feedback to MIpro regarding incorrect answers, beyond simply making the number a minor metric.
- The inquiry sought methods for effectively communicating the reasons for incorrect outputs to improve the model's performance.
Link mentioned: Tweet from Omar Khattab (@lateinteraction): Sometimes I look for an excuse to spend some 5 minutes on some neat DSPy golf.Someone was asking: How do I use DSPy for extraction of structured data from HTML? Hmm, but that's a one-liner.What if...
MLOps @Chipro ▷ #events (2 messages):
FinRL-DeepSeek, Algorithmic Trading Webinar
- Webinar Announcement for FinRL-DeepSeek: A webinar titled FinRL-DeepSeek – Algorithmic Trading will be presented, introducing an extension of CVaR-PPO that incorporates trading recommendations and risk assessments from LLMs. Registration is available at this link, scheduled for February 25 at 8PM CST.
- The session promises open-source code, data, trading agents, and backtesting results on the Nasdaq-100, with AI agents ready for deployment at your own risk.
- Request for Webinar Recording: A new member inquired about recordings of the discussed webinar. This highlights growing interest in the materials covered and potential follow-ups for those who couldn't attend.