[AINews] 1/8/2024: The Four Wars of the AI Stack
This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜
Not much happened in the discords today, so time to plug our Latent Space 2023 recap!

The Four Wars of the AI Stack (Dec 2023 Recap)
The Data Wars, The War of the GPU Rich/Poor, The Multimodality War, The RAG/Ops War. Also: our usual highest-signal recap of top items for the AI Engineer from Dec 2023!
Enjoy!
Table of Contents
Nous Research AI Discord Summary
- "Obsidian Project - in short" : In project Obsidian
@qnguyen3
mentioned briefly that they are utilizing DINO, CLIP and CNNs for the project.
- "Cloud-Based LLM Concerns, Addressed": Discussion about the design challenges of cloud-based Large Language Models (LLMs) services captured attention when
@maxwellandrews
shared a research paper that proposed solutions through distributed models like DistAttention and DistKV-LLM.
- "Self-Extending Context Window Triumph":
@kenakafrosty
shared a paper titled 'Self-Extend LLM Context Window Without Tuning', arguing that existing LLMs are inherently capable of handling long context situations, sharing the paper's link and relevant Twitter and GitHub discussions.
- "Your Morning Coffee, Brought by AI?": Users tossed jokes and musings about a tweet that
@adjectiveallison
shared regarding an AI robot called 'Figure-01' that claims to have learned to make coffee after observing humans. The conversation expanded into comparing the project to another AI program, ALOHA, shared by@leontello
.
- "LLMs that Learn and Teach":
@deki04
shared a link to a GitHub repository with a comprehensive course on Large Language Models (LLMs), sparking a discussion about model improvements and their practical applications, led by@leontello
and@vincentweisser
.
- "To Embed, or Not to Embed":
@gabriel_syme
suggested that hierarchical embeddings may still be a necessary addition to the OAI model, despite not showing an expected boost in performance.
- "The Agentic RAG Trend":
@n8programs
announced plans to experiment with agentic RAG, a model that generates search queries based on input and collects data until enough has been accumulated.
- "AI Engineer's Guide to LLM Fine-Tuning":
@realsedlyf
requested insight on the best methods for creating synthetic data required to fine-tune a language model for a specific domain.
- "Oil & Gas Industry Embraces LLM Analysis":
@kapnap_n
detailed the application of LLMs for an unusual domain - analyzing downhole wellbore data in the oil & gas industry.
- "AgentSearch Dataset Launch!": The newly-released AgentSearch-V1 dataset was promoted by
@teknium
who shared a link to a tweet by@ocolegro
, announcing the availability of one billion embedding vectors encompassing Wikipedia, Arxiv, and more.
- "LLM Talk - Exposés and Suggestions": The '#ask-about-llms' channel saw several captivating debates about different LLM facets like KV_Cache implementation, comparison between MoE and Mistral, and performance differences between TinyLLAM and Lite LLAMAS.
- "Peeking into the Silver Lining": The notion that smaller models may hold more processing capabilities than they seem was introduced by
@kenakafrosty
leading to conversations about the saturation point of smaller models.
- "Exploring Implementations of Merging Technology": Queries about notebooks for MoE (mixture of experts) implementations and restraints of PointCloud models led to insightful exchanges between
@teknium
and.beowulfbr
, with a Mergekit GitHub link shared for reference.
- "Mixtral Favored for its Roomy Context Window":
@gabriel_syme
and@teknium
expressed preference for the Mixtral model. Despite the availability of other models like Mistral and Marcoroni, Mixtral's larger context window of 32k was considered a standout advantage.
Nous Research AI Channel Summaries
▷ #ctx-length-research (2 messages):
- Exploring DistAttention for Cloud-Based LLM Services: User
@maxwellandrews
shared a link to a research paper on DistAttention and DistKV-LLM, new distributed models that aim to alleviate the challenges of designing cloud-based Large Language Models (LLMs) services. The link leads to an abstract discussing how these models can dynamically manage Key-Value Cache and orchestrate all accessible GPUs.
- LLMs and Long Context Situations: User
@kenakafrosty
shared a link to a research paper entitled 'Self-Extend LLM Context Window Without Tuning'. The paper argues existing LLMs have the inherent ability to handle long contexts without fine-tuning training sequences.
- Practical Applications of the Self-Extend Model:
@kenakafrosty
noted that the 'Self-Extend LLM Context Window Without Tuning' concept is being implemented with seemingly good results, sharing relevant Twitter and GitHub links.
Links mentioned:
- Paper page - Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning: This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The limited length of the training sequence during training may limit the application of Large Language Models...
- GitHub - datamllab/LongLM: LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning: LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning - GitHub - datamllab/LongLM: LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
▷ #off-topic (5 messages):
- Is Coffee-Making AI a Big Deal?:
@adjectiveallison
shared a tweet from@adcock_brett
about an AI robot called 'Figure-01', claiming it had learned to make coffee by watching humans, and mentioned this is an end-to-end AI with video in, trajectories out. This was followed by skepticism and humor from@teknium
and@gabriel_syme
, questioning if the capability to make coffee is remarkable news.
- Comparing AI Complexity:
@leontello
compared the coffee-making AI to another project, ALOHA (linked to a tweet by@tonyzzhao
), calling it quite lackluster, but later clarified their comment, recognizing the difference in contexts involving self-driving vs. robotic hardware setups.
- Spotting Coincidences in AI Robotics: In a humorous twist,
@adjectiveallison
shared another tweet, this time from@atroyn
, noticing that the coffee machine used by the coffee-making AI looked very familiar, being previously seen in a video from Chelsea Finn's research project.
Links mentioned:
- Tweet from anton (𝔴𝔞𝔯𝔱𝔦𝔪𝔢) (@atroyn): something about this demo video seemed very familiar, then i realized i had seen that same coffee machine before in one of @chelseabfinn's video from her group's paper https://lucys0.github....
- Tweet from Brett Adcock (@adcock_brett): Figure-01 has learned to make coffee ☕️ Our AI learned this after watching humans make coffee This is end-to-end AI: our neural networks are taking video in, trajectories out Join us to train our r...
▷ #interesting-links (5 messages):
- Resource for LLM Course shared:
@deki04
shared a link to a GitHub repository offering a comprehensive course on Large Language Models with chatbot roadmaps and Colab notebooks. - Mixed Opinions on Model Improvement:
@leontello
speculated on the feasibility of the introduction of augmented models, suggesting a potential increase in parameter counts which could pose practicality concerns. - Deep Dive into LLM Agents:
@vincentweisser
shared an article detailing a comprehensive analysis on LLM agents, ChatGPT and GitHub Copilot underlining their significant impact yet existing limitations in complex tasks due to a restricted context window. - LeCun's LLM research feature:
@vincentweisser
also highlighted a research paper related to LLM by Yann LeCun.
Links mentioned:
- Thoughts on LLM Agents: Entropy, criticality, and complexity classes of cellular automata.
- GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. - GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
▷ #general (79 messages🔥🔥):
- Exploring Performance of Hierarchical Embeddings:
@gabriel_syme
noted that hierarchical embeddings might still be needed in addition to the implemented OAI model, hinting at a lack of expected improvement in performance. - Handling Synthetic Data for LLM Fine-Tuning:
@realsedlyf
inquired about the best current methods for creating synthetic data for language model fine-tuning in a specific domain. - Agentic RAG Experimentation:
@n8programs
announced plans to experiment with agentic RAG, an approach where the model generates various search queries based on an input question and collects information until a sufficient amount has been gathered. They cited Mistral as being particularly good for such tasks. - Industry Application - LLM and Wellbore Analysis:
@kapnap_n
shared their approach to using language model to analyze downhole wellbore data in oil & gas industry. They also discussed how the data is represented and the potential benefits of this approach, sparking interest from other users like@julianotto
and@everyoneisgross
. - AgentSearch Dataset Announcement:
@teknium
shared a link to@ocolegro
's tweet about the release of the AgentSearch-V1 dataset, consisting of over one billion embedding vectors covering Wikipedia, Arxiv, filtered common crawl, and more.
Links mentioned:
Tweet from Owen Colegrove (@ocolegro): The full dataset for AgentSearch-V1 is now available on HF!! Recommended: @qdrant_engine - for indexing and search @nomic_ai - for visualization I'm looking to expand what is indexed - agent spe...
▷ #ask-about-llms (54 messages🔥):
- Request for KV_Cache implementation:
@lwasinam
asked for any links to implementations of KV_Cache for confirmation purposes. - Considering MoE for comparison with Mistral:
@bernaferrari
suggested making a mixture of experts (MoE) out of phi and compare it to Mistral. - TinyLLAM vs Lite LLAMAS: In a discussion between
@gabriel_syme
and@teknium
, it was noted that TinyLLAM underperforms, leading to a decision to switch to Lite LLAMAS. - Processing capabilities of smaller models:
@kenakafrosty
sparked a discussion on the notion that smaller models (6-20B range) might actually have more processing capability than appears at first glance, with the gap lying in instruction following.@teknium
shared his opinion that 7B models are reaching their saturation point but also added that saturation point scales non-linearly with model size. - Mergekit and MOE demonstration:
@teknium
inquired about any existing notebooks for Mergekit MOE (mixture of experts) implementation. In response,.beowulfbr
shared a link to the Mixtral branch of Mergekit on GitHub for reference. - PointCloud model limitations: In a conversation started by
@gabriel_syme
about PointCloud models,@teknium
explained that if the base model supports 8k, it should be able to do 8k inputs but will not produce more than 4k outputs. - Preference for Mixtral with larger context window:
@gabriel_syme
and@teknium
discussed various models, including Mistral, Marcoroni, and Mixtral. They expressed a preference for Mixtral, given its larger context window of 32k.
Links mentioned:
GitHub - cg123/mergekit: Tools for merging pretrained large language models.: Tools for merging pretrained large language models. - GitHub - cg123/mergekit: Tools for merging pretrained large language models.
▷ #project-obsidian (1 messages):
qnguyen3: DINO, CLIP and CNNs for now
Eleuther Discord Summary
- Discussing LAION's Decay:
@stellaathena
and@flow7450
considered the decay rate of datasets like LAION. Notably, LAION400 experienced approximately 20% failure in download after about 2-2.5 years according to a paper cited by@flow7450
. - Duel Over Duplicate Data:
@uwu1468548483828484
and@flow7450
debated the importance of duplicate data, weighing the merits of backups versus the need for unique samples. - ELO-Ribbing Models:
@letterrip
proposed a project for creating an ELO rating for each question in models to form a testing benchmark subset for training runs. The discussion also touched on where to propose new projects. - The Axolotl DPO Conundrum:
@karkomagor
asked if axolotl supports fine-tuning using DPO datasets, with@main.ai
suggesting to ask on the Axolotl server instead. - T5 Breakdown:
@stellaathena
confirmed T5 as an encoder-decoder in response to a question from@ricklius
. - Rattling the Learning Rate Cage: DeepSeek AI models' unusual stepwise decay learning rate schedule was examined by
@maxmatical
and@ad8e
. This led@ad8e
to propose testing swift final stretches at 0.1xLR, potentially negating the need for a constant decay. - Twisting Transformer Layers:
@kram1032
suggested permuting layers in transformer architectures during training, hypothesizing that this might encourage more reliance on skip connections and lead to robust networks even when adding or removing layers. - Seeking MoE-mentous Scaling Laws:
@bshlgrs
sought cutting-edge literature on LM Scaling Laws specific to Mixture Of Experts (MoE) models, with contributions from@philpax
and@main.ai
. - The Harness Snags: From evaluating models like MMLU to implementing custom datasets and even considering adding a toxicity/bias grader, various functionalities of the
lm-eval-harness
were discussed by@gson_arlo
,@hyperion.ai
,@ishigami6465
, and@.johnnysands
. The importance of speculative decoding was emphasized by@stellaathena
and@hailey_schoelkopf
.
Eleuther Channel Summaries
▷ #general (29 messages🔥):
- Decaying Datasets: Pondering LAION's Longevity:
@stellaathena
initiated a discussion about linkrot, specifically addressing the decay rate of datasets like LAION.@flow7450
examined LAION400's decay, discovering that approximately 20% failed to download after about 2-2.5 years. - Duplicates: A Debate on Backup Strategy vs. Uniqueness:
@uwu1468548483828484
and@flow7450
had a debate on the importance of duplicate data, arguing whether backups outweigh the need for unique samples. - A New Project Proposal on Model ELO Ranking:
@letterrip
proposed a new project, suggesting creating an ELO rating for each question in current models to form a subset of benchmarks for testing during training runs. There was discussion about where to propose new projects, with@flow7450
suggesting<#1102787157866852402>
and@ad8e
clarifying that community-projects is mainly for projects that one intends to drive rather than just ideas.@letterrip
confirmed interest in driving the project. - Axolotl and DPO datasets:
@karkomagor
inquired if axolotl supports fine-tuning using DPO datasets, and@main.ai
recommended asking this in the Axolotl server instead. - Inquiries about Managing Multiple Server Activities:
@seon5448
opened a discussion about keeping track of activities across various servers, and was seeking advice on management tactics for multiple reading groups and project events. Suggestions for management tools or techniques were not mentioned in response.
▷ #research (33 messages🔥):
- T5 Encoder-Decoder Clarification:
@ricklius
inquired if T5 was an encoder-decoder to which@stellaathena
confirmed it is, and mentioned that sometimes people detach the encoder from the encoder-decoder for usage.
- DeepSeek AI Models' Learning Rate Schedules:
@maxmatical
discussed DeepSeek AI models, particularly an unusual stepwise decay learning rate schedule they used that had the same final loss as the traditional cosine decay. While this method allows for more flexible use of checkpoints during pre-training,@ad8e
highlighted the potential dangers of large learning rate steps, pointing out that misuse of learning rates could lead to suboptimal outcomes or even divergence in model training.
- Potential Model Training Experiment:
@ad8e
revealed an intention to test the idea behind the above-discussed learning rate steps. The intention was to see if a swift final stretch at 0.1xLR was all that's needed, possibly negating the need for a constant decay.
- Discussion on Weight Decay and Gaussian Weight Noise:
@fessus
brought up the subject of the possible effects of combining gaussian weight noise and weight decay in a network that doesn't have learned affines in normalization layers. They reported potential benefits in terms of pruning unnecessary network complexity in toy datasets.
- Transformers with Permuted Layers Idea:
@kram1032
proposed a unique idea of permuting layers during training within a transformer architecture with constant layer size. Their hypothesis is that this approach may encourage the network to rely more on skip connections and could lead to more robust networks when it comes to adding or removing layers.
Links mentioned:
- Chess-GPT’s Internal World Model: A Chess-GPT Linear Emergent World Representation
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark ...
- ad8e: Weights & Biases, developer tools for machine learning
▷ #scaling-laws (5 messages):
- Looking for Reading Recommendations on LM Scaling Laws:
@bshlgrs
sought recommendations for current state-of-the-art literature on Language Model (LM) scaling laws, specifically for Mixture of Experts (MoE) models. They specifically mentioned this paper which recommends MoE use only for LMs with less than 1 billion parameters, a claim appearing to be contested by practitioners. - Suggestion for LM Scaling Laws Paper:
@philpax
highlighted a recent paper on LM scaling laws. Although this does not specifically address MoE models, it could offer relevant insights. - 'Smaller and Longer' is the Key for Large Inference Demand:
@bshlgrs
highlighted a key finding from the suggested paper, suggesting that for Language Model Manufacturers (LLM) with large inference demand (around 1 billion requests), the best strategy is to train smaller models for a longer duration. - Lack of High-Compute Budget Scaling Papers for MoE:
@main.ai
pointed out the lack of papers addressing high compute budget scaling for MoE models.
▷ #lm-thunderdome (16 messages🔥):
- Understanding lm-eval-harness Functions:
@gson_arlo
asked howlm-eval-harness
works with evaluation on datasets like MMLU (4-choice mcqa).@baber_
confirmed thatoutput_type: generate_until
triggers a model's inference once, whereasoutput_type: log_prob
calculates the likelihood four times, once for each probable completion. - Flexible Postprocessing for lm-eval-harness:
@hyperion.ai
suggested enhancinglm-eval-harness
with a loose and flexible post-processing factor, aligning closer to real-world practices where the answer output can be flexible yet correct.@stellaathena
confirmed that the harness can handle such situations. - Implementing Custom Datasets in lm-eval-harness:
@ishigami6465
inquired about the specific format of datasets needed forlm-eval-harness
.@hailey_schoelkopf
clarified that the user can define this in a task's configuration and explained how the configuration could work for different types of tasks. - Potential for Toxigen Grader in lm-eval-harness:
@.johnnysands
brought up the idea of adding a toxicity/bias grader to lm-eval-harness, considering tools like LlamaGuard offer this functionality already.@hailey_schoelkopf
affirmed that such grader models could be integrated, particularly if locally deployed to avoid disrupting the main evaluation model. - Considerations for Speculative Decoding:
@stellaathena
highlighted the importance of speculative decoding, and@hailey_schoelkopf
suggested that an inference library should handle this externally from lm-eval. Both believe the Hugging Face's TGI and tensorrt-llm currently manage this well.
OpenAI Discord Summary
- Europe's Elderly Shift: In the #prompt-engineering channel,
mysterious_guava_93336
andChatGPT
explored the demographic transition from the "European baby boom" of the last century to today's "elderly Europe". The AI clarified that signs of fertility contraction started appearing in the late 1960s due to various factors, leading to an aging European population by the late 20th and early 21st centuries. This conversation was later echoed in the #api-discussions channel, reinforcing the shift's timeline and the factors uniting the demographic change.
- Are PowerPoint Presentations Ruining AI?: In #ai-discussions, an ongoing topic was about
ChatGPT
recently providing verbose, "PowerPoint-like" responses instead of concise, natural conversations. Despite advice from@eskcanta
on refining system instructions to generate desired responses,@mysterious_guava_93336
reported no significant improvement. The discussion also addressed how to useChatGPT
on Discord.
- Staying in Your Domain: Dealing with domain verification and GPT editor issues was a hot topic in #gpt-4-discussions. While
@darthgustav
attempted to guide@anardude
through domain verification,@anardude
struggled with the solution and sought OpenAI Support's help.@.marcolp
expressed frustration at a persistent error disallowing access to the GPT editor until resolved.
- When Training Hours Go to Waste: Despair echoed in #gpt-4-discussions as
@moonlit_shadows
expressed disappointment in a 20-hour GPT-2 training session that ended unproductively.
- Language-specific GPT Questions and Rule Sets:
@codocoderson
queried about creating a GPT in English and having its descriptions and starter questions shown in the user's language worldwide in #gpt-4-discussions.@jesuisjes
inquired if their GPT occasionally missed following an outlined process for strategy-making conversations.
- Why Can't GPT Read my Files?: The #gpt-4-discussions channel saw a discussion by
@cerebrocortex
, wondering why GPTs sometimes have issues reading .docx and .txt files. The issue was admittedly uncertain, with possible reasons being document size, token limits, or file corruption during /mnt timeouts.
OpenAI Channel Summaries
▷ #ai-discussions (30 messages🔥):
- Concerns about recent updates to ChatGPT:
-
@mysterious_guava_93336
expressed dissatisfaction with recent updates to ChatGPT, stating that the AI now provides verbose, structured "PowerPoint" style responses instead of the concise, natural conversation style it used to have before the summer of 2023. They shared an example of the type of answer they prefer and asked for advice on how to instruct the AI to generate such responses. - Proposed Instructions for Desired Responses:
-
@eskcanta
suggested refining the system's custom instructions to be more specific and positive, using guidance techniques similar to dog training. They proposed using a pattern for opinions and encouraging the AI to challenge the user creatively. An example of this approach can be seen on OpenAI's chat. - Disappointment with Proposed Changes:
- Despite changing the instructions,
@mysterious_guava_93336
noticed no major improvement, stating that the AI still generated "PowerPoint-like" outputs. - Using ChatGPT on Discord:
-
@lemon77_ps
inquired about how to use ChatGPT on Discord, and@7877
explained that ChatGPT must be used via OpenAI's website. - Interaction on the Discord Server:
-
@michael_6138_97508
pointed out that the discord server is meant for interactions with real people, or equivalents, also mentioning the existence of a bot with specific knowledge of OpenAI API and documentation.
▷ #gpt-4-discussions (26 messages🔥):
- Domain Verification Woes: User
@anardude
asked how to verify his domain.@darthgustav
provided instructions for verifying the domain via DNS records in the GPT editor but@anardude
reported the solution didn't work and asked about how to contact OpenAI Support, to which@darthgustav
suggested Googling "OpenAI Help and Support". - Long Hours of GPT-2 Training in Vain?: User
@moonlit_shadows
shared their disappointment in a 20-hour training session that seemed to end unproductively due to an attribute error during saving. - Inquiries about Creating Language-Specific GPTs: User
@codocoderson
asked about publishing a GPT in English and inquired whether its descriptions and starter questions will be shown in the user's language worldwide. - Issues with GPT Obeying Rulesets:
@jesuisjes
sought to confirm expectations about their GPT occasionally missing processes, despite setting up rules to follow an outlined process for strategy-making conversations. - Concerns on GPT's Issues with Reading Document Files:
@cerebrocortex
asked why GPTs sometimes have problems reading .Docx & .txt files. User@michael_6138_97508
made an educated guess about document size and token limits being potential issues, and@darthgustav
suggested /mnt timeouts during updates causing file corruption. - Troubles with GPT Editor:
@.marcolp
expressed frustration over a persistent "error searching knowledge" problem, leading to an inability to even access the GPT editor, rendering further development of GPTs potentially useless until a fix is in place.@darthgustav
offered a potential workaround involving removing and reattaching knowledge.
▷ #prompt-engineering (1 messages):
- European Demographics Evolution Discussed: User
mysterious_guava_93336
initiated a conversation withChatGPT
about the transition from the "European baby boom" of last century to today's "elderly Europe". - Fertility Contraction Timeline:
ChatGPT
clarified that the first signs of a fertility contraction in Europe began emerging in the late 1960s and became significantly pronounced in the 1970s. - Contributing Factors to Fertility Decline: The reasons cited for this shift included economic changes, women's rights and workforce participation, increased access to contraception and family planning, and cultural shifts.
- Resulting Aging Population: By the late 20th and early 21st centuries, many European countries were experiencing birth rates below the replacement level of 2.1 children per woman, leading to an aging population.
▷ #api-discussions (1 messages):
- The Transition from 'Baby Boom' to 'Elderly Europe':
mysterious_guava_93336
engaged ChatGPT in a discussion about when the first signs of a fertility contraction began in Europe after the post-World War II "baby boom". ChatGPT confirmed that this demographic transition started to appear in the late 1960s and became more pronounced in the 1970s, with various factors contributing, including economic changes, women's workforce participation, access to contraception, and cultural shifts.
Perplexity AI Discord Summary
- File Upload Features on Mobile: According to
@righthandofdoom
and@giddz
, it's currently possible to only upload images from the Perplexity mobile app on iOS. Support for Android is reported to be on the horizon. - Locating the Writing Mode Feature:
@ellestar_52679
queried about the location of the Writing Mode feature, and@icelavaman
highlighted the navigation pathway: click "Focus", then "writing". - Detailed Discussion on Features and Promotions:
@debian3
queried about file upload features and available promotions for the yearly plan.@icelavaman
clarified that savings could be achieved through referral links instead of promotions and redirected to a FAQ link for more details on file upload. - Billing Issues with Perplexity:
@ahmed7089
was upset about getting charged by Perplexity despite deleting their account. The situation was addressed by@mares1317
, who provided a relevant link. - Perplexity Access Restrictions:
@byerk_enjoyer_sociology_enjoyer
voiced concerns about Perplexity's inability to access posts on Pinterest, Instagram, or Tumblr. - In-depth Perplexity vs Pplx Model Comparison Analysis:
@dw_0901
initiated discussions about differences among pplx online models (7B/70B) and Perplexity, questioning the differences in product design. - Contrasting Perplexity's Copilot and Normal version:
@promoweb.2024
enquired about Perplexity's Copilot and normal version differences. Detailed information about Perplexity Copilot was shared by@icelavaman
at this link. - Troubleshoot using $5 Pro Credits on pplx-api:
@blackwhitegrey
was guided to the application process for Pro credits onpplx-api
by@mares1317
, who also provided a step-by-step guide. - Clarity on Perplexity API User Friendliness:
@blackwhitegrey
and@brknclock1215
struggled with a perceived lack of user-friendliness in Perplexity API, primarily due to a lack of coding skills.icelavaman
clarified that the API is primarily meant for developers. - Clarification on Pro Credits as an Extra Payment:
@blackwhitegrey
initially misconstrued Pro credits as an additional payment for API access.icelavaman
clarified that these are actually bonuses provided to developers. - Optimism for Non-Technical Users: Despite the struggles,
@brknclock1215
ended with an optimistic view that people, who aren't necessarily coders but understand technology, could benefit the most from its progression. - Help requested to make a thread public:
@me.lk
advised<@1018532617479532608>
to make their thread public so others can view their content.<@1018532617479532608>
followed this advice and made the thread publicly accessible. - Sharing Perplexity.AI Searches: Searches on how to use and how to draw were shared by
@soanseng
and@debian3
respectively, spreading their knowledge with the community.
Perplexity AI Channel Summaries
▷ #general (23 messages🔥):
- File Upload Functionality In Mobile App:
@righthandofdoom
asked about the ability to upload files from the mobile app.@giddz
clarified that it's currently possible only for images and is only available on iOS, with Android support coming soon. - Finding the "writing mode" feature:
@ellestar_52679
was trying to locate the writing mode feature.@icelavaman
advised them to click "Focus", then "writing". - Questions about Perplexity's functionality and promotions:
@debian3
asked about the purpose of the file upload feature, the types of files that can be uploaded and if any promotions were available for the yearly plan.@icelavaman
assured that while there was no promo, saving could be achieved through referral links. They also provided a link for the queries on file uploads. - Issues with Account Deletion and Billing:
@ahmed7089
complained about being billed by Perplexity even after deleting their account.@mares1317
provided a link in response, presumably containing more information. - Perplexity and Social Media Platforms:
@byerk_enjoyer_sociology_enjoyer
raised a concern about Perplexity's inability to access Pinterest, Instagram, or Tumblr posts. - Perplexity and Pplx Model Comparison:
@dw_0901
, a consultant, asked about the differences between pplx online models (7B/70B) and Perplexity, questioning if there were differences in the underlying product design. - Copilot vs Normal version of Perplexity:
@promoweb.2024
enquired about the differences between using Perplexity's Copilot and the normal version.@icelavaman
shared a link providing a detailed overview of Perplexity Copilot.
Links mentioned:
- What is Perplexity Copilot?: Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
- How does File Upload work?: Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
- What is Search Focus?: Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
▷ #sharing (4 messages):
- Sharing made public: User
@me.lk
advised<@1018532617479532608>
to make their thread public so others can see it, to which<@1018532617479532608>
responded that they've made their thread public now. - Perplexity.AI Searches: Users
@soanseng
and@debian3
shared perplexity.ai searches: -@soanseng
: shared a link on how to use -@debian3
: shared a link on how to draw
▷ #pplx-api (12 messages🔥):
- How to use $5 Pro Credits: User
@blackwhitegrey
sought advice on using the Pro credits onpplx-api
.@mares1317
provided a link to Perplexity API's Getting Started Guide and highlighted the steps including providing payment info, purchasing credits, and generating an API key. - Mismatched API knowledge and needs:
@blackwhitegrey
expressed frustration due to their lack of coding skills and perceived the API as not user-friendly.icelavaman
clarified that the API is primarily intended for developers and not for direct usage on websites. - Pro Credits understood as an extra payment:
@blackwhitegrey
initially assumed Pro users had to pay an extra $5 for the API access. However,icelavaman
clarified that these are not extra payments but bonuses for developers. - Practicality of using the API for Non-Developers:
@brknclock1215
echoed@blackwhitegrey
's sentiments, expressing similar difficulties in implementing the API due to the lack coding skills. They also reasoned that trying to use advanced tools without the appropriate technical knowledge can be more time-consuming than beneficial. - Non-Technical users can still benefit:
@brknclock1215
ended on an optimistic note, insinuating that people who are not necessarily coders, but who understand how to interact with technology, could potentially benefit the most from its progression.
Links mentioned:
Mistral Discord Summary
- Priming the LLM Response: In a conversation with
@bdambrosio
,@i_am_dom
clarified that priming the Latent Language Model (LLM) for an appropriate response is indeed feasible, but the message set must end with a user message. This douses hopes for partially pre-writing responses with the official API as it would return an error. - On the Hunt for a Solid Chat Conversation Program:
@jb_5579
posed a question to the community seeking recommendations on repositories that provide a robust chat conversation program, ideally optimized for the Mistral API, and featuring memory session alongside code-assist and code-completion. - The Case of Unknown Context Window Sizes:
@tonyaichamp
's inquiry about the context window sizes for different versions of API models was met with uncertainty by@frosty04212
, emphasizing on the need for experimental explorations to better understand and harness the system. - Mistral-tiny Proves Its Mettle:
@tonyaichamp
shared noteworthy success with themistral-tiny
model, leveraging it to extract content from a 16k token HTML page. Given its cost-effectiveness and speed, the user intends to apply it for similar tasks in the future. - Ai Agents Assemble:
@10anant10
announced their project centered around building AI agents, to which@.tanuj.
expressed interest and initiated direct communication. - Framework Endorsement: User
@joselolol
recommended exploring the MLX framework. - Guardrailing Guide: A useful resource on guardrailing is linked by
@akshay_1
with a URL:https://docs.mistral.ai/platform/guardrailing/
- Hardware Limitations for Fine-tuning:
@david78901
raised the feasibility of fine-tuning the Mistral 7b on a single 3090. Tuning a LoRA or QLoRA could be managed, but full fine-tuning would likely need multiple 3090s or a single A100 withAxolotl
. - LLMcord - The Versatile Discord Bot:
@jakobdylanc
showcased LLMcord, an open-source Discord bot compatible with Mistral API and personal hardware-run Mistral models via LM Studio. The project available on GitHub also scored mention. - Superiority of Mistral Over OpenAI:
@joselolol
acknowledged Mistral's edge over OpenAI, recognizing it as faster, cheaper, and more effective for tasks. - Brains Baffled by Bicameral Mind: A mention of the Bicameral Mind theory stirred a debate, with
@cognitivetech
recommending the cornerstone book by Julian Jaynes and a skeptical@king_sleeze
equating the theory to pseudoscience. - Mistral API Feature Suggestion:
@jakobdylanc
favored an enhancement to the Mistral API to handle edge cases of an empty content list, much like the OpenAI API presently does. - Functionality Expansion on the Horizon: Function calling in Mistral is set to be a priority, as indicated by
@tom_lrd
.
Mistral Channel Summaries
▷ #general (9 messages🔥):
- Prime the Response:
@bdambrosio
discussed the technique of using a final assistant message to 'prime' the LLM for the appropriate response, and was seeking advice on its effectiveness when the server rejects a message set that ends with an assistant message.@i_am_dom
clarified that response priming can be done, but it needs to end with a user message. They also mentioned that it is not possible to partially pre-write the response itself with the official API, as it would return an error. - Favorite repo for chat conversation:
@jb_5579
asked the community for their favorite repositories for a solid chat conversation program - specifically one that is optimized for the Mistral API and features session memory, with a focus on Code Assist and Code Completion. - Context Window Sizes:
@tonyaichamp
inquired about the context window sizes for different versions of API models, but@frosty04212
responded that the sizes are not currently known. They urged for experimentation to understand and leverage the system better. - Quality of Mistral-tiny for extracting content:
@tonyaichamp
shared a positive experience using themistral-tiny
model for extracting content from a 16k token HTML page. Given the model's cost-effectiveness and speed,@tonyaichamp
intends to use it for similar tasks in the future.
▷ #models (2 messages):
- Building AI Agents: User
@10anant10
announced that they are working on building AI agents. - Direct Communication Initiated: User
@.tanuj.
responded to@10anant10
's comment, stating their intention to shoot them a direct message.
▷ #deployment (1 messages):
joselolol.: Hello good sir, consider using the MLX framework!
▷ #ref-implem (1 messages):
akshay_1: https://docs.mistral.ai/platform/guardrailing/
▷ #finetuning (2 messages):
- Feasibility of fine-tuning on a single 3090: User
@david78901
mentioned that a single 3090 can possibly handle tuning a LoRA or QLoRA on Mistral 7b, but full fine-tuning is only feasible with 3x3090s or a single A100 using Axolotl.
▷ #showcase (4 messages):
- Introducing LLMcord, a Versatile Discord Bot:
@jakobdylanc
presented his open-source Discord bot, LLMcord, which supports both Mistral API and running Mistral models on personal hardware via LM Studio. Features include a sophisticated chat system, compatibility with OpenAI API, streamed responses, and concise code contained in a single Python file. One can check out the project on GitHub. - Mistral Powers AI Backend:
@joselolol
mentioned that they're using Mistral to support the backend of certain AI tasks. - Synthetic Data Generation and Model Evaluation:
@joselolol
also shared that his system can generate synthetic data and provide evaluation for fine-tuned models, a potentially useful tool for developers. - Mistral vs OpenAI: In
@joselolol
's experience, Mistral outpaces OpenAI in most tasks, proving faster, cheaper, and more effective.
Links mentioned:
- 👾 LM Studio - Discover and run local LLMs): Find, download, and experiment with local LLMs
- GitHub - jakobdylanc/llmcord: A Discord AI chat bot | Choose your LLM | GPT-4 Turbo with vision | Mixtral 8X7B | OpenAI API | Mistral API | LM Studio | Streamed responses | And more 🔥: A Discord AI chat bot | Choose your LLM | GPT-4 Turbo with vision | Mixtral 8X7B | OpenAI API | Mistral API | LM Studio | Streamed responses | And more 🔥 - GitHub - jakobdylanc/llmcord: A Discord A.....
▷ #random (5 messages):
- Bicameral Mind Theory Sparked Interest: User
@blueridanus
appeared puzzled about something, which was soon clarified by@cognitivetech
with a link to the Wikipedia page that discussed The Origin of Consciousness in the Breakdown of the Bicameral Mind. It's a 1976 publication by author Julian Jaynes that presents a theory on the origin of human consciousness. - Book Recommendation and Contention: The same user
@cognitivetech
then highly recommended the book and highlighted its thought-provoking nature. In response,@king_sleeze
expressed skepticism, arguing that Jaynes’s theory is based on circumstantial evidence, equating it to pseudoscience. - Skepticism Over Understanding Consciousness: In another message,
@king_sleeze
pointed out the complexity of understanding human consciousness, drawing a comparison to the 'black box' nature of neural networks. They stated, "no human I know of could tell me where or how their thoughts formed", highlighting the mysterious nature of human thought formation.
Links mentioned:
The Origin of Consciousness in the Breakdown of the Bicameral Mind - Wikipedia
▷ #la-plateforme (3 messages):
- Discussion on desired functions:
@gbourdin
mentioned that the community is eagerly waiting for a certain function possibility. - Mistral API enhancement suggestion:
@jakobdylanc
suggested that the Mistral API should be able to handle the edge case of message.content as an empty list just like the OpenAI API does currently. - Function calling is on the horizon:
@tom_lrd
mentioned that function calling has been announced as a priority for future development.
DiscoResearch Discord Summary
Only 1 channel had activity, so no need to summarize...
- German DPR Dataset from Wikipedia:
@philipmay
shared his project for creating a German Dense Passage Retrieval (DPR) dataset based on the German Wikipedia. He posted the project on GitHub for public use.
- Debate on Contextual Length: A discussion emerged about the appropriate length of document context for embedding.
@sebastian.bodza
questioned whether@philipmay's
use of a maximum token count of 270 in his project was too short, and compared it to Jina Embeddings and other models that were trained on as many as 512 tokens.@philipmay
and@bjoernp
argued that longer contexts may become distracting or be more difficult for a BERT model to encode.
- BAAI Training Data Suggestion:
@sebastian.bodza
shared a link to BGE's training data hosted on HuggingFace suggesting it might provide additional insights.
- E5's Training on 512 Tokens:
@sebastian.bodza
noted that the E5 model was also trained on 512 tokens, further supporting the debate on the optimal contextual length. Detailed information about E5's training can be found here.
Links mentioned:
- Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents: Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like ...
- Text Embeddings by Weakly-Supervised Contrastive Pre-training: This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our cu...
- GitHub - telekom/wikipedia-22-12-de-dpr: German dataset for DPR model training: German dataset for DPR model training. Contribute to telekom/wikipedia-22-12-de-dpr development by creating an account on GitHub.
- BAAI/bge-large-en-v1.5 · Hugging Face
- Benchmarking Evaluation of LLM Retrieval Augmented Generation: Learn about what retrieval approaches work and chunking strategy. Includes test scripts and examples to parameterize retrieval on your own docs, determine performance with LLM evaluations and provide ...
Latent Space Discord Summary
Only 1 channel had activity, so no need to summarize...
- Exploring the Role of AI as an Editor: User
@slono
questioned if there are AI writing tools that perform as an editor, helping craft messages, reorganize structure, suggest or eliminate sentences.@coffeebean6887
suggested that describing what you want to a custom GPT for a few minutes can achieve this. However, slono pointed out that it's not an ideal solution due to its cumbersome user interface. - Struggles with AI-Assisted Editing: Responding to the above thread,
@swizec
shared their experience of implementing such a feature in swiz-cms. They remarked that the biggest challenge was effectively communicating what the AI should look for, implying the need for improved guidance or interface for content editing AI tools. - Insights into Large Language Models (LLMs): User
@swyxio
shared a link to a LessWrong post that delves into the implications and potential safety benefits of using LLMs in creating AGIs, and a related Arxiv paper that proposes a post-pretraining method for LLMs to improve their knowledge without catastrophic forgetting. - Appreciation for AI Resources: User
@thenoahhein
thanked@swyxio
for providing resources, saying it had given him reading material for the week. The resources link to a Twitter post from user Eugene Yan.
Links mentioned:
- LLaMA Pro: Progressive LLaMA with Block Expansion: Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretra...
- Mat’s Blog - Transformers From Scratch
- An explanation for every token: using an LLM to sample another LLM — LessWrong: Introduction Much has been written about the implications and potential safety benefits of building an AGI based on one or more Large Language Models…
LAION Discord Summary
- The Mysterious Case of LAION's Linkrot: @stellaathena expressed interest in any studies examining the rate of decay of a dataset like LAION due to linkrot.
- A Puzzling Inquiry on Stable Diffusion 1.6: @pseudoterminalx asked about Stable Diffusion 1.6, and @nodja speculated it might combine 1.x architecture with improvements from sdxl and more.
- Delving Deep into Aspect Ratio Bucketed SD 1.5: @thejonasbrothers shared that Aspect Ratio Bucketed SD 1.5 supports up to 1024x1024 pixels.
- CogVLM Steals the Spotlight: @SegmentationFault gave a shoutout to CogVLM stating it's highly impressive and undervalued in the AI community.
- A Dreamy Solution for Overfitting: @progamergov posted a research paper arguing that dreams could potentially act as an anti-overfitting mechanism in the human brain by introducing random noise to overfitted concepts.
- Sleep-deprived Memories Need More Research: In response to the dream theory, @progamergov expressed wish for studies investigating the effects of sleep deprivation on semantic and episodic memory formations as supportive evidence.
LAION Channel Summaries
▷ #general (8 messages🔥):
- Curiosity about Linkrot in LAION Dataset:
@stellaathena
inquires if anyone has conducted or seen any study pertaining to the rate of decay of a dataset like LAION due to linkrot. - Questions about Stable Diffusion 1.6:
@pseudoterminalx
asks for information about Stable Diffusion 1.6.@nodja
hypothesizes that it might be the 1.x architecture with some improvements from sdxl and some extras. - Insights into Aspect Ratio Bucketed SD 1.5:
@thejonasbrothers
provides insights about Aspect Ratio Bucketed SD 1.5, they stated that it supports up to 1024x1024 pixels. - Appreciation towards CogVLM:
@SegmentationFault
expresses their appreciation for CogVLM, stating that they believe it's highly impressive and undervalued in the AI community.
▷ #research (2 messages):
- Dreams as Anti-overfitting Mechanisms:
@progamergov
shared a research paper suggesting that dreaming might play a crucial role in preventing overfitting in the human brain. According to the paper, dreams introduce random noise to overfitted concepts, thereby aiding in avoiding overfitting. - Call for Testing Effects of Sleep Deprivation on Memory:
@progamergov
expressed a wish that the research extended to testing the impacts of sleep deprivation on semantic and episodic memory formations, asserting such tests could provide supporting evidence for the aforementioned hypothesis.