AI News (MOVED TO news.smol.ai!)

November 16, 2023

[AINews] AI Discords Newsletter 11/15/2023

This is AI News! an MVP of a service that goes thru all AI discords/Twitters/reddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜


Guild: Latent Space

Latent Space Guild Summary

  • Recommendations for small models for instruction finetuning with DPO: mpt-instruct, falcon-instruct, and mistral were suggested, along with discussion of whether they fit on a 16GB GPU with LoRA.
  • A request for suggestions on defining and tracking metrics for a code gen tool, especially around adoption and testing.
  • Discussions about utilizing Slack bots as interfaces for OpenAI Assistants.
  • Concern about the unsettled state of copyright, and debate on how this uncertainty affects lawsuits, with a related tweet shared.
  • Interest in model routing companies (Martian, OpenRouter, Pulze, etc.) and a request for detailed insight.
  • Interesting fine-tuning results discussed via a Hacker News thread.
  • Announcement of a discussion session on the Huggingface DPO paper.
  • Confusion about the paper presentation schedule, alongside enthusiasm for the forthcoming presentation.
  • Shared AI research papers for further discussion and study, including the Code Diffusion model paper, DPO (Direct Preference Optimization), and a paper at arxiv.org/abs/2311.05556.

Latent Space Channel Summaries

Latent Space Channel: ai-general-chat (3 messages):

Summary of ai-general-chat:

  • Best "small" models for instruction finetuning: @chef asked for recommendations on small models that could be used for instruction finetuning with DPO, preferably ones that can be finetuned on a single A100 and are available on Hugging Face. @eugeneyan suggested 7B-class models like mpt-instruct, falcon-instruct, and mistral, adding that they can certainly fit on a 16GB GPU with LoRA (a minimal sketch follows this list).
  • Metric Tracking for Pre-PMF Code Generation Tool: @last_ride_1707 sought suggestions for defining and tracking metrics for a code gen tool that follows a freemium model and is at the pre-PMF stage. @coffeebean6887 asked for clarity about the type of metrics: eval metrics for code generation testing, or growth metrics regarding adoption.
  • Slack Bots as UX for OpenAI Assistants: @Phill inquired about the paths of least resistance for deploying Slack bots as interfaces for OpenAI Assistants.
  • Stance on Copyright: @swyxio shared concern about the unsettled state of copyright and linked to a tweet on the potential impact of these issues on lawsuits. In response, @mitch3x3 hoped for a nuanced outcome from these lawsuits and suggested the possibility of different types of licenses being adopted by the music and publishing industries in the future.
  • Model Routing Companies: @coffeebean6887 expressed interest in model routing companies like Martian, OpenRouter, Pulze, etc., and sought suggestions or insights from the community.
  • Interesting Finetuning Results: @swyxio mentioned interesting results from finetuning and shared a link to the results on Hacker News.
  • Important Links:
    • https://twitter.com/ednewtonrex/status/1724902327151452486
    • https://news.ycombinator.com/item?id=38277248
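
Picking up @eugeneyan's point above: a minimal sketch of what "fits on a 16GB GPU with LoRA" looks like in practice, using Hugging Face PEFT. The model name, rank, and target modules are illustrative assumptions, not settings from the discussion.

```python
# LoRA setup for a 7B-class model on a single 16GB GPU (sketch).
# Loading the base in 4-bit (QLoRA-style) keeps the memory footprint small;
# only the low-rank adapter weights are trained.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",          # any 7B-class base model
    load_in_4bit=True,                    # quantize the frozen base weights
    torch_dtype=torch.float16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections are a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # typically well under 1% of the 7B parameters
```
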
Latent Space Channel: ai-event-announcements (3 messages):

Huggingface DPO Paper Discussion:

  • @swyxio announced the start of a discussion session on the **Huggingface DPO Paper** with <@451508585147400209>.
  • Links:
    • The discussion was held at: https://discord.com/channels/822583790773862470/822583791217934366

Latent Space Channel: llm-paper-club (3 messages):

Discussion and Schedule on Various AI Papers:

  • @yikesawjeez briefly expressed confusion about the schedule for their presentation slot but remained enthusiastic about the assigned paper.
  • @eugeneyan announced an imminent discussion session on DPO (Direct Preference Optimization), providing a Discord link for anyone interested in joining (the DPO objective is restated after this list for reference).
  • @picocreator introduced the Code Diffusion model paper, available at https://arxiv.org/abs/2310.17680, with a version 1 PDF at https://arxiv.org/pdf/2310.17680v1.pdf.
  • Following the Code Diffusion model discussion, @swyxio announced the papers for the following week, led by another member: https://arxiv.org/abs/2310.04378, with https://arxiv.org/abs/2303.01469 as prior work.
  • @youngphlo shared an AI research paper at https://arxiv.org/abs/2311.05556 that they believe is linked to a recent surge in interest and development, and which builds on the aforementioned papers.
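
Since DPO comes up both here and in the event announcement, the core objective from the paper is worth restating for reference: given prompts x with a preferred response y_w and a rejected response y_l, the policy π_θ is trained against a frozen reference π_ref with a single classification-style loss, no reward model or RL loop required:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        \;-\;
        \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Here β controls how far the policy may drift from the reference; the absence of a separate reward model is what makes DPO attractive for the small-model finetuning question raised in ai-general-chat.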

Guild: OpenAI

OpenAI Guild Summary

  • Discussion on AI models, consciousness, and humanlike 'thinking'. Participants including @vantagesp, @ta_noshii, @lumirix, @bedros_p, and @nauticalstache debated the capabilities and limitations of AI models, with a focus on cognitive, emotional, and consciousness elements.
  • Extensive debate over ChatGPT's capability and performance on different tasks. Users including @taholmes160, @juregg, @dave0x6d, and @_jimothyhalpert raised concerns about its inconsistency, particularly with document handling, repetitive tasks, and summarization, which led to suggested solutions.
  • Concerns and queries about the recent GPT-4 performance drop and speed issues. Participants including @micess, @becausereasons, @SA_FX, and @psychonautic339 reported a noticeable decrease in GPT-4's performance.
  • Considerable exchange over usage and feature restrictions of OpenAI tools, specifically GPTs and ChatGPT Plus. Members expressed concerns and sought clarification regarding the new usage cap of 25 messages per 3 hours, the subscription pause, and the declining performance of new GPTs.
  • Queries on custom GPT issues and API usage, with applicable workarounds, covering saving drafts, the one-action limit, access via API, setting up API calls, retrieving/exporting certain data, and passing parameters.
  • Detailed discussions about UI changes and interface issues, particularly the location of the 'temperature' setting and problems viewing/accessing the 'My Plan' page.
  • Questions about security and privacy, with @ardj expressing concerns about unauthorized account access, and @noctapus raising privacy concerns due to his visibility as a streamed content creator. Users were directed toward official channels for resolution.
  • Feedback and suggestions on humanizing AI-generated text and on prompt engineering. Participants shared queries and suggestions for constructing complex prompts, formatting text, humanizing text output, and identifying AI-generated content.
  • Discussion of using AI to transform aggressive user messages into friendlier alternatives, with consideration of the ethical and privacy implications.
  • Participants discussed prompt engineering, AI performance issues, and AI memory limitations, including ways to humanize prompts, text-block management, performance inconsistencies, AI thought patterns, and context-window limitations.

OpenAI Channel Summaries

OpenAI Channel: ai-discussions (6 messages🔥):

Highlights from AI Discussions:

  • Discussion on ChatGPT's Capability: Participants engaged in an extensive debate regarding AI's potential for consciousness and humanlike 'thinking'. @vantagesp opened the discussion by querying whether AI is truly intelligent. @ta_noshii and @lumirix argued that existing AI models like LLMs have an ability to "think", while @bedros_p was skeptical of achieving true consciousness via AI. The discussants also tackled the complexity and subjective nature of consciousness. @nauticalstache brought up the advancements in AI and their implications.
  • Actionable Feedback to Assistants: @wrexbe suggested a user-friendly approach to handling toxic online messages, where ChatGPT translates aggressive user messages into friendly ones before they are shown on the website. However, @eskcanta expressed reservations about all messages being filtered through ChatGPT.
  • A Visible Evolution of AI: @bedros_p discussed MakerSuite's use of older models and compared it to unlocking newer models in different contexts, such as optimization for performance or for business requirements. Other participants, including @lugui, chimed in with different perspectives on the advancements and limitations imposed on AI models.
  • Utility of Grok: @maira_14969 inquired about Grok's capabilities for data analysis. @lugui clarified that Grok is a language model, so image generation and data retrieval are things one can do with it rather than built-in features.
  • General Chatbot Performance and Feedback: @tilanthi expressed disappointment with the learning capacity of a ChatGPT she was personalizing, seeking clarity on whether the AI can continue learning past its April 2023 cutoff. @solbus and @robotnik0241 exchanged ideas on how a GPT model might learn from the training data provided.
OpenAI Channel: openai-chatter (6 messages🔥):

OpenAI Chatter Summary:

  • Performance of GPT-4 and GPTs: Multiple users (@micess, @becausereasons, @SA_FX, @psychonautic339) reported a significant drop in the performance and speed of GPT-4 and GPTs, indicating possible server load issues.
  • ChatGPT Plus subscriptions paused: The suspension of new ChatGPT Plus sign-ups was widely discussed (@aranaaaa., @cybector, @kevin_kush, @i4qax) as users expressed concerns and sought clarification. @rjkmelb and @elektronisade provided insights on the current situation, stating that the wait could be weeks.
  • Issues with model use: @lltd queried about instructions for custom GPTs, @becausereasons complained about difficulties with the analysis model, and @zahmb asked for confirmation about Threads and Messages in the platform UI, which seemed to have disappeared.
  • Conversation and memory limitations in GPT: @solbus discussed the context window limitations in Large Language Models (LLMs), explaining that there are no workarounds for extending the window size (see the sketch after this list).
  • API usage: Several users (@pajamasuit, @ciphercode, @dexter.js) asked about the API and its usage, such as retrieving certain data and exporting conversations from the playground.
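
On @solbus's context-window point: the window is a fixed property of the model, not a setting, so the only client-side lever is measuring and trimming what you send. A small sketch with tiktoken (model name illustrative):

```python
# Count the tokens a conversation history will consume; if it exceeds the
# model's context window, older turns must be dropped or summarized client-side.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
history = "user: hi\nassistant: hello!\nuser: now summarize everything so far"
n_tokens = len(enc.encode(history))
print(n_tokens)  # compare against the model's window (e.g. 8192 for base GPT-4)
```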

OpenAI Channel: openai-questions (6 messages🔥):

Issues with custom GPTs and performance:

  • @dave0x6d and @justzakary discussed errors when creating and saving drafts for custom actions, with @justzakary sharing a workaround: delete the erroring custom action and create a new one.
  • @thesocraticbeard and @justzakary discussed the current limitation of being able to create only one custom action per domain in OpenAI's interface. They speculated about possible future updates allowing multiple actions.
  • @amanshrestha reported summarization issues with long prompts of around 8,000 words, noting a difference in behaviour from previous versions of the system.

Access and subscription issues:

  • @obrzutw shared a payment issue with his Plus subscription and sought help resolving it; @satanhashtag and @elektronisade directed him toward official OpenAI support channels.
  • @dercrisb94111 encountered an unexpected downgrade of his Plus subscription, sparking a discussion with @elektronisade and @rjkmelb about related problems.
  • @mrbr2023 and @sumo_dump reported technical issues accessing the 'My Plan' page in multiple browsers, with no known solution yet.

Security and privacy concerns:

  • @_ardj_ raised urgent security concerns about another person being able to access his ChatGPT account through Google OAuth, engaging in a long discussion about potential solutions with @elektronisade, @mohkamfer, and @solbus.
  • @noctapus expressed privacy concerns about the display of his full name in the ChatGPT interface, particularly given his work as a livestream content creator.

Other discussions:

  • @juregg, @thesocraticbeard, and @liraelwiddershins explored the viability and potential utility of GPTs for medical and academic study purposes, with @juregg sharing their ongoing project of building a GPT for medical knowledge.
  • @grcarvalho.crx raised quality issues with DALL-E image generation, engaging in a detailed conversation that surfaced multiple potential solutions, mainly with @eskcanta and @foxabilo.

OpenAI Channel: gpt-4-discussions (6 messages🔥):

OpenAI Discord gpt-4-discussions Summary:

  • Concerns about GPT performance: Users including @juregg, @dave0x6d, and @_jimothyhalpert discussed issues with GPTs not performing as expected, particularly when using instructions that refer to uploaded documents. Suggestions from @chotes and @lodosd included making instructions more specific and ensuring instructions for document lookup are sufficiently clear.
  • Discussion on usage and feature restrictions: Users including @captainstarbuck, @mesteviet, and @lumirix discussed concerns about limitations on GPT usage, requests for higher token limits, and issues with saving GPTs. @rjkmelb clarified the restriction on accessing custom GPTs via the API.
  • External API calls and Actions: Users including @.poelie, @dem6130, @amarnro, and @.alexandergo discussed issues and shared experiences setting up API calls and Actions within custom GPTs. Problems included errors when passing parameters in headers and difficulties with specific API behaviors.
  • ChatGPT Plus usage cap and related concerns: Users including @darkest_of_knight, @stealth2077, and @more.life expressed concerns about the usage cap on GPT-4, noting that the change to 25 messages per 3 hours is impacting their ability to work adequately with the tool.
  • Sharing GPTs: @kxhan15 noted issues where shared GPTs were not accessible to others even when set to public. Users including @rjkmelb and @.pythagoras also discussed issues with publishing GPTs publicly and accessing them via external links.

OpenAI Channel: prompt-engineering (6 messages🔥):

Discussion on Prompt Engineering and AI Conversation Analysis:

  • Humanizing Prompts: @lodosd asked how to humanize a text. @lumirix cautioned against altering the text into something it's not and pointed to OpenAI's Terms of Use: https://openai.com/policies/terms-of-use
  • Formatting Text Blocks: @flaskie shared a challenge with breaking a block of text into smaller text blocks based on hierarchical content prompts and sought help; @eskcanta provided insightful feedback by identifying conflicts within the prompt request.
  • AI Performance Concerns: @taholmes160 expressed concerns over ChatGPT's inconsistency when handling repetitive tasks, with agreement from @.pythagoras regarding the recent update's shortcomings, particularly with custom GPTs.
  • AI Thought Patterns: @mnjiman offered an analysis of the patterns the machine exhibits, suggesting implications of the AI's 'intent' when acting as an assistant.
  • Requesting Specific Functions: @.pythagoras asked for a command to list specific functions in the ChatGPT command terminal.
  • Interface Changes: @no.iq inquired about changes to the UI and the location of the 'temperature' setting in the new OpenAI Playground.
  • Prompt Engineering Test: @ertagon tasked the model with making an argument for "tomatoes not existing" and drafting it as a poem.

OpenAI Channel: api-discussions (6 messages🔥):

Summary of OpenAI Discord Chatbot Messages in the api-discussions Channel:

  • Humanizing Text in AI Generation: @lodosd asked for prompt ideas to humanize text so it isn't detected as AI-generated. @lumirix advised using the AI to identify and modify the elements that make the text sound non-human.
  • Terms of Use: @lumirix reminded participants of OpenAI's Terms of Use, specifically the clause prohibiting representing AI-generated output as human-generated, with a link to the OpenAI Terms of Use.
  • Prompt Engineering Issue: @flaskie is struggling with a prompt-engineering task that requires the GPT model to format solid text into logical order, with specific heading levels and blocks of text of 90-150 words. @eskcanta offered a detailed analysis of potential issues in @flaskie's prompt and suggested revisions.
  • Requests for a Command Terminal in ChatGPT: @.pythagoras asked how to command ChatGPT to open a command terminal for listing specific functions.
  • Assisting with Software Authoring: @taholmes160 shared an issue with using ChatGPT for authoring software, where the output includes unnecessary comment blocks. The possibility of GPT 'moods' affecting output quality was raised.
  • Issues in Newest GPT Version: @.pythagoras and @taholmes160 discussed issues they've encountered with GPT-4 Turbo, including output length limits and trouble organizing information.
  • Considerations for Custom GPTs: @.pythagoras noted that while custom GPTs are interesting to work with, their functionality currently has significant flaws.
  • Options Table in Markdown: @mnjiman proposed bringing up an 'option menu' in the form of a markdown table during a conversation with the AI.
  • Tweaking AI Parameters: @no.iq raised a concern about being unable to find the tweaking parameters in the new OpenAI UI, specifically the 'temperature' parameter (see the sketch after this list).
  • Constructing an Argument and Poem: @ertagon posted a complex task for the AI: construct an argument denying the existence of tomatoes using biological reasoning and a little philosophy, then turn the argument into a poem.
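
Wherever the Playground UI has moved the knob, `temperature` remains an ordinary request parameter in the API. A minimal sketch with the OpenAI v1 Python client (model choice illustrative), borrowing @ertagon's tomato task:

```python
# Set the sampling temperature explicitly on an API call.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0.2,  # lower = more deterministic, higher = more varied
    messages=[
        {"role": "user",
         "content": "Argue that tomatoes do not exist, then restate the argument as a poem."}
    ],
)
print(resp.choices[0].message.content)
```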

Guild: LangChain AI

LangChain AI Guild Summary

  • Issues, inquiries, and discussions related to LangChain in the general channel:

    • @fishyccy reported that certain prompts were causing the LangChain system to time out and was directed to @s_papu_25 for assistance.
    • @vudumagic asked about OpenAI Vision API integration with LangChain.
    • @vj19 requested tool recommendations for scraping data from websites.
    • @coreyb42 advised that data analysts should only learn LangChain if absolutely necessary, suggesting that GPTs might be better suited for non-developer roles.
    • @itscabral asked about creating a RecordManager using local files instead of SQLite.
    • @.cannaboss warned GPT-4 users about failures of direct retrieval with the new OpenAI Assistant API and recommended using multiple learned "knowledge planes".
    • @jlcases reported changed import routes in a tutorial and requested updated ones.
    • @minecraftjuicer proposed a mechanism for automatically generating the perfect prompt for every question.
    • @iloveh8 asked about multimodal RAG.
    • @dennisyurk asked for advanced examples of agents/chains that interact with a data warehouse.
    • @.trouble_ offered to pair up on working with LangChain over Zoom calls.
    • @ethereon_ queried the possibility of chaining chains together in LCEL, providing a code example.
    • @jinwolf2 asked for assistance with an error converting an API response.
    • @alimal asked about the best ways to use LangChain and/or APIs for tabular data QA.
  • Discussion about the limitations of the semi-structured RAG template integration in the langchain-templates channel:

    • @alex_35579 highlighted issues with integrating chain.py's docstore and vectorstore with a LangChain retriever for a production app.
    • @alex_35579 also pointed out that the RAG template's current system does not support persistent data storage or search functionality.
  • Projects shared in the share-your-work channel:

    • @arcypojeb shared a project on a hierarchical cooperative multi-agent framework that uses websockets for LLM communication.
      • The integration involves multiple AI platforms/frameworks/interfaces, including Gradio, Chainlit, Tkinter, and PySimpleGUI.
      • The project's purpose is to create a multi-purpose AI assistance platform that joins existing AI-driven applications/tools.
      • Shared GitHub, HuggingFace, and LangChain multi-agent cookbook links related to the project.

LangChain AI Channel Summaries

LangChain AI Channel: general (4 messages):

Summary of LangChain AI Discord Chatbot Messages:

  • Issue with Certain Prompts Timing Out: @fishyccy reported that certain prompts were causing the system to time out and was directed to DM @s_papu_25 for assistance.
  • Inquiry about OpenAI Vision API Integration: @vudumagic asked if the new OpenAI Vision API is part of LangChain yet.
  • Request for Tools or Integrations to Scrape Data from Websites: @vj19 asked for recommendations on tools or integrations for scraping data from websites where he has to enter a date range and keywords in the search bar and scrape the generated articles.
  • Discussion about the Use of LangChain for Data Analysts, not Developers: @coreyb42 advised that data analysts should only learn a heavy application framework like LangChain if absolutely necessary, suggesting that GPTs might be a better fit for non-developer roles.
  • Query about `RecordManager` Using Local Files: @itscabral asked whether it is possible to create a `RecordManager` using local files instead of SQLite.
  • Issue with Direct Retrieval Failures in the GPT-4 OpenAI Assistant API: @.cannaboss warned about failures of direct retrieval with GPT-4 using the new OpenAI Assistant API, recommending multiple learned "knowledge planes" with trained weight tensors for specific tasks as the solution.
  • Request for Updated Import Routes: @jlcases reported that the import routes in a notebook he was following from some videos have changed, and asked for the updated ones.
  • Discussion on Generating Perfect Prompts: @minecraftjuicer proposed a mechanism for automatically generating the perfect prompt for every question, providing a templated senior-data-scientist example. He expressed concern about feasibility and response time.
  • Question about Multimodal RAG: @iloveh8 asked if anyone has ever done multimodal RAG.
  • Inquiry about Advanced Examples of Interacting with Data Warehouses: @dennisyurk inquired about more advanced examples of agents/chains that interact with data warehouses.
  • Offer to Pair Up on LangChain: @.trouble_ offered to pair up on working with LangChain over Zoom calls.
  • Question about Chaining Chains in LCEL: @ethereon_ asked for confirmation that chains can be chained together in LCEL, providing a code example (see the sketch after this list).
  • Issue with Converting API Response: @jinwolf2 reported an error converting an API response and asked for assistance.
  • Best Way to Use LangChain and/or APIs for Tabular Data QA: @alimal asked about the best way to use LangChain and/or APIs for tabular data QA.
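
On @ethereon_'s question: LCEL composes runnables with `|`, and a whole chain can feed another chain by being placed in the input map of the next one. A minimal sketch (import paths vary across LangChain versions; the prompts are illustrative):

```python
# Chain two LCEL chains: the first drafts an outline, the second expands it.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-3.5-turbo")

outline_chain = (
    ChatPromptTemplate.from_template("Write a one-line outline about {topic}.")
    | model
    | StrOutputParser()
)

essay_chain = (
    {"outline": outline_chain}  # the first chain's output becomes the `outline` input
    | ChatPromptTemplate.from_template("Expand this outline into a short paragraph:\n{outline}")
    | model
    | StrOutputParser()
)

print(essay_chain.invoke({"topic": "tabular data QA"}))
```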

LangChain AI Channel: langchain-templates (4 messages):

Semi-Structured RAG Template Integrations:

  • @alex_35579 highlighted issues with the semi-structured RAG template, specifically around integrating chain.py's docstore and vectorstore with a LangChain retriever for a production app. The user noted that the current setup loads the data on the fly and uses similarity search; however, integrating real RAG requires persistent data storage, which the current system doesn't support.
  • Furthermore, @alex_35579 noted that the RAG template loads the entire file content into its context without supporting search functionality. The user asserts that this isn't feasible for real-world applications, and the issue also hinders any search that isn't dependent on vector/semantic search.

LangChain AI Channel: share-your-work (4 messages):

Hierarchical Cooperative Multi-agent Framework Project by arcypojeb:

  • @arcypojeb is working on a hierarchical cooperative multi-agent framework that uses websockets for LLM (large language model) communication. The aim of the project is to create a multi-purpose AI assistance platform that integrates existing AI-powered applications/tools (a minimal sketch of the websocket pattern follows the links below).
  • The user has established websocket connectivity between several AI platforms/frameworks/interfaces, including Gradio, Chainlit, Tkinter, and PySimpleGUI, has created multiple interfaces for both clients and servers, and has established a basic mechanism of question-answering AI-driven logic.
  • @arcypojeb shared links to the project on GitHub and HuggingFace for others to try out:
  • Links:
    • https://github.com/CognitiveCodes/NeuralGPT/blob/main/Chat-center/ServerMain.py
    • https://github.com/CognitiveCodes/NeuralGPT/blob/main/Chat-center/ChainlitCli.py
    • https://huggingface.co/spaces/Arcypojeb/ServerNeural
    • https://huggingface.co/spaces/Arcypojeb/QA-Docs-Chainlit-Langchain
    • https://github.com/langchain-ai/langchain/blob/master/cookbook/multiagent_authoritarian.ipynb
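
A minimal sketch of the websocket message-passing idea at the heart of such a framework, using the `websockets` library; the port, message format, and echo "agent" are assumptions for illustration, not the project's actual protocol:

```python
# One "agent" server that answers questions pushed over a websocket.
import asyncio
import websockets

async def handle(ws):  # websockets >= 13; older versions also pass a `path` argument
    async for question in ws:
        answer = f"echo: {question}"  # stand-in for a real LLM call
        await ws.send(answer)

async def main():
    async with websockets.serve(handle, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())
```

A client (another agent, or a GUI built in Gradio/Chainlit/Tkinter) then connects with `websockets.connect("ws://localhost:8765")`, sends a question, and awaits the reply; this is how heterogeneous interfaces can be joined into one hierarchy.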

LangChain AI Channel: tutorials (4 messages):

As this message does not provide any information related to specific topics or discussions, no summary can be derived from it.


Guild: Nous Research AI

Nous Research AI Guild Summary

  • Discussions on handling and processing a parquet dataset: retaining the dataset in parquet and storing images as blobs, plus improvements to the handling process through batch processing and script development.
  • Debate on how the dataset was originally constructed, with speculation about unofficial workarounds or a possible data leak from Midjourney.
  • Prospective discussion of using an LLM to rephrase the dataset's prompts into more instructional language.
  • Acknowledgement of collaborative effort in handling and processing the dataset, with workload division and resource sharing between @tsunemoto and @yorth_night.
  • Technical issues with RAM crashes during resource-intensive processing, and proposed solutions such as writing processed dataframes out incrementally instead of concatenating at the end.
    • GitHub Midjourney Messages Dataset
    • Shared Solution (OpenAI chat link)
  • Discussions on AI performance on specific tasks, use cases for AI especially in audio classification, and queries on model licensing, closures, and research-only restrictions.
    • Twitter link on AI performance
    • Twitter link on AI Audio Classification
  • Communications on model merging strategies, technical updates to guidance-ai/guidance, price drops for the Nous Hermes and OpenHermes APIs via OpenRouter, performance reports for OpenHermes 2.5 Beta, and troubleshooting of issues when running inference on the Yi Capybara model.
  • Basic interactions with new channel members.
  • Information and solutions on multiple issues: merging QLoRA with PEFT, fast inference with OpenHermes, performance speed of the AWQ model, sequence lengths for chatbots, structured output from models through fine-tuning, the feasibility of function calling for output structuring, and the possibility of fine-tuning a model for low-resource languages.
    • GitHub link for merging QLoRA
    • JSON grammar builder
    • Reddit link on fine-tuning for low-resource languages
  • Recommendation of Paul Graham's blog post "Do Things that Don't Scale", and a related Hacker News discussion shared by @_automagic.
    • Do Things that Don't Scale blog post
    • Hacker News post

Nous Research AI Channel Summaries

Nous Research AI Channel: off-topic (6 messages🔥):

Discussion & Progress on Parquet Dataset Handling:

  • Improvements on Dataset Handling Process: @tsunemoto revealed an improved method for handling the dataset. He mentioned that retaining the dataset in parquet and storing all images as blobs, along with batch processing, significantly increased processing speed. He also developed a new script for exploring the blob'd parquet, including searching through it with regex and automatically converting the blobs back to images on query (a sketch of this pattern follows the links below).
  • Discussion Around Dataset Origin: @tsunemoto and @yorth_night mused about how the dataset was initially constructed, as Midjourney doesn't provide an API. They speculated it could have been created through an unofficial workaround or possibly a data leak.
  • Exploration of Future Possibilities: @yorth_night suggested that an LLM could potentially rephrase the prompts in the dataset to become more instructional, allowing for generation with natural language instead of mere tags. This idea was well received by @tsunemoto, who noted it might aid multimodal benchmarking.
  • Collaborative Work and Shared Resources: @tsunemoto and @yorth_night decided to share the workload, each taking a different section of the dataset to process, and agreed to share scripts and resources to accomplish the task more efficiently. @yorth_night selected parquet file `00001` and @tsunemoto worked on `00002`.
  • Challenges and Proposed Solutions: @yorth_night reported that the resource-intensive processing resulted in RAM crashes. @tsunemoto suggested a possible solution: writing the processed dataframes out as they are produced rather than concatenating them at the end. However, concerns were raised about potential data corruption if the process is interrupted midway.
  • Links:
    • Original Discord Message
    • Midjourney Messages Dataset on Hugging Face
    • OpenAI Chat with a shared solution
    • Archive.org
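
A minimal sketch of the blob'd-parquet pattern described above, assuming columns named `prompt` (text) and `image` (raw bytes); the column names, file names, and processing step are hypothetical:

```python
import io
import re
import pandas as pd
import pyarrow.parquet as pq
from PIL import Image

# Regex search over prompts, then decode a matching image blob back to an image.
df = pd.read_parquet("00001.parquet")
hits = df[df["prompt"].str.contains(r"isometric\s+city", flags=re.IGNORECASE, regex=True)]
img = Image.open(io.BytesIO(hits.iloc[0]["image"]))
img.save("match.png")

# Stream-process a large file in batches and write each processed chunk out
# immediately instead of concatenating everything at the end; peak RAM then
# scales with one batch rather than the whole dataset.
pf = pq.ParquetFile("00002.parquet")
for i, batch in enumerate(pf.iter_batches(batch_size=1024)):
    chunk = batch.to_pandas()
    # ... transform `chunk` here (hypothetical processing step) ...
    chunk.to_parquet(f"processed/part-{i:05d}.parquet")
```

Writing shards as you go does leave partially written output if the job dies midway, which matches the corruption concern raised above; writing each shard to a temporary name and renaming on completion is the usual mitigation.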

Nous Research AI Channel: interesting-links (6 messages🔥):

Discussion on AI Research and Benchmark Performance:

  • AI Performance on Specific Tasks: @max_paperclips shared a tweet about AI performance on a specific task, saying he suspects Transformers would be good at it but positing that its performance can likely be beaten with something smaller. @ldj expressed skepticism about the benchmark being a test of general logical reasoning, suggesting it might be specifically tailored toward proof-writing.
  • Use Cases for AI: @teknium challenged proposed use cases for AI in speech editing, stating that he found it to have even fewer use cases than vision. @tsunemoto pointed out potential in audio classification, e.g. using it to sort a large list of audio tracks into predefined categories.
  • Models and Licensing: @_automagic and @teknium discussed the possible release of a model on HuggingFace, with @teknium noting that it is research-only. @_automagic expressed disappointment at the prospect of the model retaining its closed license. @tokenbender joined the discussion, suggesting that fine-tuned versions of the phi-1.5 model reason well, but expressed concern for those wanting to use it commercially.
  • Links:
    • AI Performance on Task - Twitter by @max_paperclips
    • AI Audio Classification - Twitter by @tsunemoto
    • Research Video - YouTube by @joey00072
    • Big Brain AI - Twitter by @yorth_night
    • Skeleton of Thought Paper - arXiv by @georgejrjrjr

Nous Research AI Channel: general (6 messages🔥):

Summary of Hermes 2.5 vs Hermes 2 Performance Discussions:

  • Model Merging Strategies: @alpindale explained the process behind merging models like Goliath-120B: taking slices of different layer ranges from each model and stacking them on top of each other (a toy sketch follows this list). @alpindale also mentioned a plan to finetune the 120B so the layer stitches "heal", so to speak. More details here.
  • guidance-ai/guidance Updates: @mihai4256 reported a major redesign of guidance-ai/guidance, noting it now accepts GGUFs (llama.cpp). Visit the GitHub page.
  • OpenRouter Price Drops: @alexatallah announced price drops for the Nous Hermes and OpenHermes APIs over at OpenRouter. See the Discord invitation.
  • OpenHermes 2.5 Beta Performance: @fullstack6209 reported 1.5-2.5 tokens per second on a $60 Chromebook with OpenHermes 2.5 Beta using TheBloke's GGUF.
  • Issues with Inferring Capybara 34B: @yorth_night led a discussion about difficulties running inference on the Yi Capybara model. @teknium advised loading the model with LlamaForCausalLM and provided sample code for inference with HF Transformers.
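
A toy sketch of the layer-stacking merge @alpindale described, for two llama-architecture checkpoints; the layer ranges and model names are illustrative, and real merges are better done with purpose-built tooling:

```python
# Stack layer slices from two same-architecture models into one deeper model.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("model-a", torch_dtype=torch.float16)
b = AutoModelForCausalLM.from_pretrained("model-b", torch_dtype=torch.float16)

# e.g. layers 0-15 from A followed by layers 8-23 from B (overlapping ranges
# are common in these "frankenmerges"); assumes llama-style `.model.layers`.
a.model.layers = torch.nn.ModuleList(
    list(a.model.layers[:16]) + list(b.model.layers[8:24])
)
a.config.num_hidden_layers = len(a.model.layers)
a.save_pretrained("stacked-merge")
```

The seams between slices are exactly the "layer stitches" that the planned 120B finetune is meant to heal.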

Nous Research AI Channel: welcomes (6 messages🔥):

Welcome Messages Summary:

  • SpaceCraft (DJ/VJ) greeted the channel with "Blessings".
  • teknium responded with a greeting of his own and a wave emoji: "Hi".

Nous Research AI Channel: ask-about-llms (6 messages🔥):

Discussion Summary of AI Models and Fine-tuning:

  • Merging QLoRA with PEFT: @yorth_night suggested a method for merging QLoRA using PEFT and shared a GitHub link here (a merge sketch follows this list). @lightninglemons expressed interest in trying out this method.
  • OpenHermes and Fast Inference: @ac1dbyte asked whether OpenHermes allows fast inference with vllm, which @teknium confirmed it should.
  • Speed Issue with AWQ Model: @gabriel_syme reported that the AWQ model in vllm became twice as slow compared to the original model, speculating it might be related to max_seq_len.
  • Sequence Lengths for Chatbots: Continuing the AWQ performance discussion, @teknium remarked that chatbots usually would not be intelligible past a 4-8k context, or around a max of 32k if Mistral has been trained for it.
  • Fine-tuning for Structured Output: @ac1dbyte needs his fine-tuned OpenHermes to produce outputs as (semi-)structured JSON objects for post-processing. @teknium suggested starting with prompting before venturing into fine-tuning, and asked @ac1dbyte for specific input-output examples.
  • Function Calling for Structured Output: @tsunemoto offered a different perspective, suggesting that function calling might also meet @ac1dbyte's requirements if the model supports it. A llama.cpp-compatible JSON grammar builder was shared here.
  • Fine-tuning for Low-Resource Languages: @4biddden asked whether fine-tuning a model on a low-resource-language dataset would enable the model to speak that language. @crainmaker and @giftedgummybee noted that continued pretraining might be needed, and that it would also depend on the complexity of the language and the information in the dataset. @4biddden later shared a Reddit link here to support the discussion.
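
For the QLoRA-merging thread above, the common PEFT route is to reload the base model in half precision (not 4-bit) and fold the adapter in; the paths are hypothetical:

```python
# Merge a (Q)LoRA adapter into its base model with PEFT.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model",                  # the model the adapter was trained on
    torch_dtype=torch.float16,     # reload in fp16 for merging, not 4-bit
)
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")
```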

Nous Research AI Channel: memes (6 messages🔥):

Discussion on Scaling:

  • Doing Things that Don't Scale: @teknium shared a link to Paul Graham's blog post "Do Things that Don't Scale". You can read it here.
  • Related Post on YCombinator: @_automagic shared a related discussion on Hacker News, specifically the post with id 36225723. The link to that discussion is here.

Guild: Alignment Lab AI

Alignment Lab AI Guild Summary

  • Creating Budget Datasets for ML Models: A discussion initiated by @igoforth on the cost-effectiveness of producing datasets for finetuned machine learning models, comparing the cost of GPU time against using a GPT-4-generated dataset.
  • Dialogue on Federated Learning with Adapter Methods: @erogol shared insights and resources on the benefits and capabilities of federated learning with adapter methods, especially for scaling model training and for parties who can update models but cannot share data.
    • NVIDIA Developer Blog on Federated Learning with Adapters
    • Big Science Workshop Petals GitHub Repo
  • A request from @5811g, an MLE, seeking collaborators for a Techstars Startup Weekend event in San Francisco. The project idea focuses on using Large Language Models (LLMs) to automate operations for e-commerce businesses.
    • Techstars Startup Weekend San Francisco AI Tickets
  • Discussion on Model Merging and Fine-tuning: @tural.sadik highlighted the variability in responses from separately fine-tuned models; @teknium shared insights on model merging, suggesting the 'original model' stays frozen until merged; @far_el emphasized that the base model remains unchanged.
  • Conversation on CRLFT and Weighted Cross-Entropy in OpenChat: @turgutluk sought clarification on CRLFT's implementation in OpenChat, confirming that it entails prompt conditioning and weighted CE, which @imonenext affirmed.
  • Experiments with loss weights for GPT-3.5 and GPT-4 data were proposed by @turgutluk and @imonenext. The recurring need to remind the model of its condition in multi-turn tasks was also pointed out.
    • MergeLM GitHub Repo

Alignment Lab AI Channel Summaries

Alignment Lab AI Channel: ai-and-ml-discussion (4 messages):

AI and ML Discussion:

  • Creating Budget Datasets for ML Models: @igoforth initiated a discussion about the cost-effectiveness of producing datasets for building finetuned machine learning models, comparing the cost of GPU time against using a GPT-4-generated dataset.
  • Federated Learning with Adapter Methods: @erogol shared two links about federated learning with adapter methods and its potential benefits, mentioning its capability for scaling model training and its convenience for parties that cannot share data but can update models (a toy sketch follows the links below).
  • Links:
    • https://developer.nvidia.com/blog/adapting-llms-to-downstream-tasks-using-federated-learning-on-distributed-datasets/
    • https://github.com/bigscience-workshop/petals
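
A toy sketch of why adapters pair well with federation: each party trains a small LoRA adapter on private data and only the adapter weights travel, which a server can average FedAvg-style. The function and variable names are illustrative, not the API of NVIDIA's framework:

```python
# Average adapter state dicts from several parties (FedAvg on adapters only).
import torch

def average_adapters(adapter_state_dicts):
    """Element-wise mean of matching tensors across parties' LoRA adapters."""
    averaged = {}
    for key in adapter_state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in adapter_state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged

# Each round: parties train locally, send adapter weights (a few MB, not the
# base model or any raw data), and receive the averaged adapter back.
```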

Alignment Lab AI Channel: looking-for-collabs (4 messages):

Request for Team Members for a Techstars Startup Event:

  • @5811g is an MLE seeking individuals to join their team for the Techstars Startup Weekend event in San Francisco. The team's project idea involves using Large Language Models (LLMs) to automate operations for e-commerce businesses.
  • Links:
    • Techstars Startup Weekend San Francisco AI Tickets

Alignment Lab AI Channel: general-chat (4 messages):

Discussion on Model Merging and Fine-tuning:

  • Fine-tuned models giving different responses: @tural.sadik pointed out that separately fine-tuned models (A and B) are likely to give different responses to the same input, illustrating with: `model.generate(evalPrompt1)`, `peftA.generate(evalPrompt1)`, `peftB.generate(evalPrompt1)` (see the sketch after this list).
  • On merging models: @teknium clarified that they always merge models, indicating the 'original model' stays frozen until merged or attached. They also noted they have never run inference on a LoRA that they attach, and suggested other users, including @563068096747798529, @1117586410774470818, and @317006433797537792, for better advice.
  • Unchanging base model: @far_el emphasized that the base model remains intact, echoing @teknium's point about the 'original model' staying frozen.
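
A short sketch of the point being made: the base stays frozen while adapters are attached and switched, so `peftA` and `peftB` can diverge on the same prompt. The adapter paths are hypothetical:

```python
# Attach two LoRA adapters to one frozen base and compare their outputs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")
tok = AutoTokenizer.from_pretrained("base-model")
inputs = tok("evalPrompt1", return_tensors="pt")

model = PeftModel.from_pretrained(base, "adapter-A", adapter_name="A")
model.load_adapter("adapter-B", adapter_name="B")

model.set_adapter("A")
print("A:", tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
model.set_adapter("B")
print("B:", tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```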

Alignment Lab AI Channel: oo (4 messages):

Discussion on CRLFT and Weighted Cross-Entropy in OpenChat:

  • Clarification on CRLFT in OpenChat: @turgutluk asked whether CRLFT's implementation in the OpenChat codebase and paper entails prompt conditioning and weighted CE. @imonenext affirmed this understanding.
  • Experimenting with Weights: @imonenext suggested beginning with weights of 0.1 for GPT-3.5 data and 1.0 for GPT-4 data, while @turgutluk plans to test the approach on a single-task data-to-text-generation SFT dataset using GPT-3.5 (85%) and GPT-4 (15%) (a weighted-CE sketch follows the links below).
  • Conditioning and Templates: @imonenext advised that in multi-turn tasks the model tends to forget the condition unless reminded each turn, stating that their template establishes this recurring reminder.
  • Links:
    • https://github.com/yule-BUAA/MergeLM
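
A minimal sketch of the weighted cross-entropy being discussed: each training example carries a scalar weight (e.g. 0.1 for GPT-3.5-sourced data, 1.0 for GPT-4-sourced data) that scales its token loss. The function below is an illustration, not OpenChat's actual code:

```python
# Per-example weighted cross-entropy for causal LM training.
import torch
import torch.nn.functional as F

def weighted_ce(logits, targets, example_weights):
    # logits: (batch, seq, vocab); targets: (batch, seq); example_weights: (batch,)
    per_token = F.cross_entropy(
        logits.transpose(1, 2),  # cross_entropy expects (batch, vocab, seq)
        targets,
        reduction="none",
    )                            # -> (batch, seq)
    per_example = per_token.mean(dim=1)
    return (example_weights * per_example).mean()

# e.g. example_weights = torch.where(is_gpt4_sourced, 1.0, 0.1)
```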

Only 1 channel had activity, so no need to summarize...


Only 1 channel had activity, so no need to summarize...


The Ontocord (MDEL discord) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI Engineer Foundation Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Perplexity AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


Guild: YAIG (a16z Infra)

YAIG (a16z Infra) Guild Summary

  • Discussion by @stevekamman of the upcoming Cloudflare Workers AI integration with Mistral and Stable Diffusion.
    • Discord link to the announcement
  • Dialogue by @danme about building open source infrastructure for large-scale data ingestion for semantic search. The team wants to connect with people dealing with knowledge graphs at scale, and is interested in the issues faced when frequently embedding or dealing with massive quantities of data.
    • GitHub link to the project
  • Analysis of a recent Discord outage by @zorkian, who shared a blog post discussing an outage that lasted approximately one hour.
    • Link to the blog post

YAIG (a16z Infra) Channel Summaries

YAIG (a16z Infra) Channel: ai-ml (2 messages):

Discussions and Announcements:

  • Cloudflare Workers AI Integration with Mistral and Stable Diffusion: @stevekamman mentioned that Mistral and Stable Diffusion will be coming to Cloudflare Workers AI.
    • Links:
      • https://discord.com/channels/595317990191398933/1154819662161395742/1173496868811059200
  • Open Source Infrastructure for Large Data Ingestion for Semantic Search: @danme conveyed that they are building open source infrastructure to ingest large volumes of data for semantic search and are looking to talk to individuals working with knowledge graphs at scale. They also expressed interest in understanding the challenges faced by those who embed data frequently or in large amounts.
    • Links:
      • https://github.com/dgarnitz/vectorflow

YAIG (a16z Infra) Channel: tech-discussion (2 messages):

Discord Outage Analysis:

  • @zorkian discussed and shared a link to a blog post analyzing a nearly one-hour Discord outage that occurred recently.
  • Links:
    • https://discord.com/blog/authentication-outage