1699062952925 Discord Newsletter Test

                November 4, 2023

This is AI News, an MVP of a service that goes through AI Discords, Twitter accounts, and subreddits and summarizes what people are talking about, so that you can keep up without the fatigue. Signing up here opts you in to the real thing when we launch it 🔜

Guild: Skunkworks AI

Inquiry into the usage of torch.nn.MultiheadAttention in different libraries:
    - User aniketmaurya questioned why many libraries build their own MultiheadAttention implementation instead of using torch.nn.MultiheadAttention.
    - User benjamin_w replied that this might be because torch.nn.MultiheadAttention only supports flash attention during inference and not during training.
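
As a rough illustration of the kind of in-house implementation the thread refers to, the sketch below builds a small self-attention layer on top of torch.nn.functional.scaled_dot_product_attention (PyTorch 2.0 or later), which lets PyTorch choose the attention kernel during both training and inference. The class name SimpleSelfAttention and its layout are hypothetical and not taken from any library in the discussion. Whether a fused or flash kernel is actually selected depends on device, dtype, and PyTorch version, and benjamin_w's point about torch.nn.MultiheadAttention is reported as stated, not verified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSelfAttention(nn.Module):
    """Hypothetical minimal multi-head self-attention, for illustration only."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x, is_causal=False):
        batch, seq, dim = x.shape
        head_dim = dim // self.num_heads
        # Project once, then split into q, k, v shaped (batch, heads, seq, head_dim).
        qkv = self.qkv(x).view(batch, seq, 3, self.num_heads, head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        # PyTorch selects the attention kernel here; the choice depends on
        # device, dtype, and PyTorch version.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
        # Merge the heads back into (batch, seq, embed_dim).
        y = y.transpose(1, 2).reshape(batch, seq, dim)
        return self.out(y)


x = torch.randn(2, 16, 64)
attn = SimpleSelfAttention(embed_dim=64, num_heads=4)
print(attn(x).shape)
```

The same module is used unchanged in a training loop, since the functional call is differentiable.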

Discussion on SVD implementation and model quantization:
    - Members discussed a problem in which SVD implementations reshape layer weights when a quantized model is loaded.
    - One suggestion was to pass "full_matrices=False" to the SVD call, which may then require reshaping the result so that the "lora_a" and "lora_b" weights are set with the proper dimensions.
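
The thread does not say which SVD routine was used, so the following is only a sketch that assumes numpy.linalg.svd (torch.linalg.svd accepts the same full_matrices flag). It shows how the reduced factors can be truncated and shaped into "lora_a" and "lora_b". The function name svd_lora_init, the rank argument, and the (rank, in_features) / (out_features, rank) layout are assumptions for illustration, and the weight is assumed to have already been dequantized back to its 2-D shape.

```python
import numpy as np


def svd_lora_init(weight, rank):
    """Split a 2-D weight into rank-`rank` LoRA-style factors via a reduced SVD."""
    # Reduced SVD: u is (out, k), s is (k,), vh is (k, in), with k = min(out, in).
    u, s, vh = np.linalg.svd(weight, full_matrices=False)
    # Keep the top `rank` singular directions and share sqrt(s) between both factors.
    sqrt_s = np.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s            # (out_features, rank)
    lora_a = sqrt_s[:, None] * vh[:rank, :]  # (rank, in_features)
    return lora_a, lora_b


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))  # hypothetical (out_features, in_features) weight
lora_a, lora_b = svd_lora_init(w, rank=8)
print(lora_a.shape, lora_b.shape)
```

With full_matrices=False the factors come back already trimmed to min(out_features, in_features) columns, so the only remaining shape work is the slice down to the chosen rank.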

Proposed workaround using regular LoRA:
    - It was proposed that regular LoRA with a few alterations might sidestep the above problem, depending on the VRAM budget.

General introductions and greetings among users in the welcome channel with fluctuating levels of excitement. No significant discussion topics or links to note. Involvement varied from users introducing themselves to isolated messages lacking contextual information.

--- Channel by Channel Summary --- 
Channel: general
Summary: 
1. Usage of torch.nn.MultiheadAttention (Excitement: N/A)
    - Most libraries having their own MultiHeadAttention implementation
        - The user aniketmaurya expressed curiosity about why most libraries have their own MultiHeadAttention implementation instead of using torch.nn.MultiheadAttention.
    - Limitation of torch.nn.MultiheadAttention
        - The user benjamin_w mentioned that torch.nn.MultiheadAttention currently only supports flash attention during inference, not training.
    - Links: N/A
Channel: moe-main
Summary: 
1. SVD implementation issue with model quantization (Excitement: 4/10)
    - There was a discussion about the problem of SVD implementations reshaping layer weights when loading a quantized model.
    - It was mentioned that passing the argument "full_matrices=False" to SVD would require reshaping the result to subsequently set "lora_a" and "lora_b" weights with the proper dimensions.
2. Possible workaround for regular LoRA (Excitement: 5/10)
    - It was mentioned that using regular LoRA with a few alterations could potentially address the issue, depending on the VRAM budget.
Channel: welcome
Summary: 
1. General Introduction and Greetings (Excitement: 3/10)
    - Users introduced themselves in the channel.
    - Some users exchanged greetings.
    - No specific discussion points or links were mentioned.
2. General Discussion (Excitement: 4/10)
    - Users engaged in a general conversation.
    - No specific discussion points or links were mentioned.
3. Unspecified (Excitement: 2/10)
    - A user named "paradisen" posted a single message with no context provided.
    - No further discussion or links were mentioned.
4. Unspecified (Excitement: 2/10)
    - A user named "roberto_there" posted a single message with no context provided.
    - No further discussion or links were mentioned.
5. Unspecified (Excitement: 2/10)
    - A user named "mike.bird" posted a single message with no context provided.
    - No further discussion or links were mentioned.
6. Unspecified (Excitement: 3/10)
    - A user named "whimsicalism" posted a single message with no context provided.
    - No further discussion or links were mentioned.

Guild: Nous Research AI

Members debated the legitimacy of 128k context lengths for AI models and asked how to use gguf efficiently on a 4090 GPU.
Technical discussions revolved around emulating ZRAM on macOS and improving flash decoding methods, with a research paper on flash decoding shared.
Members expressed admiration for Tsinghua's researchers, and the application and performance of various AI models were actively discussed through shared benchmarks.
A code snippet illustrated using 8-bit rather than 4-bit for better efficiency, alongside an exchange about the slow inference of 4-bit and possible ways to speed it up.
Significant interest was displayed in trying new models, investigating their benchmarks, and understanding setbacks, with a notable focus on GPT-4's performance.
A Hugging Face link was shared with example inference code using transformers for the new OpenHermes 2.5 model.
Further inquiries were made about models fine-tuned for function calling, recommendations for quant or norm for data generation, and various user links shared for experimenting with different models.
One guild member expressed interest in incorporating AI into art.
Discussions extended to the limitations, performance trade-offs, and model-size concerns of quantized models.
Community members also raised safety considerations in fine-tuning models and discussed continuous learning, model divergence, and the efficacy of RLHF methods.
Members also shared programming-language memes within the Discord community.
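
To put the model-size side of the quantization discussion in rough numbers, the sketch below estimates the memory taken by the weights alone at different bit widths. The 7B parameter count is an assumption chosen for illustration, not a figure from the discussion, and the estimate ignores activations, the KV cache, and any per-block quantization metadata.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 7_000_000_000  # assumed 7B-parameter model, for illustration only

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(N_PARAMS, bits):.2f} GiB")
```

Halving the bit width halves the weight footprint, which is the size half of the trade-off; the speed half depends on the kernels used and is not captured here.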

Relevant links:
- https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/blob/main/transformers_inference.py
- https://colab.research.google.com/drive/1x8OsqBdHMUsQ5jlu_NwIPpOl1lPUWm-C?usp=sharing
- https://huggingface.co/datasets/jondurbin/airoboros-gpt4-2.0
Note: While summarizing, there were instances where the conversation context was not clear. In those cases, an assumption has been made based on guild norms and user post history. If something is unclear, please refer back to the original messages.
--- Channel by Channel Summary --- 
Channel: ctx-length-research
Summary: 
1. 128k Context Length Usage (Excitement: 6/10)
    - Is it legit for the 128k context lengths?
        - There is a discussion about the legitimacy of using 128k context lengths.
        - A link is shared regarding the utilization of 128k context lengths.
2. Using gguf on a 4090 GPU (Excitement: 4/10)
    - Emozilla's experience with 4bit llamacpp gguf on a 4090 GPU
        - Emozilla mentions being able to achieve up to ~24k on a 4090 GPU with 4bit llamacpp gguf.
        - ogmilady shares their experience with using gguf and looking for leads on possible issues.
    - Running the llama_cpp.server script locally
        - ogmilady shares the command they used to run llama_cpp.server locally and asks for help.
3. Improvements to Flash Decoding (Excitement: 5/10)
    - Apparent improvements to flash decoding
        - ldj mentions that apparent improvements to flash decoding are already being made.
        - A link to a research paper on flash decoding is shared.
4. Tsinghua Researchers (Excitement: 6/10)
    - Impressive researchers at Tsinghua
        - conceptofmind expresses admiration for Tsinghua researchers.
    - Including our friend One
        - ldj mentions that their friend "One" is among the impressive researchers.
5. Emulating ZRAM on macOS (Excitement: 3/10)
    - Code for emulating ZRAM on macOS
        - chadbrewbaker asks if there is code to emulate ZRAM on macOS.
        - A link to code for creating a RAM disk on macOS is shared.
Links:
- https://x.com/chillgates_/status/1720303678526267441?s=20
- https://arxiv.org/pdf/2311.01282.pdf
- https://apple.stackexchange.com/questions/461889/ram-disk-in-macos-ventura
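
One way to reason about why context length is bounded by GPU memory, as in the 4090 thread above, is to estimate the key/value cache size. The sketch below uses the standard count of two tensors (keys and values) per layer. The summary does not say which model was run, so the layer count, KV head count, head dimension, and 16-bit cache precision are all assumptions chosen for illustration.

```python
def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in GiB for a single sequence."""
    # The factor of 2 covers keys and values.
    n_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return n_bytes / 2**30


# Assumed model shape (not taken from the discussion): 32 layers,
# 32 KV heads of dimension 128, 16-bit cache entries.
SHAPE = dict(n_layers=32, n_kv_heads=32, head_dim=128)

for tokens in (24 * 1024, 128 * 1024):
    print(f"{tokens:>7} tokens: {kv_cache_gib(tokens, **SHAPE):.1f} GiB")
```

The cache grows linearly with sequence length, so under these assumptions a 128k context needs a bit over five times the cache of a 24k one, before counting the weights themselves. Models with fewer KV heads shrink the figure proportionally.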
Channel: off-topic
Summary: 
There were only a few messages in the off-topic channel:

(No specific topic mentioned) (Excitement: N/A)