Week 8 of 2023
5 Minutes of Data Science - week 8
Highlights from February 20 to February 26
Foreword
The newsletter's feeds have been growing.
Come say hi on Mastodon. See you next week!
Newsletters
- Last Week in AI #207: Bing Chat’s troubled beta tests, generative AI for Roblox, backlash against Apple using voice actor data, and more!, by Last Week in AI
- Import AI 318: RL and addiction; Toolformer; and theology and AI., by Import AI
- Deep Neural Networks: All the Building Blocks, by The AI Edge
- 🥇 Top ML Papers of the Week, by NLP News
- The Future of Search and How You Can Shape It, by Gradient Flow
Podcasts
- Deep learning vs tabular models (Ep. 217), by Data Science At Home
- Reproducible ESP Testing, by Data Skeptic
- Applied NLP solutions & AI education, by Practical AI
- Understanding AI’s Impact on Social Disparities with Vinodkumar Prabhakaran - #617, by The TWIML AI Podcast
- Accelerating the Adoption of AI through Diversity - Dânia Meira, by Data Talks
YouTube
- Prof. CHRIS SUMMERFIELD - Natural General Intelligence [SPECIAL EDITION], by Machine Learning Street Talk
- The AI Buzz, Episode #4: ChatGPT + Bing and How to start an AI company in 3 easy steps., by StatQuest
Blogs
- Planning for AGI and beyond, by OpenAI
- Optimizing AI/ML workloads for sustainability, by Amazon Science
- Recent honors and awards for Amazon scientists, by Amazon Science
- Ten teams advance in the Alexa Prize SimBot Challenge, by Amazon Science
Reddit’s top posts
- PyGWalker: Turn your Pandas Dataframe into a Tableau-style UI for Visual Analysis, at r/Data Science (💬40)
- Laptop recommendations for data analytics in University., at r/Data Science (💬220)
- Why is the field called Data Science and not Computational Statistics?, at r/Data Science (💬213)
- Meta AI open sources new SOTA LLM called LLaMA. 65B version (trained on 1.4T tokens) is competitive with Chinchilla and Palm-540B. 13B version outperforms OPT and GPT-3 175B on most benchmarks., at r/Machine Learning (💬180)
- Composer, a large (5 billion parameters) controllable diffusion model trained on billions of (text, image) pairs, comparable to SD + controlnet, at r/Machine Learning (💬15)
- “MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation” enables controllable image generation without any further training or finetuning of diffusion models., at r/Machine Learning (💬14)
- Best numeric method to find alpha that maximizes the likelihood function (known n, xi’s), at r/Ask Statistics (💬9)
- what test do I use? I’m stuck in a hole, help me 🙈😂, at r/Ask Statistics (💬6)
- Any recommendations for a good textbook on bayesian networks?, at r/Ask Statistics (💬3)
- AI Learns to Walk, Hop, and Roll, at r/Latest in ML (💬0)
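One of the Ask Statistics threads above asks for a good numeric method to find the alpha that maximizes a likelihood given n and the xi’s. As a hedged sketch (the thread does not name a distribution, so an exponential model with rate alpha is assumed here purely for illustration), a simple golden-section search over the log-likelihood works when the likelihood is unimodal in alpha:

```python
import math

def log_likelihood(alpha, xs):
    # Exponential model assumed for illustration: f(x; alpha) = alpha * exp(-alpha * x)
    # log L(alpha) = n * log(alpha) - alpha * sum(xs)
    return len(xs) * math.log(alpha) - alpha * sum(xs)

def golden_section_max(f, lo, hi, tol=1e-8):
    # Golden-section search for the maximizer of a unimodal f on [lo, hi].
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

xs = [0.5, 1.2, 0.8, 2.0, 0.3]  # toy data, not from the thread
alpha_hat = golden_section_max(lambda a: log_likelihood(a, xs), 1e-6, 10.0)
# For the exponential model the closed-form MLE is n / sum(xs), a useful sanity check.
print(alpha_hat)
```

For smooth likelihoods, Newton–Raphson on the score function converges faster, but a bracketing method like this needs no derivatives and cannot overshoot.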
Github jupyter notebook trends
- stable-diffusion: A latent text-to-image diffusion model
- CLIP: CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
- Prompt-Engineering-Guide: 🐙 Guides, papers, lectures, and resources for prompt engineering
- stable-diffusion-webui-colab: stable diffusion webui colab
- ChatGPT_Sports_Betting_Bot: This is the code for “I Built a Sports Betting Bot with ChatGPT” by Siraj Raval on YouTube
- BLIP: PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- pytorch-deep-learning: Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.
- taming-transformers: Taming Transformers for High-Resolution Image Synthesis
- fastbook: The fastai book, published as Jupyter Notebooks
- models: Models and examples built with TensorFlow
- amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
- lora: Using Low-rank adaptation to quickly fine-tune diffusion models.
- automl: Google Brain AutoML
- examples: TensorFlow examples
- Stock-Prediction-Models: Gathers machine learning and deep learning models for Stock forecasting including trading bots and simulations
- numerical-linear-algebra: Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course
- google-research: Google Research
- DL-Algorithms: (no description provided)
- dsp: 𝗗𝗦𝗣: Demonstrate-Search-Predict. A framework for composing retrieval and language models for knowledge-intensive NLP.
- DeepLearningExamples: State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
- yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Github python trends
- ControlNet: Let us control diffusion models!
- mm-cot: Official implementation for “Multimodal Chain-of-Thought Reasoning in Language Models” (stay tuned and more will be updated)
- stable-diffusion-webui: Stable Diffusion web UI
- sd-webui-controlnet: WebUI extension for ControlNet
- youtube-dl: Command-line program to download videos from YouTube.com and other video sites
- ColossalAI: Making large AI models cheaper, faster and more accessible
- picoGPT: An unnecessarily tiny implementation of GPT-2 in NumPy.
- mario-gpt: Generating Mario Levels with GPT2. Code for the paper “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, https://arxiv.org/abs/2302.05981
- sd_civitai_extension: All of the Civitai models inside Automatic 1111 Stable Diffusion Web UI
- stablediffusion: High-Resolution Image Synthesis with Latent Diffusion Models
- GFPGAN: GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
- ComfyUI: A powerful and modular stable diffusion GUI with a graph/nodes interface.
- text-generation-webui: A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, GPT-Neo, and Pygmalion.
- deforum-for-automatic1111-webui: Deforum extension script for AUTOMATIC1111’s Stable Diffusion webui
- LMOps: General technology for enabling AI capabilities w/ LLMs and MLLMs
- researchgpt: An open-source LLM based research assistant that allows you to have a conversation with a research paper
- diffusers: 🤗Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
- DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- awesome-python: A curated list of awesome Python frameworks, libraries, software and resources
Don't miss what's next. Subscribe to 5 Minutes of Data Science.