Week 6 of 2023
5 Minutes of Data Science - week 6
Highlights from February 06 to February 12
Foreword
Come say hi on Mastodon. See you next week!
Newsletters
- February 2023, by Responsible AI
- Let’s speed up our Machine Learning Training!, by The AI Edge
- 🥇Top ML Papers of the Week, by NLP news
- Unveiling the Future of AI: Insights on AI Chips, Knowledge Graphs, and AI Regulations, by Gradient Flow
Reddit’s top posts
- Thoughts?, at r/Data Science (💬188)
- you’re an angel!!, at r/Data Science (💬24)
- Calling all NLP gurus, Meta is paying top dollar 😂, at r/Data Science (💬72)
- I’m using Instruct GPT to show anti-clickbait summaries on youtube videos, at r/Machine Learning (💬216)
- Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research, at r/Machine Learning (💬60)
- Introducing arxivGPT: chrome extension that summarizes arxived research papers using chatGPT, at r/Machine Learning (💬65)
- Is a bachelors in statistics as useless as people make it sound?, at r/Ask Statistics (💬38)
- Is this too small a sample size to be indicative (link to study in comments)?, at r/Ask Statistics (💬72)
- Chi squared vs Kolmogorov–Smirnov vs Anderson-Darling?, at r/Ask Statistics (💬3)
- Best scikit-learn Cheat sheet for Important Data Scientist interview, at r/Latest in ML (💬0)
- ChatGPT Detection Chrome Extension - Detecting How Content Was Written - By Human or AI, at r/Latest in ML (💬0)
GitHub Jupyter notebook trends
- whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- stable-diffusion-webui-colab: stable diffusion webui colab
- stable-diffusion: A latent text-to-image diffusion model
- machine-learning-for-trading: Code for Machine Learning for Algorithmic Trading, 2nd edition.
- Probabilistic-Programming-and-Bayesian-Methods-for-Hackers: aka “Bayesian Methods for Hackers”: An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
- latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
- openai-cookbook: Examples and guides for using the OpenAI API
- CLIP: Contrastive Language-Image Pretraining
- disco-diffusion: (no description)
- stable-diffusion: Latent Text-to-Image Diffusion
- google-research: Google Research
- fastbook: The fastai book, published as Jupyter Notebooks
- SQL-Data-Analysis-and-Visualization-Projects: SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
- lora: Using Low-rank adaptation to quickly fine-tune diffusion models.
- handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
- notebooks: Jupyter notebooks for the Natural Language Processing with Transformers book
- BLIP: PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Kalman-and-Bayesian-Filters-in-Python: Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.
- ChatGPT_Sports_Betting_Bot: This is the code for “I Built a Sports Betting Bot with ChatGPT” by Siraj Raval on Youtube
- InvokeAI: InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
GitHub Python trends
- ChatGPT: Reverse engineered ChatGPT API
- DocsGPT: GPT-powered chat for documentation search & assistance.
- BioGPT: (no description)
- chatGPT-discord-bot: Integrate ChatGPT into your own discord bot
- Python: All Algorithms implemented in Python
- openai-python: The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language.
- MockingBird: 🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
- LAVIS: LAVIS - A One-stop Library for Language-Vision Intelligence
- gpt-2: Code for the paper “Language Models are Unsupervised Multitask Learners”
- PaddleSpeech: Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
- haystack: 🔍Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-3 and alike). Haystack offers production-ready tools to quickly build ChatGPT-like question answering, semantic search, text generation, and more.
- chatgpt-on-wechat: A WeChat chatbot built with ChatGPT, based on the OpenAI API and the itchat library.
- jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
- Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
- PyChatGPT: ⚡️Python client for the unofficial ChatGPT API with auto token regeneration, conversation tracking, proxy support and more.
- chatgpt-wrapper: API for interacting with ChatGPT using Python and from Shell.
- ChatRWKV: ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
- stable-diffusion-webui: Stable Diffusion web UI
- CodeGeeX: CodeGeeX: An Open Multilingual Code Generation Model
- RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it’s combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, “infinite” ctx_len, and free sentence embedding.
Podcasts
- MLOps is alive and well, by Practical AI
- The Journey of a Data Generalist: From Bioinformatics to Freelancing - Jekaterina Kokatjuhha, by Data Talks
YouTube
- Prof. Edward Grefenstette - Language, Semantics, Philosophy, by Machine Learning Street Talk
- Prof. MICHAEL LEVIN, Prof. IRINA RISH - Emergence, Intelligence, Transhumanism, by Machine Learning Street Talk
- Dr. PATRICK LEWIS - Retrieval Augmented Generation, by Machine Learning Street Talk
- The AI Buzz, Episode #3: Constitutional AI, Emergent Abilities and Foundation Models, by StatQuest
Blogs
- Amazon and Howard University announce academic collaboration, by Amazon Science
- Pai-Ling Yin brings an academic’s lens to the study of buying and selling at Amazon, by Amazon Science
- On a mission to demystify artificial intelligence, by Amazon Science