Week 35 review
5 Minutes of Data Science - Week 35
Highlights from August 29 to September 4
Foreword
Welcome to the new format of this newsletter. It now includes blog posts from the research teams at companies like OpenAI, DeepMind, Google, and Amazon; the latest podcast and YouTube episodes; the trending GitHub repositories related to Data Science and Machine Learning; and the latest from the Reddit communities!
This is highly experimental, but I hope you enjoy it. See you next week!
Blogs
- From motor control to embodied intelligence, by DeepMind
- Announcing the Patent Phrase Similarity Dataset, by Google AI
- DALL·E: Introducing Outpainting, by OpenAI
- Model assesses the validity of tips offered in product reviews, by Amazon Science
- Janus framework lifts continual learning to the next level, by Amazon Science
- “I always knew that my main interest was in supply chain optimization”, by Amazon Science
Podcasts
- Fraudulent Amazon Reviewers, by Data Skeptic
- Privacy in the age of AI, by Practical AI
- Multimodal, Multi-Lingual NLP at Hugging Face with John Bohannon and Douwe Kiela - #589, by The TWIML AI Podcast
- Announcing Data Literacy Month, by DataFramed
YouTube
- Three more lessons from my Pop!!!, by StatQuest
Reddit
- Based on my nightmares, at r/datascience (💬161)
- What was the most inspiring/interesting use of data science in a company you have worked at? It doesn’t have to save lives or generate billions (it’s certainly a plus if it does) but its mere existence made you say “HOT DAMN!” And could you maybe describe briefly its model?, at r/datascience (💬157)
- WhatsApp chat analysis between me and a friend, at r/datascience (💬75)
- [P] Apple pencil with the power of Local Stable Diffusion using Gradio Web UI running off a 3090, at r/MachineLearning (💬39)
- US Gov imposes export requirements on NVIDIA A100s and future H100s to China and Russia, at r/MachineLearning (💬191)
- [D] Senior research scientist at GoogleAI, Negar Rostamzadeh: “Can’t believe Stable Diffusion is out there for public use and that’s considered as ‘ok’!!!”, at r/MachineLearning (💬376)
- When to use linear regression?, at r/AskStatistics (💬25)
- Teaching myself some statistics, shouldn’t question b be 97.5% to account for results more than 3 standard deviations above the mean?, at r/AskStatistics (💬19)
- What to expect from a PhD in Statistics?, at r/AskStatistics (💬4)
- Panoptic scene graph generation (PSG) Explained - A New Challenging Task for AI, at r/LatestInML (💬1)
- Personalizing Text-to-Image Generation using Textual Inversion, at r/LatestInML (💬2)
- A list of research papers and open source tools in Data centric AI, at r/LatestInML (💬0)
GitHub Jupyter notebook trends
- CompVis/stable-diffusion (5,431 stars this week)
- microsoft/ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all (974 stars this week)
- rinongal/textual_inversion (255 stars this week)
- Stability-AI/stability-sdk: SDK for interacting with stability.ai APIs (e.g. stable diffusion inference) (77 stars this week)
- CompVis/latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models (328 stars this week)
- CompVis/taming-transformers: Taming Transformers for High-Resolution Image Synthesis (33 stars this week)
- microsoft/Data-Science-For-Beginners: 10 Weeks, 20 Lessons, Data Science for All! (106 stars this week)
- alembics/disco-diffusion (260 stars this week)
- dataquestio/project-walkthroughs (17 stars this week)
- Pierian-Data/Complete-Python-3-Bootcamp: Course Files for Complete Python 3 Bootcamp Course on Udemy (112 stars this week)
- karpathy/micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API (75 stars this week)
- wesm/pydata-book: Materials and IPython notebooks for “Python for Data Analysis” by Wes McKinney, published by O’Reilly Media (84 stars this week)
- openai/CLIP: Contrastive Language-Image Pretraining (108 stars this week)
- DataTalksClub/mlops-zoomcamp: Free MLOps course from DataTalks.Club (55 stars this week)
- fivethirtyeight/data: Data and code behind the articles and graphics at FiveThirtyEight (16 stars this week)
- Harmonai-org/sample-generator: Tools to train a generative model on arbitrary audio samples (30 stars this week)
- GokuMohandas/Made-With-ML: Learn how to responsibly deliver value with ML. (64 stars this week)
- mli/transformers-benchmarks (37 stars this week)
- WongKinYiu/yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors (280 stars this week)
- fchollet/deep-learning-with-python-notebooks: Jupyter notebooks for the code samples of the book “Deep Learning with Python” (37 stars this week)
- bukosabino/ta: Technical Analysis Library using Pandas and Numpy (27 stars this week)
- cocodataset/cocoapi: COCO API - Dataset @ (18 stars this week)
- MorvanZhou/PyTorch-Tutorial: Build your neural network easy and fast, 莫烦Python Chinese-language tutorials (22 stars this week)
- ageron/handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2. (18 stars this week)
GitHub Python trends
- hlky/stable-diffusion-webui: Stable Diffusion web UI (1,155 stars this week)
- huggingface/diffusers (1,002 stars this week)
- xinntao/Real-ESRGAN: Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration. (387 stars this week)
- python-poetry/poetry: Python dependency management and packaging made easy. (420 stars this week)
- crowsonkb/k-diffusion: Karras et al. (2022) diffusion models for PyTorch (64 stars this week)
- tiangolo/sqlmodel: SQL databases in Python, designed for simplicity, compatibility, and robustness. (489 stars this week)
- microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities (153 stars this week)
- iperov/DeepFaceLab: DeepFaceLab is the leading software for creating deepfakes. (387 stars this week)
- commaai/openpilot: openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 200 supported car makes and models. (97 stars this week)
- gradio-app/gradio: Create UIs for your machine learning model in Python in 3 minutes (302 stars this week)
- pytorch/torchdynamo: A Python-level JIT compiler designed to make unmodified PyTorch programs faster. (38 stars this week)
- 521xueweihan/HelloGitHub (309 stars this week)