5 Minutes of Data Science - week 44
Highlights from October 31 to November 06
Foreword
Hi folks 👋🏻
This week’s release comes out on Wednesday instead of Monday because the Raspberry Pi lost its internet connection. That is the same Raspberry Pi that runs Airflow and orchestrates the scripts that fetch the data from different sources.
Come say hi on Twitter and see you next Monday!
Blogs
- DALL·E API Now Available in Public Beta, by OpenAI
- Jens Lehmann receives Semantic Web journal 10-year award, by Amazon Science
- Building systems that automatically adjust to workloads and data, by Amazon Science
- Amazon and University of Washington announce Science Hub fellows, by Amazon Science
- Method enables better control of GAN image generators’ output, by Amazon Science
- Two Amazon Scholars named among inaugural Rousseeuw Prize winners, by Amazon Science
- MAEEG: Masked Auto-encoder for EEG Representation Learning, by Apple Machine Learning
Podcasts
- Is studying AI in academia a waste of time? (Ep. 208), by Data Science At Home
- The practicalities of releasing models, by Practical AI
- Building Data Science Practice - Andrey Shtylenko, by Data Talks
YouTube
- The Cosine Similarity for NLP and CatBoost, by StatQuest
- Researchers thought this was a bug (Borwein integrals), by 3Blue1Brown
Reddit
- Add it to the training set, Walmart, at r/Data Science (💬54)
- hot take: forget data science, we need more analysts, at r/Data Science (💬176)
- Data Science Hierarchy of Needs … as relevant as ever, at r/Data Science (💬47)
- Finetuned Diffusion: multiple fine-tuned Stable Diffusion models, trained on different styles, at r/Machine Learning (💬60)
- Transcribe any podcast episode in just 1 minute with optimized OpenAI/whisper, at r/Machine Learning (💬23)
- DALL·E to be made available as API, OpenAI to give users full ownership rights to generated images, at r/Machine Learning (💬58)
- Asked by bosses (non-statisticians) to aggregate data after we didn’t get the answer they wanted, at r/Ask Statistics (💬21)
- Can you help with this exercise of multivariate statistics? I really can’t get anything out of it., at r/Ask Statistics (💬8)
- Interaction term is significant while main effect isn’t, at r/Ask Statistics (💬18)
- Condensing datasets using dataset distillation, at r/Latest in ML (💬0)
- eDiffi: Higher Quality and Fidelity than Stable Diffusion! (explained), at r/Latest in ML (💬0)
GitHub Jupyter Notebook trends
- Dreambooth-Stable-Diffusion: Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
- annotated_deep_learning_paper_implementations: 🧑🏫 59 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, …), optimizers (adam, adabelief, …), gans (cyclegan, stylegan2, …), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, … 🧠
- whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- google-research: Google Research
- Mubert-Text-to-Music: A simple notebook demonstrating prompt-based music generation via Mubert API
- micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
- Made-With-ML: Learn how to responsibly develop, deploy and maintain production machine learning applications.
- yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- machine-learning-interview-enlightener: This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
- examples: TensorFlow examples
- mlops-course: A project-based course on the foundations of MLOps to responsibly develop, deploy and maintain ML.
- deeplearning-models: A collection of various deep learning architectures, models, and tips
- amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
- ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- data: Data and code behind the articles and graphics at FiveThirtyEight
- machine-learning-book: Code Repository for Machine Learning with PyTorch and Scikit-Learn
- shap: A game theoretic approach to explain the output of any machine learning model.
- industry-machine-learning: A curated list of applied machine learning and data science notebooks and libraries across different industries (by @firmai)
- azureml-examples: Official community-driven Azure Machine Learning examples, tested with GitHub Actions.
- handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
GitHub Python trends
- tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️
- lama-cleaner: Image inpainting tool powered by SOTA AI Model. Remove any unwanted object, defect, or people from your pictures, or erase and replace (powered by stable diffusion) anything on your pictures.
- openpilot: openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 200 supported car makes and models.
- fast-stable-diffusion: fast-stable-diffusion, +25-50% speed increase + memory efficient + DreamBooth
- keras: Deep Learning for humans
- alphafold: Open source code for AlphaFold.
- esm: Evolutionary Scale Modeling (esm): Pretrained language models for proteins
- devops-exercises: Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
- yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
- ivy: The Unified Machine Learning Framework
- Gymnasium: A standard API for reinforcement learning and a diverse set of reference environments (formerly Gym)
- marqo: Tensor search for humans.
- mmdetection: OpenMMLab Detection Toolbox and Benchmark
- DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- Deep-Learning-Papers-Reading-Roadmap: Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!
- statsmodels: Statsmodels: statistical modeling and econometrics in Python