Week 46
5 Minutes of Data Science - week 46
Highlights from November 14 to November 20
Foreword
Hi folks!
Here's to another week gone by. I'm still working on bringing more feeds to the newsletter, but in the meantime, here's what was relevant last week. Enjoy!
PS: Come say hi on Mastodon!
Blogs
- Best practices for data enrichment, by DeepMind
- Conversation Summaries in Google Chat, by Google AI
- The Data Cards Playbook: A Toolkit for Transparency in Dataset Documentation, by Google AI
- Mixture-of-Experts with Expert Choice Routing, by Google AI
- How Prime Video distills time series anomalies into actionable alarms, by Amazon Science
- Amazon and UCLA announce fellowship recipients, by Amazon Science
- Workshops on trustworthy NLP help build community, by Amazon Science
- Amazon Books editors announce year's best science books, by Amazon Science
- How Amazon integrated Alexa into NASA’s Orion spacecraft, by Amazon Science
- Amazon-UCLA model wins coreference resolution challenge, by Amazon Science
- How Amazon Robotics researchers are solving a “beautiful problem”, by Amazon Science
- Continuous Soft Pseudo-Labeling in ASR, by Apple Machine Learning
Podcasts
- StrategyQA and Big Bench, by Data Skeptic
- Protecting us with the Database of Evil, by Practical AI
- From Digital Marketing to Analytics Engineering - Nikola Maksimovic, by Data Talks
YouTube
- JULIAN TOGELIUS, Prof. KEN STANLEY - AGI, Games, Diversity & Creativity [UNPLUGGED], by Machine Learning Street Talk
- AIDAN GOMEZ [CEO Cohere] - Language as Software, by Machine Learning Street Talk
- Design Matrix Examples in R, Clearly Explained!!!, by StatQuest
- Design Matrices For Linear Models, Clearly Explained!!!, by StatQuest
- Using Linear Models for t tests and ANOVA, Clearly Explained!!!, by StatQuest
- Multiple Regression in R, Step by Step!!!, by StatQuest
- Multiple Regression, Clearly Explained!!!, by StatQuest
- Linear Regression in R, Step by Step, by StatQuest
- Linear Regression, Clearly Explained!!!, by StatQuest
- R-squared, Clearly Explained!!!, by StatQuest
- But what is a convolution?, by 3Blue1Brown
Reddit
- Does anyone feel like R is actually vastly worse for dependency/environment management than Python?, at r/Data Science (💬190)
- Overworked, at r/Data Science (💬127)
- Is it illegal to web-scrape interest rates from banks? What if I am trying to understand historical pricing of investment/insurance, at r/Data Science (💬82)
- new SNAPCHAT feature transfers an image of an upper body garment in realtime on a person in AR, at r/Machine Learning (💬44)
- my PhD advisor "machine learning researchers are like children, always re-discovering things that are already known and make a big deal out of it.", at r/Machine Learning (💬191)
- Sim2Real multi-finger robot hand manipulation using point cloud RL, at r/Machine Learning (💬10)
- Could I break into stats/data science positions with a bachelor’s degree in quantitative economics? [more details in text], at r/Ask Statistics (💬6)
- Misuse of the CLT, at r/Ask Statistics (💬5)
- Comparing model predictions to a dumb model that just predicts the mean: is this basically R2?, at r/Ask Statistics (💬5)
- Fly Into Your Pictures With AI! InfiniteNature-Zero, at r/Latest in ML (💬1)
GitHub Jupyter Notebook trends
- whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- dpm-solver: Official code for "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps" (NeurIPS 2022 Oral)
- annotated_deep_learning_paper_implementations: 🧑‍🏫 59 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
- PythonDataScienceHandbook: Python Data Science Handbook: full text in Jupyter Notebooks
- pytorch-Deep-Learning: Deep Learning (with PyTorch)
- nerf: Code release for NeRF (Neural Radiance Fields)
- Financial-Models-Numerical-Methods: Collection of notebooks about quantitative finance, with interactive python code.
- pandas_exercises: Practice your pandas skills!
- pytorch-deep-learning: Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.
- UnstableFusion: A Stable Diffusion desktop frontend with inpainting, img2img and more!
- Kalman-and-Bayesian-Filters-in-Python: Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.
- yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- py: Repository to store sample python programs for python learning
- lama: 🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
- pycaret: An open-source, low-code machine learning library in Python
- deep-rl-class: This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Class.
- zero-to-mastery-ml: All course materials for the Zero to Mastery Machine Learning and Data Science course.
- handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
- numerical-linear-algebra: Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course
GitHub Python trends
- latexify_py: A library to generate LaTeX expression from Python code.
- sktime: A unified framework for machine learning with time series
- d2l-en: Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 400 universities from 60 countries including Stanford, MIT, Harvard, and Cambridge.
- airflow: Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
- langchain: ⚡ Building applications with LLMs through composability ⚡
- yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
- onnx: Open standard for machine learning interoperability
- twint: An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
- Python: All Algorithms implemented in Python