5 minutes of Data Science

January 9, 2023

Week 1 of 2023

5 Minutes of Data Science - week 1

Highlights from January 02 to January 08

Foreword

A few newsletter feeds have been added. Enjoy!

Come say hi on Mastodon. See you next week!


Blogs

  • Amazon’s papers at SLT, by Amazon Science
  • Computer vision for automated quality inspection, by Amazon Science
  • WACV: Where application-based research finds a home, by Amazon Science
  • More-efficient annotation for semantic segmentation in video, by Amazon Science

Newsletters

  • Last Week in AI #200: A Review of AI in 2022, by Last Week in AI
  • Import AI 313: Smarter robots via foundation models; Stanford trains a small best-in-class medical LM; Baidu builds a multilingual coding dataset, by Import AI

Podcasts

  • NLP research by & for local communities, by Practical AI
  • Service Cards and ML Governance with Michael Kearns - #610, by The TWIML AI
  • Data-Centric AI - Marysia Winkels, by Data Talks

Reddit’s top posts

  • Changing my feminine first name to a masculine nickname on my resume gave me way more responses per application, at r/Data Science (💬246)
  • Here’s another predatory unpaid internship that’s offering a promotion to a CTO title, at r/Data Science (💬59)
  • The most epic DS job title, at r/Data Science (💬45)
  • I built Adrenaline, a debugger that fixes errors and explains them with GPT-3, at r/Machine Learning (💬59)
  • Fixing the angle of Skewed Paintings, see comments, at r/Machine Learning (💬34)
  • Greg Yang’s work on a rigorous mathematical theory for neural networks, at r/Machine Learning (💬38)
  • Which statistical methods became obsolete in the last 10-20-30 years?, at r/Ask Statistics (💬27)
  • Does experiencing a highly-unlikely event effect the odds of experiencing it again?, at r/Ask Statistics (💬10)
  • Are hazards in Cox regression even meaningful? Why has Cox regression become the norm for time-to-event analysis, at r/Ask Statistics (💬4)
  • What happened in AI research in 2022 - My curated list of AI breakthroughs with a video explanation, article, and code for each paper, at r/Latest in ML (💬0)
  • What to do when hyperparameter tuning doesn’t improve model performance?, at r/Latest in ML (💬2)

GitHub Jupyter notebook trends

  • nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
  • Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
  • data-engineering-zoomcamp: Free Data Engineering course!
  • fastbook: The fastai book, published as Jupyter Notebooks
  • stable-diffusion-webui-colab: stable diffusion webui colab
  • MachineLearningNotebooks: Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
  • geospatial-data-catalogs: A list of open geospatial datasets available on AWS, Earth Engine, Planetary Computer, NASA CMR, and STAC Index
  • pyprobml: Python code for “Probabilistic Machine learning” book by Kevin Murphy
  • mlops-zoomcamp: Free MLOps course from DataTalks.Club
  • fastai: The fastai deep learning library
  • EconML: ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal …
  • VToonify: [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
  • coursera-deep-learning-specialization: Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks and Deep Learning; (ii) Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization; (iii) Structuring Machine Learning Projects; (iv) Convolutional Neural Network…
  • practical-statistics-for-data-scientists: Code repository for O’Reilly book
  • deep-learning-with-python-notebooks: Jupyter notebooks for the code samples of the book “Deep Learning with Python”
  • yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
  • diff-svc: Singing Voice Conversion via diffusion model
  • handson-ml3: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

GitHub Python trends

  • minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
  • ColossalAI: Colossal-AI: A Unified Deep Learning System for Big Model Era
  • awesome-python: A curated list of awesome Python frameworks, libraries, software and resources
  • gpt_index: An index created by GPT to organize external information and answer queries!
  • pyright: Static type checker for Python
  • unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
  • petals: 🌸Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
  • sqlglot: Python SQL Parser and Transpiler
  • stable-diffusion-webui: Stable Diffusion web UI
  • openai-cookbook: Examples and guides for using the OpenAI API
  • gallery-dl: Command-line program to download image galleries and collections from several image hosting sites
  • GFPGAN: GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
  • CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer