5 minutes of Data Science

August 9, 2021

Download a popular Data Science book, a tool to visualize GitHub repos, an SQL cheat sheet, ...

🗯 Featured post

This week’s blog post shows why Airflow 2.0 is a game-changer. The goal is to build an ETL pipeline gradually, starting from a simple DAG.



A simple DAG using Airflow 2.0 | Pedro Madruga

Airflow 2.x is a game-changer, especially because of the simplified syntax of the new TaskFlow API. In this tutorial, we’re building a DAG with only two tasks: the first generates a random number (task 1) and the second prints that number (task 2).

🔮 Data Science

  • 2nd edition of “An Introduction to Statistical Learning” - One of the best-known Data Science books now has a second edition, which covers Deep Learning. The download is free.
  • Visualizing a codebase - GitHub released a tool that makes it easy to visualize a codebase and get an at-a-glance view of a repository’s structure. You can visualize any repo here.
  • SQL Cheat Sheet - A quick reminder of common SQL commands.
  • Series: Take your SQL from good to great - Do you know what a CTE (common table expression) is? A 5-part series on taking your SQL to the next level.
  • Train/test split - Kevin Markham (@justmarkham) shares a tip for splitting data into train and test sets when the classes are imbalanced.
  • Build a Dash app with Python in 7 minutes - “Create a beautiful visualization app from scratch with Python”.
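
For readers new to CTEs, mentioned in the SQL series, here is a minimal sketch. It uses Python’s built-in sqlite3 module so it runs without a database server; the orders table, its columns, and the data are made up for illustration, and CTE details can differ slightly between database engines.

```python
import sqlite3

# In-memory database with a small, hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("alice", 50.0), ("bob", 20.0), ("carol", 90.0)],
)

# The WITH clause defines a named, temporary result set (the CTE) that the
# main query can then select from as if it were a regular table.
query = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM customer_totals
WHERE total > 40
ORDER BY total DESC
"""

rows = conn.execute(query).fetchall()
for customer, total in rows:
    print(customer, total)  # carol 90.0, then alice 80.0

conn.close()
```

The same result could be written with a subquery in the FROM clause; the CTE simply gives that intermediate step a readable name, which pays off as queries grow.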

🛠 Data Engineering

  • The 2021 Data Engineer toolbox - All the tools and techniques in one chart.
  • Incident Detection and Alerting for Your Data Pipelines - Preventing broken data pipelines.

🧠 Misc

  • 20 common interview questions
  • The most unbelievable things about life before smartphones - A look back at how everyday life worked before smartphones.

👋 See you next time

Let’s keep in touch!

Pedro

website | twitter | medium | github | stackoverflow | linkedin
