ETL pipelines, what makes a data science project successful, Data Science for beginners by Microsoft
🗯 This week
- I’ve been wrapping up the series on “Building an ETL pipeline from scratch.” It’s a great opportunity to get started if you’re not used to building pipelines. The bonus is that it uses the newest version of Airflow. I hope to finish the blog post in a week or two.
- After a week of tweeting stats about successful data science projects, I figured it would be best to compile them into a (future) blog post. After seeing so many failed projects, I found it interesting to understand how to tackle common data science problems. If you’re curious, I started tweeting about it around here. Here are some example stats:
- Remember to check the most popular Reddit posts this week on data-related boards. 👇
🔮 Data Science
- Data Science for beginners by Microsoft
👋 See you next time
Let’s keep in touch, Pedro.
website | twitter | medium | github | stackoverflow | linkedin
🔝 Most popular Reddit posts this week
r/DataScience
- Data Science is 80% fighting with IT, 19% cleaning data and 1% of all the cool and sexy crap you hear about the field. Agree? (⬆️ 1115 ; 💬 183)
- Where do Data Scientists go camping? (⬆️ 598 ; 💬 43)
- I was hired as a data analyst 4 months ago by an AI company and my boss is expecting me to create a reasoning system (as part of our attempt at KRR)– I feel extremely overwhelmed and am convinced I’ll be fired for underperforming (⬆️ 318 ; 💬 98)
- 80/20 rule: models that account for maybe 20% of your toolkit but solve 80% of your practical problems? (⬆️ 277 ; 💬 103)
- What would you do if the upper management wants you to work with 30 excel files that are being used as database? (⬆️ 264 ; 💬 101)
r/DataEngineering
- Let’s show some appreciation to data engineers (⬆️ 508 ; 💬 5)
- I deleted data from production (⬆️ 154 ; 💬 25)
- How do you test your pipelines? (⬆️ 86 ; 💬 24)
- Is our coding challenge too hard? (⬆️ 85 ; 💬 120)
- We wrote about how Postman’s data team operates! (⬆️ 67 ; 💬 11)
r/MachineLearning
- [P] StyleGAN3 + Cosplay Dataset. Happy Halloween! 🎃 (⬆️ 784 ; 💬 20)
- [D] New in-depth AI interview episode out! Yuval was featured on 2 minute papers for his incredible work on AI toonification. (⬆️ 253 ; 💬 3)
- [D] How can companies like Facebook use Pytorch for commercial applications when BN and dropout are patented? (⬆️ 232 ; 💬 106)
- 100Circles - Words to Paintings via NightCafe VQGAN+CLIP [Project] (⬆️ 254 ; 💬 16)
- [D] What is a reasonable way to address a paper that was published and you consider to be dishonest or plain bogus? (⬆️ 217 ; 💬 60)
r/LearnMachineLearning
- Should have read binary classifier, but ok… (⬆️ 982 ; 💬 21)
- We Built IntelliBrush - An AI Labeller Using Neural Networks and CV (⬆️ 640 ; 💬 19)
- How to read more research papers? (tips & tools given) (⬆️ 299 ; 💬 13)
- These plants do not exist (⬆️ 155 ; 💬 5)
- Tired of university, i need help on how to learn AI and ML by myself (⬆️ 114 ; 💬 44)
r/AskStatistics
- Can a Statistician using only R get a DS job not having a strong CS background? (⬆️ 22 ; 💬 10)
- What are the differences between linear models and linear regression? (⬆️ 12 ; 💬 10)
- I am trying to get a random number based on the normal distribution (⬆️ 11 ; 💬 11)
- Alternatives to Poisson distribution. (⬆️ 10 ; 💬 19)
- Resource recommendation to relearn statistics (⬆️ 11 ; 💬 4)
r/LatestInML
- How to read more research papers? (tips & tools given) (⬆️ 26 ; 💬 6)
- Straight out of science fiction! Drones that can track and 3D reconstruct any person also while avoiding obstacles! (pose estimation) (⬆️ 21 ; 💬 1)
- ADOP: Approximate Differentiable One-Pixel Point Rendering (Synthesize Smooth Videos from a Couple of Images) (⬆️ 14 ; 💬 2)
- Multitask Prompted Training Enables Zero-shot Task Generalization (Explained) (⬆️ 7 ; 💬 0)
- [D] State of the art in the document information extraction/parsing for resume parsing? (⬆️ 7 ; 💬 0)
r/MLQuestions
- Early coding habits to pick up (⬆️ 21 ; 💬 7)
- What is the point of pseudo-labeling for a semi-supervised learning task? (⬆️ 12 ; 💬 0)
- Graduate Studies in Machine Learning (⬆️ 9 ; 💬 3)
- ML Algorithm Suggestions (⬆️ 8 ; 💬 5)
- Python practical time series materials (⬆️ 7 ; 💬 1)