5 minutes of Data Science

October 11, 2021

Free ebook on Introduction to Probability for Data Science, how to train BERT for Q&A

🗯 This week

  • As I mentioned last week, I've been working on extracting stats from my job feed at Upwork.com. The goal is to understand which data science and data engineering skills are the most sought-after. I'm building an ETL pipeline for this. If this is something you'd be interested in, whether it's for Data Science or not, ping me on twitter.
  • The reason why math and programming go hand in hand: [image: math_and_coding]
  • Remember to check the most popular Reddit posts this week on data-related boards. 👇

🔮 Data Science

  • Free ebook! "Introduction to Probability for Data Science" by Stanley Chan. Download here.
  • NLP: How to train BERT for Q&A. link
  • FooDI-ML: a new large-scale, multi-language dataset that contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections. link

👋 See you next time

Let's keep in touch, Pedro.

website | twitter | medium | github | stackoverflow | linkedin


🔝 Most popular Reddit posts this week

r/DataScience

  • How many here are 'passionate' about data science? Can we stop with the 'passion' posts please? (665 upvotes; 220 replies)
  • Just recently turned in my two weeks notice as an analyst (487 upvotes; 95 replies)
  • I work maximum 3-4 hours everyday and feel guilty all the time. What can I do to not feel like this? (433 upvotes; 188 replies)
  • Advice Request: Getting/staying good at SQL if you don’t use it in current role (114 upvotes; 53 replies)
  • What’s the deal with Blind? (83 upvotes; 47 replies)

r/DataEngineering

  • SQL Cheatsheet - Basics (401 upvotes; 43 replies)
  • Please Critique my Resume: Data Analyst transitioning to Data Engineer (126 upvotes; 66 replies)
  • Does anyone know where to find jobs for companies doing good in the world? (105 upvotes; 68 replies)
  • Advice for improving at SQL (53 upvotes; 33 replies)
  • Will a Master’s degree be of any advantage in the career growth for a Data Engineer? (51 upvotes; 42 replies)

r/MachineLearning

  • [Project] Generating cool names for machine learning projects (302 upvotes; 14 replies)
  • [R] 6 Key Jobs in Data Industry (300 upvotes; 44 replies)
  • [D] Feeling overwhelmed because applying machine learning to real life problems is not trivial (279 upvotes; 98 replies)
  • [P] SpotML - Managed ML Training on cheap AWS/GCP Spot Instances (205 upvotes; 54 replies)
  • [R] ResNet strikes back: An improved training procedure in timm. There has been significant progress on best practices for training neural nets since ResNet's introduction in 2015. With such advances, a vanilla ResNet-50 reaches 80.4% top-1 accuracy on ImageNet without extra data or distillation. (198 upvotes; 46 replies)

r/LearnMachineLearning

  • Convolution Neural Networks Visualization using Unity 3D, C# and Python (749 upvotes; 19 replies)
  • I made an interactive neural network! Here's a video of it in action, but you can play with it at aegeorge42.github.io (531 upvotes; 41 replies)
  • LEGO Object Detection Dataset, Free For Anyone (294 upvotes; 12 replies)
  • 100+ Machine Learning and Deep Learning Cheat Sheets (215 upvotes; 0 replies)
  • minGPT: a small and educational implementation of GPT by Andrej Karpathy (137 upvotes; 21 replies)

r/AskStatistics

  • How was the T-Test and T-distribution discovered? Can someone explain this conceptually? ELI5 will be highly appreciated. (21 upvotes; 13 replies)
  • What's a good book for the layperson on a crash course in the fundamentals of statistics? (10 upvotes; 12 replies)
  • Assumptions violated: what now? (8 upvotes; 6 replies)
  • Likert scale to t-test (8 upvotes; 3 replies)
  • What is the appropriate measure of correlation to show these 2 users are the same person? (7 upvotes; 5 replies)

r/LatestInML

  • State of the art in Quadrupedal robots! (9 upvotes; 0 replies)

r/MLQuestions

  • How to move from SWE to MLE with no professional ML experience? (12 upvotes; 5 replies)
  • A book to read between "Hands on ML" and PRML? (6 upvotes; 6 replies)
  • Implement VAEs/AEs before GANs? (5 upvotes; 5 replies)
  • Does anyone have good resources for trying to capture semantic similarity? (7 upvotes; 2 replies)
  • Computer Vision to Autocrop Images (6 upvotes; 1 reply)