Napkin Math #18: Neural Net from Scratch
Happy new year, everyone!
In this issue we'll establish a mental model for how a neural network works by building one from scratch! In a future issue we'll do napkin math on performance, as establishing the first-principles understanding is plenty of ground to cover for today.
Neural nets increasingly dominate machine learning and artificial intelligence: the most sophisticated models for computer vision (e.g. CLIP), natural language processing (e.g. GPT-3), translation (e.g. Google Translate), and more are based on neural nets. When these artificial neural nets stack enough layers (the threshold is somewhat arbitrary), we call it deep learning.
A visceral example of deep learning's unreasonable effectiveness comes from an interview with Jeff Dean, who leads AI at Google. He explains how 500 lines of TensorFlow outperformed the previous ~500,000 lines of code for Google Translate's extremely complicated model. Blew my mind.
As a software developer with a predominantly web-related skillset (Ruby, databases, enough distributed-systems knowledge to know not to get fancy, a bit of hard-earned systems knowledge from debugging incidents, but only high-school-level math), neural networks mystify me. How do they work? Why are they so good? Why are they so slow? Why are GPUs/TPUs used to speed them up? Why do the biggest models have more parameters than the human brain has neurons, yet still perform worse than the brain?
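To give a small taste before you click through: the core building block is deceptively simple. Below is a minimal Python sketch of a single artificial neuron, i.e. a weighted sum of its inputs plus a bias, squashed through a nonlinearity. The specific numbers and function names here are my own illustration, not taken from the article; the full post builds this out into an actual network.

```python
import math

def sigmoid(x):
    # Squash any real number into (0, 1); one of the classic activation functions.
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # A single artificial neuron: weighted sum of inputs plus a bias,
    # passed through a nonlinear activation.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Hypothetical example values; a real network learns the weights and bias
# during training rather than having them hand-picked like this.
print(neuron(inputs=[0.5, 0.9], weights=[0.4, -0.2], bias=0.1))  # ~0.53
```

A neural network is, roughly, many of these neurons wired together in layers, which is exactly the mental model the article develops.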
Let’s find out… read the full article on the website!
If your organization needs help with some napkin math or is battling some infrastructure shenanigans, I am now available for consulting. Just reply to this email.
P.S. If you previously subscribed to the original 'Sirupsen Newsletter', the two newsletters have now been merged.
P.P.S. I read every reply that comes in, so send your questions, comments, or feedback!
P.P.P.S. The site may have a new design since you last saw it. I've rewritten it in Next.js to learn the framework. Hopefully the reading experience is nicer; that's partly why I no longer inline the articles in the newsletter.