Musing in Computer Systems
New blog post: Finding near-duplicates with Jaccard similarity and MinHash
July 3, 2024
New blog post: On Jaccard similarity and the MinHash trick. I learned about this algorithm and hashing trick while reading about LLMs and GPT-3, and thought...
New blog post: Stripe's monorepo developer environment
May 21, 2024
New blog post! This one started as a Slack thread at work, outlining some salient decisions that Stripe's developer productivity team had made in building...
New blog post: Performance engineering, profilers, and seeing the invisible
December 18, 2023
New post published: Performance engineering, profilers, and seeing the invisible. I intended for this to go out a week or two earlier, but then I got...
New blog post: Advent of Code in C++ Template Metaprogramming
December 9, 2023
Long story short: I wrote up a solution to Advent of Code day 1 using C++ template metaprogramming. Historically, I haven't usually done Advent of Code, but...
New blog post: A note about ML and Pickle
November 8, 2023
New blog post: What's the deal with ML and pickle? Inspired by a friend asking me "What's the deal with ML software and using pickle for everything?," I...
New blog post: Graceful behavior at capacity
August 7, 2023
New blog post out: Graceful behavior at capacity. This one got a little bit away from me, although I'm fairly happy with the result. It originated as a series...
New blog post: Efficiency trades off against resiliency
April 16, 2023
It's been a while! Updating the newsletter because I've published a new blog post, exploring the phenomenon that making a system more efficient often makes...
Blog post: A cursed bug
February 23, 2022
Hey folks, just writing an update to let subscribers know of a new blog post. I posted a writeup of a delightfully cursed bug that we ran into and eventually...
Two reasons Kubernetes is so complex
January 27, 2022
Preface: Hello friends! It’s been a while. I’ve been finding it very hard to write while holding down a full-time job, and I’ve also been dealing with some very...
Some thoughts on GitHub Copilot
July 12, 2021
A week or so ago, GitHub announced GitHub Copilot, their AI-powered code completion assistant, powered by a version of OpenAI’s GPT-3 model. I’ve spent a lot...
Blog post: Distributed cloud builds for everyone
June 1, 2021
Ranger update! He turned 6 months old about a week ago. Here he is celebrating Memorial Day yesterday with his very first slice of watermelon, which he...
Blog post: Building LLVM in 90 seconds using Lambda
May 21, 2021
I’ve been talking about my Llama project here for a while now. Last week, there were some blog posts about building LLVM quickly on large machines, so I...
Some more Llama profiling
April 28, 2021
I wrote previously about profiling llama, and the challenges of understanding this distributed system. A few notes today about some of my progress since...
Do the hard one second
April 14, 2021
Look at this perfect sleepy donut boy! Technical migrations: This post is an excerpt from a work-in-progress post about running technical migrations. If all...
Profiling llama
March 31, 2021
I wrote a few months back about llama, my experimental project for executing shell commands in Amazon Lambda. I briefly previewed llamacc,...
New blog post: Opinionated thoughts on SQL
March 30, 2021
Short email, to let you know I have a new blog post out, sharing my thoughts on and pet peeves with SQL databases. I was going to send this as a newsletter...
Notes on some PostgreSQL implementation details
March 23, 2021
Ranger update: Seriously, friends, he has gotten so large. He knows his name and comes when called, like, 80% of the time. He turned four months old today, just...
What does a cache do?
March 5, 2021
I’ve recently had cause to work on scaling up a web application. It’s got a pretty traditional architecture: A CDN in front of a fast native code web server...
HTTP Pipelining, S3, and gg
February 22, 2021
Ranger update: 22 lbs at last weigh-in! His favorite place continues to be "curled up at our feet chewing his favorite bone." S3 and pipelining: A few weeks...
Tagged unions are overrated
February 15, 2021
Among engineers who have strong opinions about programming languages, one particularly widely held take concerns the value of tagged unions, as well as language...
On Reasoning about Code
February 12, 2021
Ranger update: Ranger has now gotten enough vaccinations to go for walks! This is very exciting. He was 19.6 lbs at last weighing, now crossing twice his...
Alive2 and missed-optimization bug reports
February 1, 2021
First, a Ranger update. He is a healthy, growing boy, up 50% by weight over the last two weeks. This week, I want to do a quick writeup of something I tweeted...
Some notes on code review
January 24, 2021
A brief personal note: I missed a newsletter last week, due to a combination of procrastination and MIT Mystery Hunt, but also due to adopting this little guy...
Tracing JITs and coverage-guided fuzzers
January 9, 2021
By happenstance, I am friends with a handful of engineers who happen to have spent substantial amounts of time working on both the PyPy project and on...
Situated Social Software
December 18, 2020
(Probably) no new letters for the rest of this year — going to take a holiday break. Thanks for subscribing and reading this newsletter, and you can find...
Notes on Amazon Lambda
December 11, 2020
First, a callback to an older post: Itamar Turner-Trauring did a neat writeup on using Cachegrind for deterministic performance analysis, inspired by my post...
Papers I love: gg
December 4, 2020
I’ve been playing with Amazon Lambda the last few weeks for a side project, and during the process I’ve gone from kinda infuriated with and baffled by Lambda...
Determinism in software engineering
November 27, 2020
A few weeks back I wrote about nondeterministic performance and the problems it poses for benchmarking. That reminded me that I’ve got a bunch of thoughts...
Benchmarking and theories of performance
November 19, 2020
I just this week finished reading Kuhn’s The Structure of Scientific Revolutions. I’d encountered many or most of the ideas in the book by reference, but...
Performance engineering requires stable benchmarks
November 12, 2020
Reproducible benchmarks are essential to doing performance engineering. In order to know if a change had an impact on performance, you have to be able to...
Test size and scope
November 5, 2020
It’s been A Week. I have a 60%-written post on software performance I was hoping to send this week, but in retrospect it was pretty foolish to expect I would...
Three approaches to edge cases in data models
October 29, 2020
Edge case poisoning: This post is something of a response to Hillel Wayne’s recent post on edge case poisoning. It should be understandable without reading his...
Once more, with feeling…
October 22, 2020
Sorry for the second email. It seems that I failed to actually disable link tracking on the previous post. I’m pretty sure that this time I’ve got it fixed...
What's worth optimizing?
October 22, 2020
Broken links in last week’s email: Many people reported that all the links in last week’s post were dead. Thanks for letting me know, and I’m sorry about...
Welcome to my newsletter! And: Performance as hardware utilization
October 15, 2020
Hello! Welcome to the newsletter. I started this year with a goal of writing weekly posts on my blog, but ever since COVID-19 lockdowns started, I've been...