Make your README a brag sheet
I regret to inform you that better documentation means better projects
Welcome to Data Dash
Compressing an avalanche of thoughts about data into byte-sized chunks. In your inbox every two weeks on Wednesdays.
The least appealing job in the data mines
People don’t want to write documentation. I know there are exceptions, but if most people enjoyed it there wouldn’t be so many pieces begging people to document their code + data projects.
I’m not pretending I’m awesome at writing documentation. And I have found a framing that helps me write better READMEs.
Accept your flowers
You’re working hard on your data project, and the README is a way to show that off to everyone.
Show a video demo at the top so people can see your project in action. Lay out what fascinating problem this project solves in plain language so everyone can get impressed. Show people how they can use your code themselves if they want to try to be as cool as you.
This all might sound a bit over the top, and it is! Still, taking this approach can help you end up not just with better documentation, but a better project.
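To make the three README pieces above concrete, here is a minimal sketch in Python that assembles a README in that order: demo first, plain-language pitch second, usage last. Everything in it is an assumption for illustration, including the function name `brag_sheet_readme`, the section titles, the markdown layout, and the example project `label-buddy`. Treat it as one possible skeleton, not a standard.

```python
def brag_sheet_readme(name: str, demo_path: str, pitch: str, usage: str) -> str:
    """Return README text ordered as: demo, plain-language pitch, usage.

    All section titles and the markdown layout are illustrative choices.
    """
    fence = "`" * 3  # markdown code fence, built here to keep this example readable
    sections = [
        f"# {name}",
        # Demo goes first so people see the project in action right away.
        f"![Demo of {name}]({demo_path})",
        "## What problem does this solve?",
        pitch.strip(),
        "## Try it yourself",
        f"{fence}\n{usage.strip()}\n{fence}",
    ]
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    # Hypothetical project details, purely for demonstration.
    print(brag_sheet_readme(
        name="label-buddy",
        demo_path="docs/demo.gif",
        pitch="Label a pile of text snippets in minutes instead of hours.",
        usage="docker run --rm -p 8000:8000 label-buddy",
    ))
```

If you can’t fill in one of the arguments yet, that is a decent hint about what the project still needs.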
Working toward a working demo rules
If you’re shooting for a working demo at the top of your README, you’re on your way to a sustainable project. To get to a working demo you need to have scoped goals and a reasonable chance of accomplishing them. I don’t know about y’all, but those two elements definitely don’t “just happen” for my projects.
A working demo also makes it clear what your project does. Of course it’s clear to us when we’ve had our noses in the project, and a demo makes that same work legible to newcomers.
Using plain language is a cheat code
If we can’t explain what our project is in plain language, we need to take a big step back. If we can, that often unlocks interesting next steps or use cases.
Using plain language also helps people realize what you’ve accomplished even if they don’t know the technical details.
Making your code usable levels up your skills
One project I worked on had a wild sequence of bespoke installs and complicated steps if anyone else wanted to use it. Then I realized I should just put the project in a Docker container, and the instructions shrank to roughly a tenth of their length.
I also got to learn how to Dockerize a project that used text-to-speech + speech-to-text, a trickier proposition than I ever would have guessed.
And even before I got Docker involved, I documented my code much better when I assumed someone other than me would use it. Even if I’m wrong, it’s much easier to pick up projects later if they’re written to be understood.
To argue against myself
Of course not every project needs to be this extensive. I’ve spent enough time fighting with Docker containers to know they’re not the move for every single project.
And thinking of the README as a brag sheet can still be helpful for smaller projects. One of my current works-in-progress only has anything in the README because I wanted to brag a bit. And the spirit of getting to a working demo helped me build the data labeling app I mentioned in this post.
Show me your README brags on Bluesky. I want to see them!
A data thing I liked
If you’re serious about documentation, check out Chapter 8 of Data Management in Large-Scale Education Research by Crystal Lewis. The whole book is worth a read whether you’re collecting education data or not.
A request
I got laid off and I’m on the job market! If you know of any remote/Denver-based data scientist or ML engineer roles please send them my way on Bluesky or LinkedIn.
To the folks who have already sent roles along, you rock and I appreciate you!