The agonies of transcription
If you asked 100 journalists what their least favorite work task is, I’m positive more than 90 would say transcribing. It’s so tedious, it takes so long [1], and it involves at least some listening to your own recorded voice, a skin-crawling experience for anyone who hasn’t put themselves through the intense exposure therapy of making a podcast. I’ve tried so many ways of transcribing over the years, from feeling obligated to manually transcribe every word of reporting before I wrote a single word of a draft (amateur move!), to taking live notes and time-stamping the good quotes to fill in later (my longest running strategy), to winging it with frantic hand-written notes (after rare but significant recorder malfunctions, such as it falling in a river). But I never found a regular method I didn’t actively avoid doing, which can set off a procrastination pile-up.
I’ve tried some of the new AI transcription services, namely Otter and Trint. Both were ok, but they are expensive and I didn’t love them. They have limits on how many hours you can transcribe per month, which was annoying when I had a lot of interviews and felt like a waste of money when I didn’t. I found Otter’s editing interface cumbersome, and while the transcripts themselves were reasonably accurate, they were full of “ummms” and “uhhhs” and repeated filler words that needed to be cleaned up by hand to make them readable. (I haven’t used it for a while, so this may have improved.) Trint gave me the great gift of Spanish transcripts, because if there’s anything more painful than transcribing in my native language it’s transcribing in my second language. But it wasn’t a tool I needed often enough to get over the subscription hurdle.
Most of the time, I preferred not having full transcripts anyway. For news stories, I often know the quotes I want to use when I hear them, so my method of timestamping my live notes worked reasonably well. I tried pulling just those best quotes out of the AI transcripts, but it turns out I needed the step of transcribing the best or most informative pieces of an interview myself to get my brain going. I needed some kind of active engagement.
I thought I would stick with my timestamped notes technique forever (and I probably will continue to use it for field reporting, where recording and transcribing everything isn’t usually helpful or even possible). But with my recent news stories, I used a new-to-me transcription software that completely changed my approach. Finally, I found an AI tool that gave me just enough active engagement while still saving a lot of time. Finally, transcribing was easy and fast enough to do right after an interview, rather than putting it off for hours or days. Finally, full transcripts seemed useful instead of burdensome.
The program is called MacWhisper. [2] It builds off OpenAI’s open source transcription program, called Whisper, and turns it into a simple, downloadable program for Macs. (Apologies to Windows users; hopefully someone will make an equivalent for you soon!) You don’t pay OpenAI or need to give them any data, now or in the future. The program is made by an independent developer, and everything happens locally on your own computer. In my experience, this makes it much faster than Otter and Trint, which work in the cloud, and better for privacy, since potentially sensitive interviews don’t leave your device and become more easily hackable. It supports 100 languages, including the ones I need. There’s a good free version of MacWhisper, or an even better Pro version for a one-time purchase (not a subscription) of €29. [3]
I changed my process significantly to get the most out of MacWhisper—and yes, I am thinking about how AI and automation and so many other new technologies only become useful once we’ve reshaped the entire world around what they’re good at. But I never liked my previous process anyway, so good riddance. Here’s what I’ve started doing instead: I record as usual, on Zoom and/or my handheld recorder, and I upload that file to MacWhisper after the interview. (You can also record directly in the program, but I haven’t tried that yet.) I use the Pro version’s Medium model, which is a bit slower than the default Base model but results in a significantly more accurate product. Then I listen back at 1.5x-2x speed and read along. MacWhisper formats transcripts in short phrases, which makes it read like subtitles for the interview and is so much easier than parsing giant blocks of text. I correct the text as needed, which isn’t much, and I assign the text to the speaker as I listen through (this is the active engagement part). I give myself the gift of deleting chunks of text I know I won’t need, like me explaining the publication timeline for the article as we’re wrapping up. Finally, I export the cleaned up transcript as a .txt file split into speaker paragraphs, and then I skim that and highlight the best stuff.
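For a rough sense of the time involved, here is a back-of-the-envelope sketch in Python. It compares typing out an interview by hand, at about four minutes of work per recorded minute, with one listen-through at 1.5x or 2x speed. The one-hour interview and the function names are made up for illustration, and the estimate ignores both the time the software spends transcribing and any pauses to make corrections.

```python
def manual_minutes(recorded_minutes, ratio=4.0):
    """Minutes needed to type out a recording by hand,
    at `ratio` minutes of work per recorded minute."""
    return recorded_minutes * ratio


def review_minutes(recorded_minutes, playback_speed):
    """Minutes needed to listen through a recording once
    at the given playback speed (1.5 means 1.5x)."""
    return recorded_minutes / playback_speed


interview = 60  # hypothetical one-hour interview

print(manual_minutes(interview))       # 240.0
print(review_minutes(interview, 1.5))  # 40.0
print(review_minutes(interview, 2.0))  # 30.0
```

Under those assumptions, a one-hour interview goes from roughly four hours of typing to 30 to 40 minutes of listening and light editing.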
It’s possible to follow exactly this process with the other AI transcription programs, and maybe I would have liked them better if I had been doing it this way all along. But what sets MacWhisper apart for me is that it’s so much better at automatically removing linguistic filler: the “umms” and “uhhhs” and even the false starts of phrases or sentences that our brains smooth out during conversation but that can be huge stumbling blocks when written out in a transcript. For me, having those things taken out from the beginning is the difference between a readable, skimmable, usable full transcript and an exhausting, incomprehensible, overwhelming one.
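If you are stuck with a filler-heavy transcript from another tool, a crude cleanup pass is possible in a few lines of Python. This is only a sketch, and it is not how MacWhisper or Whisper handle filler: the pattern and the `strip_fillers` name are my own, it only catches standalone “um” and “uh” variants, and it cannot detect false starts or repeated words.

```python
import re

# Standalone "um"/"uh" variants (um, umm, uhhh, ...), plus a trailing
# comma and whitespace if present. The word boundaries keep words such
# as "umbrella" or "hummed" intact.
FILLER = re.compile(r"\b(?:u+m+|u+h+)\b,?\s*", re.IGNORECASE)


def strip_fillers(text):
    """Remove standalone um/uh fillers and tidy leftover spacing."""
    cleaned = FILLER.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


sample = "Um, I think, uhh, the river was, ummm, rising."
print(strip_fillers(sample))
```

A pass like this can leave odd punctuation or a lowercase sentence start behind, so the result still needs a read-through.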
The two features I wish MacWhisper had are automatic speaker separation (Otter does this; MacWhisper’s developer says it’s on its way) and a way to highlight quotes while I’m cleaning up the transcript rather than having to skim it later. For now, I don’t mind these slight inconveniences as a way of preserving some active engagement with the material, and it still feels like I cracked a code on the part of my process I always hated the most. I’m not avoiding transcribing anymore, which I didn’t even know was possible.
[1] If transcribing manually, one recorded minute takes most people (including me) about four minutes to transcribe.
[2] This is not an ad or a referral link; no one is paying me to write this or asked me to do it. I just really like this program.
[3] Half-off for journalists! Which I didn’t know when I bought it, lol.