While this blog has been dormant, the side project continues to operate. I spent some time over the US Labor Day weekend in 2022 integrating speaker diarization using pyannote, hoping to have it working ahead of ATP’s 500th episode. I only just got it integrated in time, and after I notified the hosts about the update, Casey referenced the site during the 500th episode! I also received kind e-mails from all three hosts, which made my week.
Back when I wrote up a brief design doc for catatp.fm, I included speaker identification as a potential feature. At the time, after examining the available speaker diarization options and finding their accuracy lackluster, I abandoned the feature, hoping to return to it in the future.
Finally, over a year later (fall 2022), I found an open source project that met my accuracy requirements, and I felt it appropriate to get it integrated into catatp.fm for ATP’s 500th episode.
While there are now speaker labels for each sequence of words in each episode, the fun part was adding a section to the statistics page with the word counts for each speaker, along with an interactive graph that lets someone look over the word counts for each detected speaker in each episode. That makes it easy to (mostly accurately) identify which episodes Tiff Arment was in, the episodes where the hosts interviewed someone (Chris Lattner, Phil Schiller, Christina Warren), the one episode where Marco never spoke (263), the one episode where John was not present (119, the Christina Warren interview), and the fact that Casey is the only person who has been heard in every episode of ATP.
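For the curious, here is a rough sketch of how per-speaker word counts like these could be tallied from diarized transcript segments. It is not the actual catatp.fm code: the `(episode, speaker, text)` tuple layout, the function names, and the sample data are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def word_counts_by_speaker(segments):
    """Tally words per speaker per episode.

    `segments` is a hypothetical iterable of (episode, speaker, text)
    tuples, one per labeled run of words.
    """
    counts = defaultdict(Counter)
    for episode, speaker, text in segments:
        # Whitespace splitting is a crude but serviceable word count.
        counts[episode][speaker] += len(text.split())
    return counts


def episodes_missing(counts, speaker):
    """Return the sorted episodes in which `speaker` has no counted words."""
    # Counter returns 0 for absent keys without inserting them.
    return sorted(ep for ep, c in counts.items() if c[speaker] == 0)


if __name__ == "__main__":
    # Made-up sample data, not real transcript content.
    sample = [
        (1, "Casey", "hello there everyone"),
        (1, "Marco", "hi"),
        (2, "Casey", "welcome back"),
        (2, "John", "so many words here"),
    ]
    counts = word_counts_by_speaker(sample)
    print(dict(counts[1]))
    print(episodes_missing(counts, "Marco"))
```

With counts shaped like this, questions such as "which episodes did a given speaker never appear in?" become a simple filter, subject of course to the accuracy of the diarization labels.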
The graph also makes it easier for me to see where more work is needed, like episode 202 (where people’s laughter and Tiff Arment’s voice are detected as Jonathan Mann).
Note that I am using an older model and version of pyannote than the currently released one. After I upgraded, I saw a massive drop in accuracy on the long audio files I process as part of this project, so I reverted and continued using the cached model.
I’m currently trying to finish up another side project, entirely unrelated to ATP, which is occupying what little free time I have. Keep an eye on this space for more info, hopefully in the near future.
You can also follow me on Mastodon @[email protected], where I believe I have already tooted more than I ever tweeted. Wow, that sounds weird…