I’ve been running a semi-regular podcast for about a year as a side project alongside client work. Recording conditions are inconsistent because I record from wherever I happen to be, which could be a hotel room in Lisbon or an Airbnb in Chiang Mai. Not ideal acoustics.
I spent the last month testing AI audio enhancement tools on a batch of old recordings to see whether I could retroactively improve quality. Here’s what I found.
Background noise removal: genuinely excellent. The kind of ambient room tone that used to require either a treated room or significant post-production effort is handled automatically and cleanly. This alone would have saved me hours of work across the year.
Voice clarity improvement: good for consistent issues, worse for variable ones. If the recording has a consistent quality problem, such as low gain or slight distortion, the tools correct it well. If the quality varies within a recording, which happens a lot when I’m moving or the environment changes mid-session, the tools sometimes overcorrect in ways that create audible artifacts.
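One rough way to tell a "consistent" recording from a "variable" one before processing is to compare loudness across short windows. The sketch below is only an illustration, not how any of the enhancement tools work internally. The function names, the window size, and the 6 dB threshold are all assumptions, and it expects samples already decoded to floats between -1.0 and 1.0. Pauses in speech will inflate the spread, so treat the result as a hint rather than a verdict.

```python
import math

def window_levels_db(samples, window_size):
    """Return the RMS level (in dBFS) of each full window of samples."""
    levels = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        rms = math.sqrt(sum(s * s for s in window) / window_size)
        # Floor near-silence at -120 dBFS so log10 never sees zero.
        levels.append(20 * math.log10(max(rms, 1e-6)))
    return levels

def level_spread_db(samples, window_size):
    """Difference between the loudest and quietest window, in dB."""
    levels = window_levels_db(samples, window_size)
    return max(levels) - min(levels) if levels else 0.0

def is_variable(samples, window_size, threshold_db=6.0):
    """Hypothetical rule of thumb: a large level spread means variable quality."""
    return level_spread_db(samples, window_size) > threshold_db
```

A recording flagged as variable is a candidate for a lighter touch, or for a closer listen after processing.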
The overcorrection problem: this is the main thing. When the tool is working hard on a particularly difficult section, it occasionally produces audio that sounds slightly processed in a way that most listeners would notice if they were paying attention. It’s not bad, but it’s not natural either. Paradoxically, the worst source material sometimes produces the most ‘enhanced’ sounding output rather than the cleanest.
My current use: background noise removal and basic clarity correction as a standard step. I skip the aggressive enhancement on anything that already sounds close to acceptable. Net result is better-sounding output for about 40% of my episodes with no meaningful time investment.