A colleague forwarded me a demonstration last week of a voice cloning tool that produces a convincing replica from roughly 30 seconds of source audio. I’ve been sitting with this for several days and I find I still don’t have a settled view on it.
The legitimate applications are not nothing. Accessibility tools. Post-production audio correction. Language dubbing for content creators who want consistent voice across markets. Archival uses for people who have lost the ability to speak. These are real and the case for them is straightforward enough.
What I keep returning to, though, is the consent architecture. Or rather, the absence of one. The demonstrations I’ve seen don’t require anything from the person whose voice is being cloned beyond the recording itself, which can be pulled from publicly available material without that person’s knowledge. That feels like a significant gap that the ‘legitimate applications’ framing doesn’t fully address.
In publishing, we think carefully about likeness rights, about the difference between quoting someone and putting words in their mouth. Voice cloning collapses that distinction entirely. You can now put any words in anyone’s mouth with a plausible facsimile of their actual voice.
I’m not opposed to the technology categorically. I’m opposed to deploying it without a consent framework that actually means something. The current situation seems to be that the technology exists, the use cases are multiplying, and the governance is years behind.
Worth saying that this isn’t a problem the tools themselves can solve. It requires the kind of institutional framework that takes longer to build than anyone in the space seems willing to wait for.
What’s your read on where the consent obligation actually sits here?
In my experience, the companies building these tools are going to keep moving until regulation actually bites. Voluntary consent frameworks work until there’s a competitive advantage in ignoring them, at which point everyone ignores them. This isn’t theoretical. We’ve seen this pattern in data collection, in facial recognition, in social media targeting. Voice is next in the same sequence.
What most teams miss when they frame this as a ‘consent’ problem is that it’s actually an enforcement problem. Consent without enforceability is just a disclaimer.
The consent question is the right one but it’s genuinely complicated by jurisdiction. In most places, voice isn’t a protected biometric in the same way a fingerprint is. The law hasn’t caught up with what the technology can do, which means consent is currently more of an ethical expectation than a legal requirement in most contexts. That gap is doing a lot of damage.
honestly the 30-second threshold is what gets me. that’s not a lot of material. a podcast guest, a conference talk, a youtube video. most public-facing people have more than 30 seconds of audio out there without ever thinking about what that means. the categories of ‘public figure’ and ‘consented to voice replication’ are very different things and we’re treating them like they’re the same
As someone who works in fiction and voice-driven narrative, this keeps me up at night in a different way. The creative applications are genuinely exciting. I can imagine legitimate uses in audiobook production, in interactive fiction, in game writing. But those applications exist on the same technical foundation as the harmful ones. There’s no version of ‘here’s the good voice cloning’ that doesn’t also enable the other kind. That’s the part I don’t think the conversation is being honest about.
From a marketing angle, the brand risk of getting this wrong is severe. We’ve watched other companies get torched for much smaller violations of audience trust. Using voice cloning without explicit consent in any customer-facing context would be a reputational catastrophe. The tech being available doesn’t mean using it is smart.