TTS watermarks

Preventing deep fakes

Following an idea from Erogol of Coqui, there should be a way to identify deep fakes in synthesized speech. A discussion on Twitter^[1] made one thing clear: "It's the old story between hacker and the people trying to prevent misusage".

Possible techniques^[2]

Which techniques are useful for which purpose, and what are their pros and cons:

Watermark in TTS output

A watermark added to the TTS output is easy to analyse and reproduce for anyone who has the original source code.
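To make the output-side approach concrete, here is a minimal sketch of a keyed additive watermark applied as a post-processing step on the synthesized samples. It is not taken from Coqui TTS: the function names, the `key` parameter, and the `strength` value are all hypothetical, and the default strength has not been tuned for audibility. It also illustrates the weakness noted above: the whole scheme is visible in the source, so only the key stays secret.

```python
import random


def keyed_sequence(key: int, length: int) -> list[float]:
    """Return a reproducible pseudo-random sequence of +1.0 / -1.0 values."""
    # Assumption: `random.Random` is used for illustration only; it is not
    # cryptographically secure.
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(length)]


def embed_watermark(samples: list[float], key: int, strength: float = 0.02) -> list[float]:
    """Add the keyed sequence to the audio at a low amplitude."""
    sequence = keyed_sequence(key, len(samples))
    return [s + strength * w for s, w in zip(samples, sequence)]


def detect_watermark(samples: list[float], key: int, strength: float = 0.02) -> bool:
    """Correlate the audio with the keyed sequence and compare to a threshold."""
    sequence = keyed_sequence(key, len(samples))
    score = sum(s * w for s, w in zip(samples, sequence)) / len(samples)
    # Unmarked audio scores near 0, marked audio scores near `strength`.
    return score > strength / 2
```

Detection only works on sample-aligned audio of the same length as the marked clip; resampling, trimming, or lossy compression would likely break this naive version.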

Watermark in TTS dataset

A model can learn to reproduce a watermark that is present in its training data, without anything about the watermark appearing in the code.
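As a rough sketch of the dataset-side approach, the snippet below mixes a faint fixed-frequency pilot tone into every training clip and measures that tone's amplitude in a given signal. Everything here is an assumption for illustration: the helper names, the 7 kHz frequency, and the amplitude are hypothetical, and whether a trained model actually reproduces such a tone is not verified by this code.

```python
import math


def add_pilot_tone(samples, sample_rate, freq=7000.0, amplitude=0.01):
    """Mix a faint sine tone into one clip (hypothetical watermark)."""
    return [
        s + amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        for i, s in enumerate(samples)
    ]


def watermark_dataset(clips, sample_rate, freq=7000.0, amplitude=0.01):
    """Apply the same pilot tone to every clip before training."""
    return [add_pilot_tone(clip, sample_rate, freq, amplitude) for clip in clips]


def tone_amplitude(samples, sample_rate, freq=7000.0):
    """Estimate the amplitude of the tone at `freq` with a single DFT bin."""
    n = len(samples)
    re = sum(
        x * math.cos(2 * math.pi * freq * i / sample_rate)
        for i, x in enumerate(samples)
    )
    im = sum(
        x * math.sin(2 * math.pi * freq * i / sample_rate)
        for i, x in enumerate(samples)
    )
    return 2 * math.hypot(re, im) / n
```

To check a generated clip, one would compare `tone_amplitude(clip, sample_rate)` against the value measured on speech known to be unmarked; a suitable threshold would have to be found empirically.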

References

[1] https://twitter.com/erogol/status/1464412634783043585?s=20

[2] https://github.com/coqui-ai/TTS/discussions/1036#discussioncomment-1863431
