Speaking issues Coqui TTS Tacotron2 DDC model

From Voice Technology Wiki
Revision as of 02:25, 28 October 2021 by 118.208.223.213 (talk) (Restructure and add section for mispronounced words)


  • TTS model: tts_models/en/ljspeech/tacotron2-DDC
  • Vocoder model: vocoder_models/en/ljspeech/hifigan_v2

General

Most models are trained on sentences that end with a period, exclamation mark or question mark. Always end the input with one of these to keep the model from synthesizing strange output.
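
A minimal Python sketch of this rule, applied to the input string before it is passed to the synthesizer. The helper name ensure_terminal_punctuation and the choice of a period as the default are assumptions for illustration; they are not part of Coqui TTS:

```python
# Punctuation marks assumed to count as a proper sentence ending.
TERMINAL_PUNCTUATION = (".", "!", "?")


def ensure_terminal_punctuation(text, default="."):
    """Return text with trailing whitespace removed and a sentence ending added.

    Input that already ends in ".", "!" or "?" is returned unchanged
    (apart from the stripped whitespace). Empty input stays empty.
    """
    stripped = text.rstrip()
    if not stripped:
        return stripped
    if stripped.endswith(TERMINAL_PUNCTUATION):
        return stripped
    return stripped + default


if __name__ == "__main__":
    print(ensure_terminal_punctuation("Nelson Mandela"))
    print(ensure_terminal_punctuation("Is this working?"))
```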

Input string formatting

Phrases ending in "ah"

An "ah" sound at the end of a sentence generally produces strange results. Even a short name produces a 12 second clip.

Examples:

  • Nelson Mandela
  • pergola

Mitigation

If the phrase is at the end of the input, adding punctuation after it makes it synthesize correctly:

Example: "Nelson Mandela" > "Nelson Mandela."
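
Overlong clips of this kind can also be caught after synthesis by comparing the duration of the output audio with the length of the input text. The following is a rough sketch using Python's standard wave module. The function names and the thresholds (0.25 seconds per character, with a 2 second floor) are arbitrary assumptions rather than measured values, and would need tuning against real output:

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def looks_runaway(text, duration_s, max_seconds_per_char=0.25, floor_s=2.0):
    """Guess whether a clip is far too long for the text it should contain.

    The limit grows with the number of characters in the text and never
    drops below floor_s. Both thresholds are assumptions, not tuned values.
    """
    limit = max(floor_s, len(text) * max_seconds_per_char)
    return duration_s > limit


if __name__ == "__main__":
    # A 12 second clip for a two-word name is flagged as suspicious.
    print(looks_runaway("Nelson Mandela", 12.0))
```

A flagged clip could then be synthesized again with punctuation added to the input.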

Acronyms

To have an acronym spoken as individual letters, format it with a period and a space after each letter:

"A. B. C. news"

Not:

  • "ABC news"
  • "A.B.C. news"

Mispronounced Words

  • video