Text cleaning

From OpenVoice-Tech Wiki

Text cleaning is the process of removing or replacing characters in a text corpus that cannot be read aloud unambiguously, such as digits and abbreviations, before the text is recorded. The same applies when you use a TTS model: if text cleaning is not part of the TTS process itself, you have to clean the text before passing it to the synthesizer. Here are some examples of text to be cleaned.

Numbers

Numbers should be replaced with their written-out form.

You have 3 timers set. ==> You have three timers set.
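
The replacement above could be automated roughly as follows. This is only a minimal sketch, not the wiki's own tooling: the helper names `number_to_words` and `spell_out_numbers` are made up for illustration, only English integers from 0 to 999 are handled, and decimals or numbers with thousands separators are deliberately left untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 (hypothetical helper)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only supports 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


# Standalone integers of one to three digits. The lookarounds skip digits
# that belong to a decimal ("3.5") or a grouped number ("1,000").
NUMBER_RE = re.compile(r"(?<![\d.,])\b\d{1,3}\b(?![.,]?\d)")


def spell_out_numbers(text: str) -> str:
    """Replace standalone integers (0..999) with their written form."""
    return NUMBER_RE.sub(lambda m: number_to_words(int(m.group())), text)


print(spell_out_numbers("You have 3 timers set."))
```

Larger numbers, decimals, and other languages need additional rules; a dedicated number-spelling library may be a better fit for real corpora.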

Time and date

Today is Monday, November 3rd. ==> Today is Monday, November the third.
It is 2021. ==> It is twenty twenty-one.
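
Dates and years can be handled in a similar way. The sketch below is an assumption-laden illustration rather than an established implementation: `day_ordinal`, `year_to_words`, and `clean_dates` are hypothetical names, the month name is kept in the output, only day ordinals from 1st to 31st are covered, and every four-digit number from 1100 to 2099 is treated as a year and read in pairs, which will be wrong for plain quantities in that range.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
ORDINALS = ["", "first", "second", "third", "fourth", "fifth", "sixth",
            "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth", "fourteenth", "fifteenth", "sixteenth",
            "seventeenth", "eighteenth", "nineteenth"]


def two_digits(n: int) -> str:
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def day_ordinal(n: int) -> str:
    """Spell out a day of the month (1..31) as an ordinal."""
    if n < 20:
        return ORDINALS[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS[tens][:-1] + "ieth"  # twenty -> twentieth
    return TENS[tens] + "-" + ORDINALS[ones]


def year_to_words(year: int) -> str:
    """Read a year from 1100 to 2099 the way it is usually spoken."""
    if not 1100 <= year <= 2099:
        raise ValueError("this sketch only supports 1100..2099")
    high, low = divmod(year, 100)
    if 2000 <= year <= 2009:
        return "two thousand" + (" " + ONES[low] if low else "")
    if low == 0:
        return two_digits(high) + " hundred"
    if low < 10:
        return two_digits(high) + " oh " + ONES[low]
    return two_digits(high) + " " + two_digits(low)


def clean_dates(text: str) -> str:
    """Expand day ordinals ("3rd") and four-digit years ("2021")."""
    text = re.sub(r"\b([1-9]|[12]\d|3[01])(st|nd|rd|th)\b",
                  lambda m: "the " + day_ordinal(int(m.group(1))), text)
    text = re.sub(r"\b(1[1-9]\d\d|20\d\d)\b",
                  lambda m: year_to_words(int(m.group(1))), text)
    return text


print(clean_dates("Today is Monday, November 3rd."))
print(clean_dates("It is 2021."))
```

Running the date step before the general number step keeps a year from being read as a plain quantity.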

Abbreviations

Let's go to Dr. John Doe. ==> Let's go to doctor John Doe.
Weight is 5kg. ==> Weight is five kilograms.
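
Abbreviations and units are often expanded with lookup tables. The following is a minimal sketch under that assumption: the tables `ABBREVIATIONS` and `UNITS` and the functions `expand_abbreviations` and `expand_units` are illustrative, not a complete or authoritative list. The unit step leaves the digit in place so that a separate number step can spell it out afterwards.

```python
import re

# Illustrative lookup tables; a real corpus needs a much longer,
# language-specific list.
ABBREVIATIONS = {"Dr.": "doctor", "Mr.": "mister", "Prof.": "professor"}
UNITS = {  # unit symbol -> (singular, plural)
    "kg": ("kilogram", "kilograms"),
    "km": ("kilometer", "kilometers"),
    "cm": ("centimeter", "centimeters"),
    "g": ("gram", "grams"),
    "m": ("meter", "meters"),
}

# Longest symbols first so that "kg" is tried before "g".
_UNIT_ALTERNATION = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
UNIT_RE = re.compile(r"\b(\d+)\s?(" + _UNIT_ALTERNATION + r")\b")


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its spoken form."""
    for abbr, full in ABBREVIATIONS.items():
        # (?<!\w) avoids matching inside a longer word.
        text = re.sub(r"(?<!\w)" + re.escape(abbr), full, text)
    return text


def expand_units(text: str) -> str:
    """Expand unit symbols that directly follow an integer."""
    def repl(match: re.Match) -> str:
        singular, plural = UNITS[match.group(2)]
        word = singular if int(match.group(1)) == 1 else plural
        return match.group(1) + " " + word
    return UNIT_RE.sub(repl, text)


print(expand_abbreviations("Let's go to Dr. John Doe."))
print(expand_units("Weight is 5kg."))
```

A plain lookup cannot resolve ambiguous abbreviations (for example "Dr." as "Drive" in an address), so ambiguous entries still need manual review or context rules.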