Text cleaning

From Open Voice Technology Wiki
Revision as of 17:33, 6 December 2021 by Alex42 (talk | contribs) (sanitizing)


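Text cleaning is the process of normalizing a text corpus before it is recorded: anything that cannot be read aloud unambiguously, such as digits, dates and abbreviations, is replaced with its written-out form. The same applies at synthesis time. When you use a TTS model, you have to clean the text before passing it to the synthesizer, unless text cleaning is already part of the TTS process itself. Here are some examples of text to be cleaned.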

Numbers

Numbers should be replaced with their written form.

You have 3 timers set. ==> You have three timers set.
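
The rule above can be sketched in Python. This is a minimal illustration, not a complete normalizer: the function names `number_to_words` and `spell_out_numbers` are made up for this example, only English integers from 0 to 999 are covered, and anything that looks like a decimal, a thousands group, or a number glued to letters (such as "3rd" or "5kg") is deliberately left untouched for other cleaning steps.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English (hypothetical helper)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


# Standalone runs of 1 to 3 digits that are not part of a decimal or a
# thousands group such as "3.5" or "1,000".
_NUMBER_RE = re.compile(r"(?<![\d.,])\b\d{1,3}\b(?![.,]\d)")


def spell_out_numbers(text: str) -> str:
    """Replace each standalone small integer in text with its written form."""
    return _NUMBER_RE.sub(lambda m: number_to_words(int(m.group(0))), text)


print(spell_out_numbers("You have 3 timers set."))
# prints: You have three timers set.
```

Larger numbers, decimals and other languages need more rules than this sketch contains, so a dedicated number-spelling library is usually the better choice for real corpora.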

Time and date

Today is Monday, November 3rd. ==> Today is Monday, November third.
It is 2021. ==> It is twenty twenty-one.
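
Dates and years can be handled in a similar way. The following is a rough sketch under several assumptions: the helper names (`day_to_ordinal`, `year_to_words`, `clean_dates`) are invented for this example, ordinals are only converted for days 1 to 31, and every standalone four-digit number from 1000 to 2099 is assumed to be a year and read in pairs. That last assumption is wrong for quantities such as "1500 meters", so a real cleaner needs more context.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
ORDINALS = ["", "first", "second", "third", "fourth", "fifth", "sixth",
            "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth", "fourteenth", "fifteenth", "sixteenth",
            "seventeenth", "eighteenth", "nineteenth"]


def two_digits(n: int) -> str:
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def day_to_ordinal(day: int) -> str:
    """Spell out a day of the month (1..31) as an ordinal."""
    if not 1 <= day <= 31:
        raise ValueError("day must be in 1..31")
    if day < 20:
        return ORDINALS[day]
    tens, ones = divmod(day, 10)
    if ones == 0:
        return {2: "twentieth", 3: "thirtieth"}[tens]
    return TENS[tens] + "-" + ORDINALS[ones]


def year_to_words(year: int) -> str:
    """Read a year from 1000 to 2099 the way it is usually spoken."""
    if not 1000 <= year <= 2099:
        raise ValueError("only 1000..2099 is supported in this sketch")
    high, low = divmod(year, 100)
    if low == 0:  # 2000 -> two thousand, 1900 -> nineteen hundred
        if high % 10 == 0:
            return ONES[high // 10] + " thousand"
        return two_digits(high) + " hundred"
    if low < 10:  # 2005 -> two thousand five, 1905 -> nineteen oh five
        if high % 10 == 0:
            return ONES[high // 10] + " thousand " + ONES[low]
        return two_digits(high) + " oh " + ONES[low]
    return two_digits(high) + " " + two_digits(low)


def _ordinal_repl(match: re.Match) -> str:
    day = int(match.group(1))
    # Leave anything that cannot be a day of the month unchanged.
    return day_to_ordinal(day) if 1 <= day <= 31 else match.group(0)


def clean_dates(text: str) -> str:
    """Spell out ordinal days ("3rd") and four-digit years ("2021")."""
    text = re.sub(r"\b(\d{1,2})(?:st|nd|rd|th)\b", _ordinal_repl, text)
    return re.sub(r"\b(1\d{3}|20\d{2})\b",
                  lambda m: year_to_words(int(m.group(1))), text)


print(clean_dates("Today is Monday, November 3rd."))
# prints: Today is Monday, November third.
print(clean_dates("It is 2021."))
# prints: It is twenty twenty-one.
```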

Abbreviations

Let's go to Dr. John Doe. ==> Let's go to doctor John Doe.
Weight is 5kg. ==> Weight is five kilograms.
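
Abbreviations are typically expanded from a lookup table. The sketch below assumes two small, purely illustrative tables (`ABBREVIATIONS` and `UNITS`) and a made-up function name `expand_abbreviations`; a real corpus needs a much longer list, and some abbreviations are ambiguous ("Dr." can also stand for "Drive"). Digits are intentionally left in place here, on the assumption that a separate number-cleaning step runs afterwards and turns "5 kilograms" into "five kilograms".

```python
import re

# Illustrative tables only; extend them for the corpus at hand.
ABBREVIATIONS = {"Dr.": "doctor", "Mr.": "mister", "Prof.": "professor"}
UNITS = {"kg": "kilogram", "km": "kilometer", "cm": "centimeter"}

_ABBR_RE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")"
)
# A number, an optional space, then a known unit symbol.
_UNIT_RE = re.compile(r"\b(\d+)\s?(" + "|".join(UNITS) + r")\b")


def _unit_repl(match: re.Match) -> str:
    count = int(match.group(1))
    unit = UNITS[match.group(2)]
    # Pluralize unless the quantity is exactly one.
    return f"{match.group(1)} {unit}{'' if count == 1 else 's'}"


def expand_abbreviations(text: str) -> str:
    """Expand known titles and unit symbols; digits are left for a later step."""
    text = _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
    return _UNIT_RE.sub(_unit_repl, text)


print(expand_abbreviations("Let's go to Dr. John Doe."))
# prints: Let's go to doctor John Doe.
print(expand_abbreviations("Weight is 5kg."))
# prints: Weight is 5 kilograms.
```

Running abbreviation expansion before number spelling keeps each step simple: the unit rule can still see the digit and choose singular or plural, and the number rule then only has to deal with standalone digits.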
