Text cleaning

Revision as of 16:33, 6 December 2021 by Alex42 (talk | contribs) (sanitizing)

Text cleaning is the process of removing or replacing certain characters in a text corpus before its sentences are recorded. The sections below give examples of text that needs cleaning. When you use a TTS model, you also have to clean the text before passing it to the synthesizer, unless text cleaning is already part of the TTS process itself.

Numbers

Numbers should be replaced with their written form.

You have 3 timers set. ==> You have three timers set.
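
The replacement above can be sketched in Python as follows. This is a minimal illustration, not a complete normalizer: the function names `number_to_words` and `spell_out_numbers` are hypothetical, only standalone integers from 0 to 999 are handled, and decimals, longer numbers, and digits attached to letters (such as `5kg`) are left untouched for other cleaning steps.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 999 (assumed range)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def spell_out_numbers(text: str) -> str:
    """Replace standalone integers of up to three digits with words."""
    # \b on both sides skips digits glued to letters or to longer numbers.
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())),
                  text)


print(spell_out_numbers("You have 3 timers set."))
```

Four-digit values such as years are deliberately skipped here, because they are usually read differently from plain quantities.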

Time and date

Today is Monday, November 3rd. ==> Today is Monday, November third.
It is 2021. ==> It is twenty twenty-one.
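
A rough Python sketch of this step is shown below. It rests on several assumptions that are not part of the page: the helper names (`ordinal`, `year_to_words`, `clean_dates`) are hypothetical, the month name is kept as written, every four-digit number from 1000 to 2099 is treated as a year, and years are read as two pairs of digits with "oh" for a zero in the tens place.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Irregular ordinal forms; everything else takes "th" or "ieth".
IRREGULAR = {"one": "first", "two": "second", "three": "third",
             "five": "fifth", "eight": "eighth", "nine": "ninth",
             "twelve": "twelfth"}


def two_digits(n: int) -> str:
    """Spell out an integer between 0 and 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def ordinal(n: int) -> str:
    """Spell out an ordinal between 0 and 99, e.g. 3 -> 'third'."""
    head, sep, last = two_digits(n).rpartition("-")
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return head + sep + last


def year_to_words(year: int) -> str:
    """Spell out a year between 1000 and 2099 (assumed range)."""
    century, rest = divmod(year, 100)
    if rest == 0:
        if century % 10 == 0:
            return ONES[century // 10] + " thousand"
        return two_digits(century) + " hundred"
    if rest < 10:
        return two_digits(century) + " oh " + ONES[rest]
    return two_digits(century) + " " + two_digits(rest)


def clean_dates(text: str) -> str:
    """Spell out day ordinals such as '3rd' and four-digit years."""
    text = re.sub(r"\b(\d{1,2})(?:st|nd|rd|th)\b",
                  lambda m: ordinal(int(m.group(1))), text)
    return re.sub(r"\b(?:1\d|20)\d{2}\b",
                  lambda m: year_to_words(int(m.group())), text)


print(clean_dates("Today is Monday, November 3rd."))
print(clean_dates("It is 2021."))
```

Because the year pattern cannot tell a year from a quantity, a sentence like "We sold 1500 tickets" would also be rewritten; a real cleaner would need more context to decide.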

Abbreviations

Let's go to Dr. John Doe. ==> Let's go to doctor John Doe.
Weight is 5kg. ==> Weight is five kilograms.
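
One possible way to expand abbreviations is a lookup table applied with regular expressions, sketched below in Python. The tables `TITLES` and `UNITS` and the function `expand_abbreviations` are hypothetical and cover only a few entries. The sketch expands the abbreviation and picks singular or plural, but leaves the digit in place so that a separate number-cleaning step can spell it out.

```python
import re

# Small illustrative tables; a real cleaner would need far more entries.
TITLES = {"Dr.": "doctor", "Mr.": "mister", "Prof.": "professor"}
UNITS = {"kg": "kilogram", "g": "gram", "km": "kilometer", "cm": "centimeter"}

# Longest units first so that "kg" is tried before "g".
_UNIT_RE = re.compile(
    r"(\d+)\s*("
    + "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))
    + r")\b"
)


def _expand_unit(match: re.Match) -> str:
    amount, unit = match.group(1), UNITS[match.group(2)]
    # Use the singular form only for exactly one unit.
    return f"{amount} {unit}" + ("" if int(amount) == 1 else "s")


def expand_abbreviations(text: str) -> str:
    """Expand known titles and unit abbreviations in a sentence."""
    for abbr, full in TITLES.items():
        # Require whitespace after the title so sentence-final "Dr." is kept.
        text = re.sub(r"\b" + re.escape(abbr) + r"(?=\s)", full, text)
    return _UNIT_RE.sub(_expand_unit, text)


print(expand_abbreviations("Let's go to Dr. John Doe."))
print(expand_abbreviations("Weight is 5kg."))
```

Running number cleaning after this step would then turn "5 kilograms" into "five kilograms".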