
Text cleaning

From Open Voice Technology Wiki
Revision as of 16:33, 6 December 2021 by Alex42 (sanitizing)


Text cleaning is the process of removing unwanted characters from a text corpus, and rewriting tokens such as numbers, dates and abbreviations into their spoken form, before the sentences are recorded. Some typical examples of text that needs cleaning are listed below. When you use a TTS model, you have to clean the text before passing it to the synthesizer, unless text cleaning is already part of the TTS pipeline.
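The character-removal part of cleaning can be sketched in Python as below. This is a minimal illustration, not the method of any particular TTS toolkit: the whitelist (letters, digits, whitespace and the punctuation `. , ! ? ' -`) and the function name `remove_unwanted_characters` are assumptions chosen for the example, and the right set depends on your language and model.

```python
import re
import unicodedata

# Hypothetical whitelist: word characters (minus "_"), whitespace and a few
# punctuation marks a synthesizer may use for prosody. Anything else is dropped.
_DISALLOWED = re.compile(r"[^\w\s.,!?'-]|_")
_WHITESPACE = re.compile(r"\s+")


def remove_unwanted_characters(text: str) -> str:
    """Replace characters outside the whitelist with spaces, then tidy whitespace."""
    # NFKC folds compatibility characters (e.g. full-width letters) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Replacing with a space rather than deleting avoids gluing words together.
    text = _DISALLOWED.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()
```

For example, `remove_unwanted_characters("Hello,   world! (test)")` gives `"Hello, world! test"`. Because Python's `\w` is Unicode-aware, accented letters such as "é" are kept.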

Numbers

Numbers should be replaced with their spelled-out form.

You have 3 timers set. ==> You have three timers set.
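A minimal Python sketch of spelling out numbers, assuming English, non-negative integers below one million, and the American reading without "and" ("one hundred five"). The names `number_to_words` and `spell_out_numbers` are hypothetical and not taken from any TTS library.

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer below one million in English."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only supports 0 <= n < 1,000,000")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = _ONES[hundreds] + " hundred"
        return words + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    words = number_to_words(thousands) + " thousand"
    return words + (" " + number_to_words(rest) if rest else "")


def spell_out_numbers(text: str) -> str:
    """Replace every standalone run of digits with its written form."""
    return re.sub(r"\b\d+\b", lambda m: number_to_words(int(m.group())), text)
```

`spell_out_numbers("You have 3 timers set.")` returns `"You have three timers set."`. Digits glued to letters, such as "5kg" or "3rd", have no word boundary after the digits and are left untouched, so they need unit- or ordinal-specific rules; numbers of one million or more raise `ValueError`.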

Time and date

Today is Monday, November 3rd. ==> Today is Monday, November third.
It is 2021. ==> It is twenty twenty-one.
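Dates differ from plain numbers: days are read as ordinals and years are usually read in pairs of digits. The Python sketch below is one possible approach, assuming English and treating every four-digit number from 1100 to 2099 as a year (so "2021 people" would be misread). All function names are hypothetical.

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]
_IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                       "five": "fifth", "eight": "eighth", "nine": "ninth",
                       "twelve": "twelfth"}

_ORDINAL_RE = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\b")   # "3rd", "21st"
_YEAR_RE = re.compile(r"\b(1[1-9]\d\d|20\d\d)\b")           # 1100 to 2099


def two_digit_words(n: int) -> str:
    """Spell out 0 to 99."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] + ("-" + _ONES[ones] if ones else "")


def ordinal_words(n: int) -> str:
    """Turn 0 to 99 into an ordinal, e.g. 21 -> "twenty-first"."""
    head, sep, last = two_digit_words(n).rpartition("-")
    if last in _IRREGULAR_ORDINALS:
        last = _IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"        # twenty -> twentieth
    else:
        last += "th"                     # four -> fourth
    return head + sep + last


def year_to_words(year: int) -> str:
    """Read a year as pairs of digits, e.g. 2021 -> "twenty twenty-one"."""
    if not 1100 <= year <= 2099:
        raise ValueError("this sketch only supports years 1100 to 2099")
    if 2000 <= year <= 2009:
        return "two thousand" + (" " + _ONES[year - 2000] if year > 2000 else "")
    first, last = divmod(year, 100)
    if last == 0:
        return two_digit_words(first) + " hundred"      # 1900
    if last < 10:
        return two_digit_words(first) + " oh " + _ONES[last]   # 1905
    return two_digit_words(first) + " " + two_digit_words(last)


def normalize_dates(text: str) -> str:
    """Expand ordinal day numbers first, then four-digit years."""
    text = _ORDINAL_RE.sub(lambda m: ordinal_words(int(m.group(1))), text)
    return _YEAR_RE.sub(lambda m: year_to_words(int(m.group(1))), text)
```

With this sketch, `normalize_dates("Today is Monday, November 3rd.")` gives `"Today is Monday, November third."` and `normalize_dates("It is 2021.")` gives `"It is twenty twenty-one."`. Weekday and month names are already words and pass through unchanged.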

Abbreviations

Let's go to Dr. John Doe. ==> Let's go to doctor John Doe.
Weight is 5kg. ==> Weight is five kilograms.
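Abbreviation expansion is usually a lookup table plus some context rules. The Python sketch below assumes small hypothetical tables (`TITLES`, `UNITS`) and two heuristics: a title is expanded only when a capitalized word follows it, and a unit only after an integer, picking singular or plural from the number. It keeps the digits, so it is meant to run before a separate number-spelling pass.

```python
import re

# Hypothetical lookup tables for illustration; real systems need much larger,
# language-specific lists ("Dr." can also mean "Drive", for example).
TITLES = {"Dr.": "doctor", "Mr.": "mister", "Mrs.": "missus", "Prof.": "professor"}
UNITS = {
    "kg": ("kilogram", "kilograms"),
    "km": ("kilometer", "kilometers"),
    "cm": ("centimeter", "centimeters"),
}

# Heuristic: a title counts only when a capitalized word follows ("Dr. John").
_TITLE_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in TITLES) + r")(?=\s+[A-Z])"
)
# An integer, an optional space, then a unit; the lookbehind skips decimals
# such as "2.5kg", which this sketch does not handle.
_UNIT_RE = re.compile(r"(?<![\w.])(\d+)\s?(" + "|".join(UNITS) + r")\b")


def _expand_unit(match: re.Match) -> str:
    number = match.group(1)
    singular, plural = UNITS[match.group(2)]
    # Digits are kept; a number-spelling step can turn them into words later.
    return f"{number} {singular if number == '1' else plural}"


def expand_abbreviations(text: str) -> str:
    """Expand known titles and number-unit pairs."""
    text = _TITLE_RE.sub(lambda m: TITLES[m.group(1)], text)
    return _UNIT_RE.sub(_expand_unit, text)
```

Here `expand_abbreviations("Weight is 5kg.")` returns `"Weight is 5 kilograms."`, and spelling out the number afterwards yields "Weight is five kilograms." The capital-letter heuristic is fragile: a street name such as "Elm Dr." at the end of a sentence would still be read as "doctor".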
