Text cleaning

From Voice Technology Wiki
Revision as of 22:36, 15 November 2021


Text cleaning is the process of removing or replacing certain characters in a text corpus before the text is recorded. When you use a TTS model, you also have to clean the text before passing it to the synthesizer, unless text cleaning is already part of the TTS process itself. Here are some examples of text to be cleaned.

Numbers

Numbers should be replaced with their written form.

You have 3 timers set. ==> You have three timers set.
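
The replacement above can be sketched in Python. This is a minimal illustration, not a complete normalizer: the names `number_to_words` and `replace_numbers` are hypothetical, only standalone integers from 0 to 999 are handled, and the hyphenated spelling ("twenty-one") is an assumption about the target style. Decimals, ordinals, and numbers attached to units are left for other rules.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English (hypothetical helper)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def replace_numbers(text: str) -> str:
    """Replace standalone 1 to 3 digit numbers with their written form.

    The word boundaries skip digits glued to letters ("3rd", "5kg") and
    longer digit runs such as years ("2021").
    """
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())),
                  text)


print(replace_numbers("You have 3 timers set."))
```

A lookup-table approach like this keeps the rule easy to audit. For a real corpus, a dedicated number-spelling library for the target language is likely a better fit.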

Time and date

Today is Monday, November 3rd. ==> Today is Monday, the third.


It is 2021. ==> It is twenty twenty-one.
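
A possible way to automate these two cases in Python is sketched below. All function names (`ordinal`, `year_to_words`, `clean_dates`) are hypothetical. The sketch makes several assumptions: every four-digit number from 1100 to 2099 is treated as a year and read in pairs, years are spelled with a space and a hyphen ("twenty twenty-one"), and day ordinals such as "3rd" are simply spelled out as "third". It does not drop the month name or insert "the"; that choice depends on how the sentence should be spoken.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
IRREGULAR = {"one": "first", "two": "second", "three": "third",
             "five": "fifth", "eight": "eighth", "nine": "ninth",
             "twelve": "twelfth"}


def two_digits(n: int) -> str:
    """Spell out 0..99 as a cardinal number."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def ordinal(n: int) -> str:
    """Spell out 0..99 as an ordinal by rewriting the last cardinal word."""
    head, sep, last = two_digits(n).rpartition("-")
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last += "th"                # four -> fourth
    return head + sep + last


def year_to_words(year: int) -> str:
    """Read a year from 1100 to 2099 the way it is usually spoken."""
    if not 1100 <= year <= 2099:
        raise ValueError("only 1100..2099 is supported in this sketch")
    high, low = divmod(year, 100)
    if 2000 <= year <= 2009:
        return "two thousand" + (" " + ONES[low] if low else "")
    if low == 0:
        return two_digits(high) + " hundred"
    if low < 10:
        return two_digits(high) + " oh " + ONES[low]
    return two_digits(high) + " " + two_digits(low)


def clean_dates(text: str) -> str:
    """Spell out day ordinals ("3rd") and four-digit years ("2021")."""
    text = re.sub(r"\b(\d{1,2})(?:st|nd|rd|th)\b",
                  lambda m: ordinal(int(m.group(1))), text)
    return re.sub(r"\b(?:1[1-9]\d\d|20\d\d)\b",
                  lambda m: year_to_words(int(m.group())), text)


print(clean_dates("It is 2021."))
```

Treating every matching four-digit number as a year is a deliberate simplification; a sentence such as "You have 2021 unread messages" would need context to be read correctly.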

Abbreviations

Let's go to Dr. John Doe. ==> Let's go to doctor John Doe.

Weight is 5kg. ==> Weight is five kilograms.
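
Abbreviation expansion is usually driven by a lookup table. The following Python sketch assumes two small, hypothetical tables (`ABBREVIATIONS` and `UNITS`) and a hypothetical function `expand_abbreviations`. The entries, and the rule of pluralizing the unit for any value other than 1, are illustrative assumptions. The digits themselves are left in place so that a separate number-replacement step can spell them out.

```python
import re

# Hypothetical lookup tables; extend them for the corpus at hand.
ABBREVIATIONS = {"Dr.": "doctor", "Mr.": "mister", "Prof.": "professor"}
UNITS = {"kg": "kilogram", "km": "kilometer", "g": "gram"}

# Longest units first so that "kg" is tried before "g".
_UNIT_PATTERN = re.compile(
    r"\b(\d+)\s?(" + "|".join(sorted(UNITS, key=len, reverse=True)) + r")\b"
)


def _expand_unit(match: re.Match) -> str:
    value, unit = match.group(1), match.group(2)
    word = UNITS[unit]
    if value != "1":
        word += "s"  # assumption: plural for every value except 1
    return f"{value} {word}"


def expand_abbreviations(text: str) -> str:
    """Expand titles and measurement units; digits are kept as digits."""
    for abbr, full in ABBREVIATIONS.items():
        # (?<!\w) keeps the match from starting inside a longer word.
        text = re.sub(r"(?<!\w)" + re.escape(abbr), full, text)
    return _UNIT_PATTERN.sub(_expand_unit, text)


print(expand_abbreviations("Let's go to Dr. John Doe."))
print(expand_abbreviations("Weight is 5kg."))
```

Some abbreviations are ambiguous ("Dr." can also mean "drive", and an abbreviation at the end of a sentence shares its period with the sentence), so a table like this typically needs manual review for each corpus.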