General TTS tutorial: Difference between revisions

2,066 bytes added, 21 February 2022
(I wrote introduction)
This tutorial contains '''non-technical''' information on TTS, or Text-to-Speech. It is meant for anyone new to the field of TTS who is trying to figure out which aspects to consider and which approaches are possible.


== Introduction ==
TTS stands for text-to-speech technology. There are many different methods of converting textual input to audio output. Currently, machine learning is becoming the dominant method of text-to-speech conversion. To use TTS you have two options: (1) use an existing application/service or (2) build a new one. TTS services have been developed for widely used languages such as English, German, French, and Russian. Major platforms such as Google and Windows support these languages by default. If you want to synthesize
TTS is an acronym that stands for '''T'''ext-'''T'''o-'''S'''peech, a technology that converts text into audio in the form of human speech. There are many different methods of converting textual input to speech output. Some people use the term speech synthesis to refer to TTS; both mean the same thing. Currently, machine learning has become the dominant method of text-to-speech conversion.
 
To use TTS you have two options, depending on the language you are targeting and on whether a tool or service for that language already exists. If you want to generate speech in a widely used language such as English, German, French, or Russian, TTS services have already been developed, and major platforms such as Google and Windows support these languages by default. You can then use these existing services to generate output. For example, you can check whether Google Translate speaks your language when you provide text input. This does not mean that all TTS technologies are available to you for free. It just means that a free version likely exists or can be built without too much effort using open-source technologies.
 
If you are targeting a language that is not widely used, you will likely need to build a service that supports TTS in your language. Usually, building a service means using open-source technologies and training the software so that the computer learns to convert text to speech in your language. There are well-developed open-source TTS platforms (examples?) that can be used if you are ready to invest your time and energy. This tutorial aims to provide general information for people who want to develop a TTS application for their language of choice. It can be useful for both widely used and under-represented languages.
 
To be more precise, this tutorial covers how to develop a TTS voice. Understanding TTS technology deeply requires a significant amount of knowledge from computer science, linguistics, and statistics. The technology has already been developed and is in use in various forms and with diverse functionality. The purpose of this tutorial is to explain how to use the existing technology and apply it to a particular language.
 
== Stages of training a TTS model ==
There are several stages in building a TTS voice model. The order of these stages cannot be changed, but if you already have the output of a particular stage, you can skip it and proceed directly to the next one.
 
# Building a text dataset
# Preparing a voice dataset using the text dataset
# Training a TTS model
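
For readers who find code easier to follow, the ordering rule above can be sketched in Python. This is only an illustration and not part of any TTS toolkit: the file names, the <code>STAGES</code> list and the <code>run_pipeline</code> function are hypothetical, and each stage just writes a placeholder file instead of doing real work.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# (stage name, output file it produces). All names here are hypothetical.
STAGES: List[Tuple[str, str]] = [
    ("build text dataset", "text_dataset.txt"),
    ("prepare voice dataset", "voice_dataset.txt"),
    ("train TTS model", "tts_model.txt"),
]


def run_pipeline(workdir: Path) -> List[str]:
    """Run the stages in their fixed order, skipping finished ones."""
    executed = []
    for name, produced in STAGES:
        output = workdir / produced
        if output.exists():
            # The output of this stage is already available, so move on.
            continue
        # Placeholder for the real work of this stage.
        output.write_text(f"output of: {name}\n", encoding="utf-8")
        executed.append(name)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Pretend a text dataset was obtained elsewhere, so stage 1 is skipped.
        (workdir / "text_dataset.txt").write_text("hello\n", encoding="utf-8")
        print(run_pipeline(workdir))
        # prints ['prepare voice dataset', 'train TTS model']
```

The stages always run in the same order, and a stage is skipped only when its output file is already present, which mirrors the case of starting from an existing text or voice dataset.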