ESpeak
What are eSpeak and eSpeak-NG?
eSpeak is a compact open source (GNU GPL license) software speech synthesizer for English and other languages, for Linux and Windows. It uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
eSpeak began in 1995 as speak, a program written for Acorn/RISC_OS computers. The current eSpeak is an enhanced rewrite that relaxes the original memory and processing power constraints and adds support for more languages.[1]
In 2010, Reece H. Dunn started maintaining a version of eSpeak on GitHub; in late 2015 it was forked and renamed eSpeak NG. The eSpeak NG project is a significant departure from eSpeak, with the aim of cleaning up the existing codebase, adding new features and improving the supported languages.[2]
Advantages
Although eSpeak has been around practically since the dawn of time (or at least of the personal computer), there are still good reasons to use it:
- It is blazing fast: audio generation is almost instant, even on SBCs like the Raspberry Pi Zero
- It is very small, and memory consumption is negligible on modern systems (~5MB [source?])
- It is available on basically any platform, as a command line program, a shared library, a SAPI5 version for Windows screen readers, etc.
- Although the voice sounds rather robotic, it has a certain charm, especially if you are building little robots ;-)
- It is available in dozens of languages
- It can translate text into phoneme codes and is often used to generate new dictionary entries for other systems (e.g. speech-to-text, STT)
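The text-to-phoneme conversion mentioned in the last point can be sketched in Python by calling the command line tool. This is a minimal sketch, not part of eSpeak itself: it assumes `espeak-ng` is installed and on the PATH, and the helper names `build_phoneme_command` and `text_to_phonemes` are made up for illustration. It relies on the `-q` (no audio), `-x` (eSpeak phoneme mnemonics), `--ipa` (IPA output) and `-v` (voice) options of the command line tool.

```python
import shutil
import subprocess


def build_phoneme_command(text, voice="en", ipa=False):
    """Build the espeak-ng argument list for phoneme output without audio.

    Hypothetical helper for this example, not part of eSpeak.
    """
    # -q: do not play audio, -x: print eSpeak phoneme mnemonics,
    # --ipa: print International Phonetic Alphabet symbols instead.
    mode = "--ipa" if ipa else "-x"
    return ["espeak-ng", "-q", mode, "-v", voice, text]


def text_to_phonemes(text, voice="en", ipa=False):
    """Run espeak-ng and return the phoneme string it prints to stdout."""
    result = subprocess.run(
        build_phoneme_command(text, voice=voice, ipa=ipa),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    if shutil.which("espeak-ng") is None:
        print("espeak-ng not found on PATH, install it first")
    else:
        print(text_to_phonemes("Hello this is a test"))
        print(text_to_phonemes("Hello this is a test", ipa=True))
```

The exact phoneme strings depend on the installed version and voice, so no expected output is shown here.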
Installation
Debian command line tool (espeak-ng recommended):
sudo apt-get install -y espeak-ng espeak-ng-espeak
NOTE: eSpeak is often bundled with other projects, such as OpenTTS or the SEPIA Assist Server, where it is ready to use out of the box.
Performance
eSpeak-NG (en):
- Test system: Raspberry Pi 4 (4GB)
- Sentence: "Hello this is a test"
- Run-time: 0.08 s
- Real-time factor (run-time divided by duration of the generated audio): 0.062
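A measurement like the one above could be reproduced roughly as follows. This is a sketch under assumptions: it takes the real-time factor to be run-time divided by the duration of the generated audio (the usual definition, under which the figures above imply about 0.08 / 0.062 ≈ 1.3 s of audio), it assumes `espeak-ng` is on the PATH, and it uses the `-w` option to write a WAV file instead of playing the audio. The function names are hypothetical, and the original test may have been timed differently.

```python
import shutil
import subprocess
import tempfile
import time
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(run_time_s, audio_duration_s):
    """Real-time factor: processing time divided by audio duration."""
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    return run_time_s / audio_duration_s


def measure_espeak_rtf(text, voice="en"):
    """Synthesize text to a temporary WAV file and time the whole call.

    The timing includes process start-up, so it is an upper bound on
    the pure synthesis time.
    """
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = Path(tmp) / "out.wav"
        start = time.perf_counter()
        subprocess.run(
            ["espeak-ng", "-v", voice, "-w", str(wav_path), text],
            check=True,
        )
        run_time = time.perf_counter() - start
        duration = wav_duration_seconds(wav_path)
    return run_time, duration, real_time_factor(run_time, duration)


if __name__ == "__main__":
    if shutil.which("espeak-ng") is None:
        print("espeak-ng not found on PATH, install it first")
    else:
        run_time, duration, rtf = measure_espeak_rtf("Hello this is a test")
        print(f"run-time: {run_time:.3f} s, audio: {duration:.3f} s, RTF: {rtf:.3f}")
```

A real-time factor below 1 means the audio is generated faster than it takes to play back. Results vary with hardware, system load and voice, so averaging several runs gives a more stable number.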