eSpeak

From Voice Technology Wiki

What are eSpeak and eSpeak-NG?

eSpeak is a compact, open-source (GNU GPL) software speech synthesizer for English and other languages, for Linux and Windows. It uses formant synthesis, a method that lets many languages fit into a small size. The speech is clear and can be used at high speeds, but it is not as natural or smooth as that of larger synthesizers based on human speech recordings.

eSpeak was originally known as speak and was first written for Acorn/RISC OS computers, starting in 1995. eSpeak is an enhanced rewrite of it: it relaxes the original memory and processing-power constraints and supports additional languages.[1]

In 2010, Reece H. Dunn started maintaining a version of eSpeak on GitHub; it was forked in late 2015 and renamed eSpeak NG. The new eSpeak NG project is a significant departure from the eSpeak project, with the intention of cleaning up the existing codebase, adding new features, and improving the supported languages.[2]

Advantages

Although it has been around practically since the dawn of time (or at least since the dawn of the personal computer), there are still good reasons to use eSpeak:

  • It is blazing fast: audio generation is almost instant, even on single-board computers (SBCs) like the Raspberry Pi Zero
  • It is very small, and its memory consumption is negligible on modern systems (~5 MB [source?])
  • It is available on basically any platform, as a command-line program, a shared library, a SAPI5 version for Windows screen readers, etc.
  • Although the voice sounds rather robotic, it has a certain charm, especially if you are building little robots ;-)
  • It is available in dozens of languages
  • It can translate text into phoneme codes and is often used to generate new dictionary entries for other systems (e.g. speech-to-text (STT) engines)
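As a rough illustration of the phoneme feature, the sketch below asks espeak-ng for the phonemes of each word and prints word/phoneme pairs that could seed a dictionary for another system. It relies on the espeak-ng options -q (produce no audio) and --ipa (write phonemes to stdout in IPA); -x would give eSpeak's own ASCII phoneme mnemonics instead. The helper name dict_entries, the ESPEAK variable and the tab-separated output are assumptions for illustration, not a format any particular STT system requires.

```bash
#!/usr/bin/env bash
# Sketch: print "word<TAB>phonemes" lines that could seed a pronunciation
# dictionary. dict_entries and ESPEAK are hypothetical names.
ESPEAK="${ESPEAK:-espeak-ng}"

dict_entries() {
  local word phonemes
  for word in "$@"; do
    # -q: do not play any audio; --ipa: write the phonemes to stdout in IPA
    # (swap --ipa for -x to get eSpeak's ASCII phoneme mnemonics instead)
    phonemes=$("$ESPEAK" -q --ipa "$word" | tr -d '\n' |
      sed 's/^[[:space:]]*//; s/[[:space:]]*$//')   # trim surrounding spaces
    printf '%s\t%s\n' "$word" "$phonemes"
  done
}
```

For example, dict_entries hello world prints one line per word; the exact phoneme strings depend on the eSpeak-NG version and on the voice selected with -v.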

Installation

Debian command-line tool (espeak-ng recommended):

sudo apt-get install -y espeak-ng espeak-ng-espeak

NOTE: eSpeak is often bundled with other projects, such as OpenTTS or SEPIA Assist Server, where it is ready to use out of the box.
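After installing, a quick check along these lines can confirm that the command-line tool works. check_espeak is a hypothetical helper name; --version and --voices=<language> are standard espeak-ng options.

```bash
#!/usr/bin/env bash
# Sketch of a post-install check (check_espeak is a hypothetical name):
# confirm espeak-ng is on the PATH, then show its version and English voices.
check_espeak() {
  if ! command -v espeak-ng > /dev/null 2>&1; then
    echo "espeak-ng not found; try: sudo apt-get install -y espeak-ng" >&2
    return 1
  fi
  espeak-ng --version        # print the installed eSpeak-NG version
  espeak-ng --voices=en      # list the voices available for English
}
```

Run check_espeak and then, for an audible test, espeak-ng "Hello this is a test".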

Performance

eSpeak-NG (en):

  • Test system: Raspberry Pi 4 (4 GB)
  • Sentence: "Hello this is a test"
  • Run-time: 0.08 s
  • Real-time factor: 0.062

...

[UNDER CONSTRUCTION]
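The real-time factor (RTF) is the synthesis time divided by the duration of the audio produced, so the figures above correspond to roughly 0.08 s / 0.062 ≈ 1.3 s of speech. A rough way to measure it yourself is sketched below; it is not necessarily the method used for the figures above. It times espeak-ng writing the sentence to a WAV file with -w and reads the audio length from the file, assuming a plain 44-byte PCM header and a little-endian machine (as on x86 and a Raspberry Pi running its usual OS). The helper names are hypothetical, and date +%s.%N requires GNU date.

```bash
#!/usr/bin/env bash
# Sketch: measure eSpeak-NG's real-time factor (RTF) for one sentence.
# RTF = synthesis time / duration of the generated audio (lower is faster).

# rtf SYNTH_SECONDS AUDIO_SECONDS: print the ratio with three decimals
rtf() {
  awk -v t="$1" -v d="$2" 'BEGIN { printf "%.3f\n", t / d }'
}

# wav_seconds FILE: duration of a PCM WAV file in seconds.
# ASSUMPTIONS: a plain 44-byte header (no extra chunks) and a little-endian
# host, because od reads the byte-rate field at offset 28 in host byte order.
wav_seconds() {
  local size byte_rate
  size=$(wc -c < "$1")
  byte_rate=$(od -An -tu4 -j28 -N4 "$1" | tr -d ' ')
  awk -v s="$size" -v r="$byte_rate" 'BEGIN { printf "%.3f\n", (s - 44) / r }'
}

# measure_rtf TEXT: time espeak-ng writing TEXT to a temporary WAV file (-w)
# and print the RTF. Needs GNU date for the %N (nanoseconds) field.
measure_rtf() {
  local wav start end
  wav=$(mktemp)
  start=$(date +%s.%N)
  espeak-ng -w "$wav" "$1" || { rm -f "$wav"; return 1; }
  end=$(date +%s.%N)
  rtf "$(awk -v a="$start" -v b="$end" 'BEGIN { print b - a }')" "$(wav_seconds "$wav")"
  rm -f "$wav"
}
```

Calling measure_rtf "Hello this is a test" prints the RTF with three decimals. Because the timing includes process start-up and writing the file, results will vary between runs and machines.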

KDE & GNOME integration (eSpeak-NG)

This setup lets you have eSpeak-NG read highlighted text aloud from any application on your system, simply by pressing a keyboard shortcut.

After installing eSpeak-NG on your system, add the following line as a custom keyboard shortcut in KDE or GNOME:

/bin/sh -c '/usr/bin/xsel -o > /tmp/speak.txt && /usr/bin/espeak-ng -f /tmp/speak.txt'

Type man espeak-ng in your terminal to see a list of options for this speech engine.

To stop espeak-ng, use the following command:

pkill espeak-ng

Important: shortcut commands are not necessarily run in a full shell environment (GNOME, for example, starts them directly), so it is best to give the full path to each application you call, for example /usr/bin/xsel and /usr/bin/espeak-ng. For the same reason, a command that uses redirection (>) or chains several commands (&&) is wrapped in /bin/sh -c '...' so that a shell interprets it.


In GNOME, open the system settings, go to Keyboard » Customise Shortcuts » Custom Shortcuts » + and create two new shortcuts: one that starts reading using the line above, and another that stops reading with the pkill espeak-ng command.

In KDE, open the system settings, go to Shortcuts » Custom Shortcuts and create two new shortcuts: one that starts reading using the line above, and another that stops reading with the pkill espeak-ng command.

The text you want to read is stored in a file. In this example, the file is /tmp/speak.txt, but you can use a different file name and directory if you like; the > redirection creates the file if it does not already exist. What these commands and switches do:

xsel -o > /tmp/speak.txt (overwrites the file with the currently highlighted text; -o makes xsel only print the selection and then exit, so no xsel processes are left running in the background)

&& (starts espeak-ng only after xsel has finished writing the file successfully)

espeak-ng -f /tmp/speak.txt (calls the espeak-ng TTS engine to read the named file aloud)

pkill espeak-ng (stops espeak-ng).
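If you prefer a single shortcut that both starts and stops reading, the commands above can be combined into a small script. This is a sketch under assumptions: the file name speak-selection.sh, the function speak_selection and the XSEL, ESPEAK, TEXT_FILE and SPEED variables are hypothetical, the full paths are the usual Debian locations, and pgrep/pkill come from the procps package. Because the script runs the commands itself, the shortcut command is simply the script's full path.

```bash
#!/usr/bin/env bash
# speak-selection.sh (hypothetical): read the highlighted text aloud with
# eSpeak-NG, or stop it if it is already speaking.
# Full paths are the usual Debian locations; override them via the environment.
XSEL="${XSEL:-/usr/bin/xsel}"
ESPEAK="${ESPEAK:-/usr/bin/espeak-ng}"
TEXT_FILE="${TEXT_FILE:-/tmp/speak.txt}"
SPEED="${SPEED:-175}"   # words per minute; 175 is eSpeak-NG's default

speak_selection() {
  # Already speaking? Then this key press means "stop".
  if pgrep -x espeak-ng > /dev/null 2>&1; then
    pkill -x espeak-ng
    return 0
  fi
  # -o: only print the current selection, then exit
  "$XSEL" -o > "$TEXT_FILE" || return 1
  # Nothing highlighted: stay silent instead of starting an empty read
  [ -s "$TEXT_FILE" ] || return 0
  # -s: speaking rate in words per minute, -f: read the text from this file
  "$ESPEAK" -s "$SPEED" -f "$TEXT_FILE"
}

# Run only when executed as speak-selection(.sh), so the function can also be
# sourced by other scripts without side effects.
case "${0##*/}" in
  speak-selection*) speak_selection ;;
esac
```

Save it as, for example, speak-selection.sh in a bin directory under your home folder, make it executable with chmod +x, and enter its full path (such as /home/<user>/bin/speak-selection.sh) as the shortcut command.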