<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://openvoice-tech.net/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Florian</id>
	<title>Open Voice Technology Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://openvoice-tech.net/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Florian"/>
	<link rel="alternate" type="text/html" href="https://openvoice-tech.net/wiki/Special:Contributions/Florian"/>
	<updated>2026-04-16T21:31:05Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2389</id>
		<title>SEPIA</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2389"/>
		<updated>2022-05-13T08:50:03Z</updated>

		<summary type="html">&lt;p&gt;Florian: updated SEPIA version info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Voice Assistant&lt;br /&gt;
| Url = https://github.com/SEPIA-Framework/&lt;br /&gt;
| License = MIT License&lt;br /&gt;
| InternetRequired = No, but online services like weather and Wikipedia won&#039;t work offline&lt;br /&gt;
| SupportedLanguages = en, de + partial support for 15 more (beta)&lt;br /&gt;
| WakeWord = Hey SEPIA, customizable&lt;br /&gt;
| DefaultTTS = eSpeak, picoTTS, MBROLA, MaryTTS + compatible APIs&lt;br /&gt;
| SupportedOS = Windows, Linux, Mac OS X, Raspberry Pi&lt;br /&gt;
| CodeLanguage = Java (Assist-Server), HTML/Javascript (Client), Python (STT-Server)&lt;br /&gt;
| LatestVersion = 2.6.2&lt;br /&gt;
| LatestReleaseDate = 2022-05-08&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Introduction: S.E.P.I.A. Open Assistant Framework ==&lt;br /&gt;
[https://sepia-framework.github.io/ S.E.P.I.A.] is an acronym for: &#039;&#039;&#039;s&#039;&#039;&#039;elf-hosted, &#039;&#039;&#039;e&#039;&#039;&#039;xtendable, &#039;&#039;&#039;p&#039;&#039;&#039;ersonal, &#039;&#039;&#039;i&#039;&#039;&#039;ntelligent &#039;&#039;&#039;a&#039;&#039;&#039;ssistant. It is a modular, open-source framework equipped with all the required tools to build your own, full-fledged digital voice-assistant, including [[:Category:STT|speech recognition]] ([[:Category:STT|STT]]), [[:Category:Wake words|wake-word]] detection, [[:Category:TTS|text-to-speech]] ([[:Category:TTS|TTS]]), [[Natural language understanding|natural-language-understanding (NLU)]], dialog-management, SDK(s), a cross-platform client app and much more.&lt;br /&gt;
&lt;br /&gt;
The framework consists of several highly customizable micro-services that work together seamlessly to form the SEPIA Open Assistant. It follows the &#039;&#039;&#039;client-server principle, using a lightweight Java server and an Elasticsearch DB as &amp;quot;brain&amp;quot;&#039;&#039;&#039; and a JavaScript-based client that can act, for example, as a &#039;&#039;&#039;smart-speaker, smart-display or full-fledged digital assistant app&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All components work on Linux, Windows and Mac and have been optimized to even &#039;&#039;&#039;run smoothly on a Raspberry Pi&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Out of the box, SEPIA currently offers smart-services for: news, music (radio), timers, alarms, reminders, to-do and shopping lists, smart home (e.g. using open-source tools like openHAB), navigation, places, weather, Wikipedia, web-search, soccer results (Bundesliga), a bit of small talk and more. To build custom services, there is a Java SDK and a code editor integrated into the SEPIA Control HUB web-app. The client can be extended with custom HTML widgets.&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-docs&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Components==&lt;br /&gt;
Some of the core SEPIA framework components:&lt;br /&gt;
&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-assist-server SEPIA Assist-Server] - The &amp;quot;brain&amp;quot; of SEPIA responsible for: user-accounts, database integration, [[Natural language understanding|NLU]], dialog-management, open-source [[:Category:TTS|TTS]], smart-services (weather, navigation, alarms, news etc.), remote actions and more&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-assist-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-websocket-server-java SEPIA Chat-Server] - A [[wikipedia:WebSocket|WebSocket]] based Chat-Server that takes care of real-time, asynchronous user-to-user and user-to-assistant communication&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-websocket-server-java&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-html-client-app SEPIA Cross-Platform Client] - The primary SEPIA client based on HTML5 web technology that runs on all modern browsers (mobile and desktop) and communicates with other SEPIA components via HTTP or WebSocket connections&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-html-client-app&amp;lt;/ref&amp;gt;. It supports headless- and kiosk-mode via [https://github.com/bytemind-de/nodejs-client-extension-interface CLEXI server] to build independent smart-speaker or smart-display like devices&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-installation-and-setup/tree/master/sepia-client-installation&amp;lt;/ref&amp;gt;. The client is available as Android app as well via [https://play.google.com/store/apps/details?id=de.bytemind.sepia.app.web Play Store] or [https://github.com/SEPIA-Framework/sepia-installation-and-setup/releases direct download].&lt;br /&gt;
*[[SEPIA Speech-To-Text Server|SEPIA STT-Server]] - A WebSocket based, full-duplex Python server for real-time [[:Category:STT|automatic speech recognition]] (ASR) supporting [https://github.com/SEPIA-Framework/sepia-stt-server multiple open-source ASR engines]. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
There are many more SEPIA components and tools to discover on the [https://github.com/SEPIA-Framework official GitHub project page].&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
The [https://github.com/SEPIA-Framework/sepia-docs official documentation page] has the most recent installation guides.&lt;br /&gt;
&lt;br /&gt;
[[Category:Open Voice Assistants]]&lt;br /&gt;
[[Category:Project]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2255</id>
		<title>SEPIA</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2255"/>
		<updated>2022-01-30T22:59:47Z</updated>

		<summary type="html">&lt;p&gt;Florian: Updated SEPIA release version&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Voice Assistant&lt;br /&gt;
| Url = https://github.com/SEPIA-Framework/&lt;br /&gt;
| License = MIT License&lt;br /&gt;
| InternetRequired = No, but online services like weather and Wikipedia won&#039;t work offline&lt;br /&gt;
| SupportedLanguages = en, de + partial support for 15 more (beta)&lt;br /&gt;
| WakeWord = Hey SEPIA, customizable&lt;br /&gt;
| DefaultTTS = eSpeak, picoTTS, MBROLA, MaryTTS + compatible APIs&lt;br /&gt;
| SupportedOS = Windows, Linux, Mac OS X, Raspberry Pi&lt;br /&gt;
| CodeLanguage = Java (Assist-Server), HTML/Javascript (Client), Python (STT-Server)&lt;br /&gt;
| LatestVersion = 2.6.1&lt;br /&gt;
| LatestReleaseDate = 2022-01-30&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Introduction: S.E.P.I.A. Open Assistant Framework ==&lt;br /&gt;
[https://sepia-framework.github.io/ S.E.P.I.A.] is an acronym for: &#039;&#039;&#039;s&#039;&#039;&#039;elf-hosted, &#039;&#039;&#039;e&#039;&#039;&#039;xtendable, &#039;&#039;&#039;p&#039;&#039;&#039;ersonal, &#039;&#039;&#039;i&#039;&#039;&#039;ntelligent &#039;&#039;&#039;a&#039;&#039;&#039;ssistant. It is a modular, open-source framework equipped with all the required tools to build your own, full-fledged digital voice-assistant, including [[:Category:STT|speech recognition]] ([[:Category:STT|STT]]), [[:Category:Wake words|wake-word]] detection, [[:Category:TTS|text-to-speech]] ([[:Category:TTS|TTS]]), [[Natural language understanding|natural-language-understanding (NLU)]], dialog-management, SDK(s), a cross-platform client app and much more.&lt;br /&gt;
&lt;br /&gt;
The framework consists of several highly customizable micro-services that work together seamlessly to form the SEPIA Open Assistant. It follows the &#039;&#039;&#039;client-server principle, using a lightweight Java server and an Elasticsearch DB as &amp;quot;brain&amp;quot;&#039;&#039;&#039; and a JavaScript-based client that can act, for example, as a &#039;&#039;&#039;smart-speaker, smart-display or full-fledged digital assistant app&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All components work on Linux, Windows and Mac and have been optimized to even &#039;&#039;&#039;run smoothly on a Raspberry Pi&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Out of the box, SEPIA currently offers smart-services for: news, music (radio), timers, alarms, reminders, to-do and shopping lists, smart home (e.g. using open-source tools like openHAB), navigation, places, weather, Wikipedia, web-search, soccer results (Bundesliga), a bit of small talk and more. To build custom services, there is a Java SDK and a code editor integrated into the SEPIA Control HUB web-app. The client can be extended with custom HTML widgets.&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-docs&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Components==&lt;br /&gt;
Some of the core SEPIA framework components:&lt;br /&gt;
&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-assist-server SEPIA Assist-Server] - The &amp;quot;brain&amp;quot; of SEPIA responsible for: user-accounts, database integration, [[Natural language understanding|NLU]], dialog-management, open-source [[:Category:TTS|TTS]], smart-services (weather, navigation, alarms, news etc.), remote actions and more&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-assist-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-websocket-server-java SEPIA Chat-Server] - A [[wikipedia:WebSocket|WebSocket]] based Chat-Server that takes care of real-time, asynchronous user-to-user and user-to-assistant communication&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-websocket-server-java&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-html-client-app SEPIA Cross-Platform Client] - The primary SEPIA client based on HTML5 web technology that runs on all modern browsers (mobile and desktop) and communicates with other SEPIA components via HTTP or WebSocket connections&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-html-client-app&amp;lt;/ref&amp;gt;. It supports headless- and kiosk-mode via [https://github.com/bytemind-de/nodejs-client-extension-interface CLEXI server] to build independent smart-speaker or smart-display like devices&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-installation-and-setup/tree/master/sepia-client-installation&amp;lt;/ref&amp;gt;. The client is available as Android app as well via [https://play.google.com/store/apps/details?id=de.bytemind.sepia.app.web Play Store] or [https://github.com/SEPIA-Framework/sepia-installation-and-setup/releases direct download].&lt;br /&gt;
*[[SEPIA Speech-To-Text Server|SEPIA STT-Server]] - A WebSocket based, full-duplex Python server for real-time [[:Category:STT|automatic speech recognition]] (ASR) supporting [https://github.com/SEPIA-Framework/sepia-stt-server multiple open-source ASR engines]. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
There are many more SEPIA components and tools to discover on the [https://github.com/SEPIA-Framework official GitHub project page].&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
The [https://github.com/SEPIA-Framework/sepia-docs official documentation page] has the most recent installation guides.&lt;br /&gt;
&lt;br /&gt;
[[Category:Open Voice Assistants]]&lt;br /&gt;
[[Category:Project]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Glossary_of_voice_tech&amp;diff=2254</id>
		<title>Glossary of voice tech</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Glossary_of_voice_tech&amp;diff=2254"/>
		<updated>2022-01-28T11:50:59Z</updated>

		<summary type="html">&lt;p&gt;Florian: added category &amp;quot;Glossary&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Open Voice Tech]]&lt;br /&gt;
&lt;br /&gt;
In the field of voice technology there are lots of buzzwords. Some are self-explanatory, while others regularly lead to confusion. This list serves as a glossary.&lt;br /&gt;
&lt;br /&gt;
==General terms==&lt;br /&gt;
&lt;br /&gt;
*[[:Category:Dataset|Dataset]]&lt;br /&gt;
*[[Research papers|Papers]] (&#039;&#039;research papers&#039;&#039;)&lt;br /&gt;
*[[Phonemes]]&lt;br /&gt;
*[[Model]]&lt;br /&gt;
*[[Checkpoint]]&lt;br /&gt;
*[[Repository]]&lt;br /&gt;
&lt;br /&gt;
==STT terms==&lt;br /&gt;
&lt;br /&gt;
*[[:Category:Wake words|Wake word]]&lt;br /&gt;
*[[Hotword]]&lt;br /&gt;
*[[Voice print]]&lt;br /&gt;
*[[Word error rate]] (&#039;&#039;WER&#039;&#039;)&lt;br /&gt;
*[[Diarization]]&lt;br /&gt;
*[[Barge-in]]&lt;br /&gt;
&lt;br /&gt;
==TTS terms==&lt;br /&gt;
&lt;br /&gt;
*&lt;br /&gt;
&lt;br /&gt;
==Voice assistant terms==&lt;br /&gt;
&lt;br /&gt;
*[[Conversational AI]]&lt;br /&gt;
*[[Natural language understanding]] (&#039;&#039;NLU&#039;&#039;)&lt;br /&gt;
*[[Intent]]&lt;br /&gt;
*[[Utterance]]&lt;br /&gt;
*[[Voiceonly]]&lt;br /&gt;
&lt;br /&gt;
==Machine learning==&lt;br /&gt;
&lt;br /&gt;
*[[Epoch]]&lt;br /&gt;
*[[Step]]&lt;br /&gt;
*[[Batch size]]&lt;br /&gt;
*[[Learning rate]]&lt;br /&gt;
*[[Inference]]&lt;br /&gt;
*[[Alignment]]&lt;br /&gt;
[[Category:Glossary]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Barge-in&amp;diff=2253</id>
		<title>Barge-in</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Barge-in&amp;diff=2253"/>
		<updated>2022-01-28T11:50:23Z</updated>

		<summary type="html">&lt;p&gt;Florian: removed category &amp;quot;Open Voice Assistants&amp;quot; since this is a vocabulary explanation. Created new category &amp;quot;Glossary&amp;quot; for it.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&amp;quot;Barge-in is a feature that allows callers to interrupt a prompt and provide their response before the prompt has finished playing&amp;quot;&#039;&#039; (p. 24 of the book &amp;quot;Voice User Interface Design&amp;quot; by James Giangola et al.).&lt;br /&gt;
&lt;br /&gt;
In other words, in any voice interface system / [[:Category:Open Voice Assistants|voice assistant]], barge-in is the user&#039;s ability to interrupt or stop the assistant&#039;s spoken output (a [[:Category:TTS|text-to-speech]] synthetic voice) in order to issue a new, overriding voice request to be processed as soon as possible.&lt;br /&gt;
[[Category:Glossary]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Mycroft_Precise&amp;diff=2252</id>
		<title>Mycroft Precise</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Mycroft_Precise&amp;diff=2252"/>
		<updated>2022-01-28T11:47:44Z</updated>

		<summary type="html">&lt;p&gt;Florian: Removed category &amp;quot;Open Voice Assistants&amp;quot; because Precise is a component of Mycroft&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Mycroft]]&lt;br /&gt;
[[Category:Wake words]]&lt;br /&gt;
&lt;br /&gt;
Mycroft Precise is a simple-to-use [[:Category:Wake words|wake word listener]] created by [[Mycroft]].&lt;br /&gt;
&lt;br /&gt;
Precise can be used to model custom wake words. Mycroft provides a good [https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word#how-to-train-your-own-wake-word documentation page] on that, with additional tips and tricks from other [[Mycroft]] community members [https://github.com/el-tocino/localcroft/blob/master/precise/Precise.md here] and [https://github.com/sparky-vision/mycroft-precise-tips here].&lt;/div&gt;
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Porcupine&amp;diff=2251</id>
		<title>Porcupine</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Porcupine&amp;diff=2251"/>
		<updated>2022-01-28T11:39:58Z</updated>

		<summary type="html">&lt;p&gt;Florian: Minor text format updates&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Porcupine by Picovoice.ai&amp;lt;ref&amp;gt;https://picovoice.ai/platform/porcupine/&amp;lt;/ref&amp;gt; is a commercial wake-word engine with some open components licensed under Apache 2.0. The open-source parts change from version to version but usually include an SDK to integrate the engine and a number of wake-words that can be used freely, such as &amp;quot;Computer&amp;quot;, &amp;quot;Jarvis&amp;quot;, &amp;quot;Alexa&amp;quot;, &amp;quot;Hey Siri&amp;quot; etc.&lt;br /&gt;
&lt;br /&gt;
When it was first released to the public in March 2018, it quickly became the go-to solution for open-source projects due to its high accuracy, low resource requirements and good platform support, though the selection of free wake-words was rather limited at that time (Picovoice, Alexa, Raspberry and some more &amp;quot;exotic&amp;quot; words&amp;lt;ref&amp;gt;https://github.com/Picovoice/porcupine/tree/0507c5250b50d0a938b714ba604819d61fc9e602/resources/keyword_files&amp;lt;/ref&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
The most distinctive feature of Porcupine is the ability to create new wake-words from just one line of text: no audio recording, no training data&amp;lt;ref&amp;gt;https://medium.com/@alirezakenarsarianhari/yet-another-wake-word-detection-engine-a2486d36d8d4&amp;lt;/ref&amp;gt;. Unfortunately, this feature was only available (with some very limited exceptions) to commercial customers until Porcupine v2.0. Since v2.0, everyone can create their own wake-words via the Picovoice online console within the limited free tier&amp;lt;ref&amp;gt;https://picovoice.ai/console/&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;https://picovoice.ai/blog/introducing-picovoices-free-tier/&amp;lt;/ref&amp;gt;. The drawback of version 2.0 is that usage requires an API key and devices need to be activated periodically by contacting the Picovoice server.&lt;br /&gt;
&lt;br /&gt;
== Features Summary ==&lt;br /&gt;
&lt;br /&gt;
* Commercial product with a free tier and some open parts. Access key and online activation since v2.0 (December 2021).&lt;br /&gt;
* At the time of writing (Jan. 2022) probably the best wake-word engine for open-source projects in terms of accuracy and platform support (anecdotal evidence; proper benchmarks are still needed).&lt;br /&gt;
* Available on basically any platform and in any programming language from x86 to ARM, from PCs to microcontrollers, from Android to Raspberry Pi, from Python to Javascript (WASM support).&lt;br /&gt;
* Popular wake-words available in the free version (e.g. &amp;quot;Computer&amp;quot;, &amp;quot;Jarvis&amp;quot;, &amp;quot;Alexa&amp;quot; etc.). To avoid registration, access key management and online activation, use Porcupine v1.9.&lt;br /&gt;
* Everyone can create custom wake-words with one line of text via web console (v2.0).&lt;br /&gt;
* Low resource consumption (works on Raspberry Pi Zero).&lt;br /&gt;
&lt;br /&gt;
== Links ==&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Porcupine&amp;diff=2250</id>
		<title>Porcupine</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Porcupine&amp;diff=2250"/>
		<updated>2022-01-28T11:36:59Z</updated>

		<summary type="html">&lt;p&gt;Florian: updated some info text&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Porcupine by Picovoice.ai&amp;lt;ref&amp;gt;https://picovoice.ai/platform/porcupine/&amp;lt;/ref&amp;gt; is a commercial wake-word engine with some open components licensed under Apache 2.0. The open-source parts change from version to version but usually include an SDK to integrate the engine and a number of wake-words that can be used freely, such as &amp;quot;Computer&amp;quot;, &amp;quot;Jarvis&amp;quot;, &amp;quot;Alexa&amp;quot;, &amp;quot;Hey Siri&amp;quot; etc.&lt;br /&gt;
&lt;br /&gt;
When it was first released to the public in March 2018, it quickly became the go-to solution for open-source projects due to its high accuracy, low resource requirements and good platform support, though the selection of free wake-words was rather limited at that time (Picovoice, Alexa, Raspberry and some more &amp;quot;exotic&amp;quot; words&amp;lt;ref&amp;gt;https://github.com/Picovoice/porcupine/tree/0507c5250b50d0a938b714ba604819d61fc9e602/resources/keyword_files&amp;lt;/ref&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
The most distinctive feature of Porcupine is the ability to create new wake-words from just one line of text: no audio recording, no training data&amp;lt;ref&amp;gt;https://medium.com/@alirezakenarsarianhari/yet-another-wake-word-detection-engine-a2486d36d8d4&amp;lt;/ref&amp;gt;. Unfortunately, this feature was only available (with some very limited exceptions) to commercial customers until Porcupine v2.0. Since v2.0, everyone can create their own wake-words via the Picovoice online console within the limited free tier&amp;lt;ref&amp;gt;https://picovoice.ai/console/&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;https://picovoice.ai/blog/introducing-picovoices-free-tier/&amp;lt;/ref&amp;gt;. The drawback of version 2.0 is that usage requires an API key and devices need to be activated periodically by contacting the Picovoice server.&lt;br /&gt;
&lt;br /&gt;
== Features Summary ==&lt;br /&gt;
&lt;br /&gt;
* Commercial product with a free tier and some open parts. Access key and online activation since v2.0 (December 2021).&lt;br /&gt;
* At the time of writing (Jan. 2022) probably the best wake-word engine for open-source projects in terms of accuracy and platform support (anecdotal evidence; proper benchmarks are still needed).&lt;br /&gt;
* Available on basically any platform and in any programming language from x86 to ARM, from PCs to microcontrollers, from Android to Raspberry Pi, from Python to Javascript (WASM support).&lt;br /&gt;
* Popular wake-words available in the free version (e.g. &amp;quot;Computer&amp;quot;, &amp;quot;Jarvis&amp;quot;, &amp;quot;Alexa&amp;quot; etc.). To avoid registration, access key management and online activation, use Porcupine v1.9.&lt;br /&gt;
* Everyone can create custom wake-words with one line of text via web console (v2.0).&lt;br /&gt;
* Low resource consumption (works on Raspberry Pi Zero).&lt;br /&gt;
&lt;br /&gt;
== Links ==&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Porcupine&amp;diff=2249</id>
		<title>Porcupine</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Porcupine&amp;diff=2249"/>
		<updated>2022-01-28T11:33:37Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created Porcupine page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Porcupine by Picovoice.ai&amp;lt;ref&amp;gt;https://picovoice.ai/platform/porcupine/&amp;lt;/ref&amp;gt; is a commercial wake-word engine with some open components licensed under Apache 2.0. The open-source parts change from version to version but usually include an SDK to integrate the engine and a number of wake-words that can be used freely, such as &amp;quot;Computer&amp;quot;, &amp;quot;Jarvis&amp;quot;, &amp;quot;Alexa&amp;quot;, &amp;quot;Hey Siri&amp;quot; etc.&lt;br /&gt;
&lt;br /&gt;
When it was first released to the public in March 2018, it quickly became the go-to solution for open-source projects due to its high accuracy, low resource requirements and good platform support, though the selection of free wake-words was rather limited at that time (Picovoice, Alexa, Raspberry and some more &amp;quot;exotic&amp;quot; words&amp;lt;ref&amp;gt;https://github.com/Picovoice/porcupine/tree/0507c5250b50d0a938b714ba604819d61fc9e602/resources/keyword_files&amp;lt;/ref&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
The most distinctive feature of Porcupine is the ability to create new wake-words from just one line of text: no audio recording, no training data&amp;lt;ref&amp;gt;https://medium.com/@alirezakenarsarianhari/yet-another-wake-word-detection-engine-a2486d36d8d4&amp;lt;/ref&amp;gt;. Unfortunately, this feature was only available (with some very limited exceptions) to commercial customers until Porcupine v2.0. Since v2.0, everyone can create their own wake-words via the Picovoice online console within the limited free tier&amp;lt;ref&amp;gt;https://picovoice.ai/console/&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;https://picovoice.ai/blog/introducing-picovoices-free-tier/&amp;lt;/ref&amp;gt;. The drawback of version 2.0 is that usage requires an API key and devices need to be activated periodically by contacting the Picovoice server.&lt;br /&gt;
&lt;br /&gt;
== Features Summary ==&lt;br /&gt;
&lt;br /&gt;
* Commercial product with a free tier and some open parts. Access key and online activation since v2.0 (December 2021).&lt;br /&gt;
* At the time of writing (Jan. 2022) probably the best wake-word engine for open-source projects in terms of accuracy and platform support (anecdotal evidence; proper benchmarks are still needed).&lt;br /&gt;
* Available on basically any platform and in any programming language from x86 to ARM, from PCs to microcontrollers, from Android to Raspberry Pi, from Python to Javascript (WASM support).&lt;br /&gt;
* Custom wake-words can be created with one line of text&lt;br /&gt;
* Low resource consumption (works on Raspberry Pi Zero)&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2248</id>
		<title>SEPIA</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2248"/>
		<updated>2022-01-28T10:40:53Z</updated>

		<summary type="html">&lt;p&gt;Florian: Updated infobox&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Voice Assistant&lt;br /&gt;
| Url = https://github.com/SEPIA-Framework/&lt;br /&gt;
| License = MIT License&lt;br /&gt;
| InternetRequired = No, but online services like weather and Wikipedia won&#039;t work offline&lt;br /&gt;
| SupportedLanguages = en, de + partial support for 15 more (beta)&lt;br /&gt;
| WakeWord = Hey SEPIA, customizable&lt;br /&gt;
| DefaultTTS = eSpeak, picoTTS, MBROLA, MaryTTS + compatible APIs&lt;br /&gt;
| SupportedOS = Windows, Linux, Mac OS X, Raspberry Pi&lt;br /&gt;
| CodeLanguage = Java (Assist-Server), HTML/Javascript (Client), Python (STT-Server)&lt;br /&gt;
| LatestVersion = 2.6.0&lt;br /&gt;
| LatestReleaseDate = 2021-10-14&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Introduction: S.E.P.I.A. Open Assistant Framework ==&lt;br /&gt;
[https://sepia-framework.github.io/ S.E.P.I.A.] is an acronym for: &#039;&#039;&#039;s&#039;&#039;&#039;elf-hosted, &#039;&#039;&#039;e&#039;&#039;&#039;xtendable, &#039;&#039;&#039;p&#039;&#039;&#039;ersonal, &#039;&#039;&#039;i&#039;&#039;&#039;ntelligent &#039;&#039;&#039;a&#039;&#039;&#039;ssistant. It is a modular, open-source framework equipped with all the required tools to build your own, full-fledged digital voice-assistant, including [[:Category:STT|speech recognition]] ([[:Category:STT|STT]]), [[:Category:Wake words|wake-word]] detection, [[:Category:TTS|text-to-speech]] ([[:Category:TTS|TTS]]), [[Natural language understanding|natural-language-understanding (NLU)]], dialog-management, SDK(s), a cross-platform client app and much more.&lt;br /&gt;
&lt;br /&gt;
The framework consists of several highly customizable micro-services that work together seamlessly to form the SEPIA Open Assistant. It follows the &#039;&#039;&#039;client-server principle, using a lightweight Java server and an Elasticsearch DB as &amp;quot;brain&amp;quot;&#039;&#039;&#039; and a JavaScript-based client that can act, for example, as a &#039;&#039;&#039;smart-speaker, smart-display or full-fledged digital assistant app&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All components work on Linux, Windows and Mac and have been optimized to even &#039;&#039;&#039;run smoothly on a Raspberry Pi&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Out of the box, SEPIA currently offers smart-services for: news, music (radio), timers, alarms, reminders, to-do and shopping lists, smart home (e.g. using open-source tools like openHAB), navigation, places, weather, Wikipedia, web-search, soccer results (Bundesliga), a bit of small talk and more. To build custom services, there is a Java SDK and a code editor integrated into the SEPIA Control HUB web-app. The client can be extended with custom HTML widgets.&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-docs&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Components==&lt;br /&gt;
Some of the core SEPIA framework components:&lt;br /&gt;
&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-assist-server SEPIA Assist-Server] - The &amp;quot;brain&amp;quot; of SEPIA responsible for: user-accounts, database integration, [[Natural language understanding|NLU]], dialog-management, open-source [[:Category:TTS|TTS]], smart-services (weather, navigation, alarms, news etc.), remote actions and more&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-assist-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-websocket-server-java SEPIA Chat-Server] - A [[wikipedia:WebSocket|WebSocket]] based Chat-Server that takes care of real-time, asynchronous user-to-user and user-to-assistant communication&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-websocket-server-java&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-html-client-app SEPIA Cross-Platform Client] - The primary SEPIA client based on HTML5 web technology that runs on all modern browsers (mobile and desktop) and communicates with other SEPIA components via HTTP or WebSocket connections&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-html-client-app&amp;lt;/ref&amp;gt;. It supports headless- and kiosk-mode via [https://github.com/bytemind-de/nodejs-client-extension-interface CLEXI server] to build independent smart-speaker or smart-display like devices&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-installation-and-setup/tree/master/sepia-client-installation&amp;lt;/ref&amp;gt;. The client is available as Android app as well via [https://play.google.com/store/apps/details?id=de.bytemind.sepia.app.web Play Store] or [https://github.com/SEPIA-Framework/sepia-installation-and-setup/releases direct download].&lt;br /&gt;
*[[SEPIA Speech-To-Text Server|SEPIA STT-Server]] - A WebSocket based, full-duplex Python server for real-time [[:Category:STT|automatic speech recognition]] (ASR) supporting [https://github.com/SEPIA-Framework/sepia-stt-server multiple open-source ASR engines]. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
There are many more SEPIA components and tools to discover on the [https://github.com/SEPIA-Framework official GitHub project page].&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
The [https://github.com/SEPIA-Framework/sepia-docs official documentation page] has the most recent installation guides.&lt;br /&gt;
&lt;br /&gt;
[[Category:Open Voice Assistants]]&lt;br /&gt;
[[Category:Project]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Comparison_of_voice_assistants&amp;diff=2083</id>
		<title>Comparison of voice assistants</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Comparison_of_voice_assistants&amp;diff=2083"/>
		<updated>2021-12-18T14:11:45Z</updated>

		<summary type="html">&lt;p&gt;Florian: Added SEPIA info and removed license field duplicate&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;There are lots of (open) voice assistants out there. Maybe we can make a comparison list of which assistants exist, where they are alike and where they differ.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+Comparison of available voice assistants&lt;br /&gt;
!&lt;br /&gt;
![[Mycroft]] AI&lt;br /&gt;
![[SEPIA]]&lt;br /&gt;
![[Rhasspy]]&lt;br /&gt;
![[Leon]]&lt;br /&gt;
![[Genie]]&lt;br /&gt;
|-&lt;br /&gt;
|Target group&lt;br /&gt;
|&lt;br /&gt;
|Makers, tinkerers, smart-home enthusiasts,&lt;br /&gt;
end-users via mobile app &lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|License&lt;br /&gt;
|&lt;br /&gt;
|MIT&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Requires internet access&lt;br /&gt;
|&lt;br /&gt;
|100% offline is possible, but services&lt;br /&gt;
like Wikipedia or news require internet&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Offline STT&lt;br /&gt;
|&lt;br /&gt;
|via SEPIA STT&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Offline TTS&lt;br /&gt;
|&lt;br /&gt;
|via SEPIA-Home&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|URL&lt;br /&gt;
|&lt;br /&gt;
|https://sepia-framework.github.io/&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Comment&lt;br /&gt;
|&lt;br /&gt;
|Highly customizable. Modules can be distributed to&lt;br /&gt;
multiple Raspberry Pis in home network.&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Linux&lt;br /&gt;
|&lt;br /&gt;
|Yes&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Windows&lt;br /&gt;
|&lt;br /&gt;
|Yes&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Mac OS X&lt;br /&gt;
|&lt;br /&gt;
|Yes&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Raspberry Pi&lt;br /&gt;
|&lt;br /&gt;
|Yes&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Own hw product line&lt;br /&gt;
|&lt;br /&gt;
|No (maybe someday ^^)&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|...&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
[[Category:Open Voice Assistants]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA_Speech-To-Text_Server&amp;diff=2007</id>
		<title>SEPIA Speech-To-Text Server</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA_Speech-To-Text_Server&amp;diff=2007"/>
		<updated>2021-12-03T10:33:34Z</updated>

		<summary type="html">&lt;p&gt;Florian: optimized format&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;SEPIA Speech-To-Text (STT) Server is a [[wikipedia:WebSocket|WebSocket]] based, full-duplex Python server for real-time automatic speech recognition (ASR) supporting multiple open-source ASR engines. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results.&lt;br /&gt;
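A minimal sketch of the chunked-streaming idea (the helper below is illustrative only and not part of the SEPIA code base; the actual WebSocket message format is defined by the server): audio is framed into fixed-duration PCM chunks before being sent over the connection.&lt;br /&gt;

```python
# Hypothetical helper: split raw 16-bit mono PCM audio into fixed-duration
# chunks, the kind of framing a streaming ASR WebSocket client would send.
def chunk_pcm(pcm_bytes, sample_rate=16000, chunk_ms=128):
    bytes_per_sample = 2  # 16-bit mono PCM
    chunk_size = int(sample_rate * chunk_ms / 1000) * bytes_per_sample
    return [pcm_bytes[i:i + chunk_size]
            for i in range(0, len(pcm_bytes), chunk_size)]

# One second of 16 kHz silence yields 8 chunks of 128 ms (4096 bytes) each
chunks = chunk_pcm(b"\x00\x00" * 16000)
```

Each chunk would then be sent as a binary WebSocket message while partial and final transcription results arrive asynchronously on the same connection.&lt;br /&gt;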
&lt;br /&gt;
One goal of this project is to offer a standardized, secure, real-time interface for all the great open-source ASR tools out there. The server works on all major platforms including single-board devices like Raspberry Pi (4).&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
* WebSocket server ([https://fastapi.tiangolo.com/ Python Fast-API]) that can receive audio streams and send transcribed text at the same time&lt;br /&gt;
* Modular architecture to support multiple ASR engines like [[Vosk]] (reference implementation), [[Coqui]], [[Deepspeech]], [[Scribosermo]] and more (under construction)&lt;br /&gt;
* Optional post-processing of results (e.g. via [https://github.com/allo-media/text2num text2num] and custom modules)&lt;br /&gt;
* Standardized API for all engines and support for individual engine features ([[speaker identification]], grammar, confidence score, word timestamps, alternative results, etc.)&lt;br /&gt;
* On-the-fly server and engine configuration via HTTP REST API and WebSocket &#039;welcome&#039; event (including custom grammar, if supported by engine and model)&lt;br /&gt;
* User authentication via simple common token or individual tokens for multiple users&lt;br /&gt;
* Docker containers with support for all major platform architectures: x86 64Bit (amd64), ARM 32Bit (armv7l) and ARM 64Bit (aarch64)&lt;br /&gt;
* Fast enough to run even on Raspberry Pi 4 (2GB) in real-time (depending on engine and model configuration)&lt;br /&gt;
* Compatible with the [[SEPIA]] Framework client (v0.24+)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:STT]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA_Speech-To-Text_Server&amp;diff=2006</id>
		<title>SEPIA Speech-To-Text Server</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA_Speech-To-Text_Server&amp;diff=2006"/>
		<updated>2021-12-03T10:33:11Z</updated>

		<summary type="html">&lt;p&gt;Florian: formatting&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;SEPIA Speech-To-Text (STT) Server is a [[wikipedia:WebSocket|WebSocket]] based, full-duplex Python server for real-time automatic speech recognition (ASR) supporting multiple open-source ASR engines. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results.&lt;br /&gt;
&lt;br /&gt;
One goal of this project is to offer a standardized, secure, real-time interface for all the great open-source ASR tools out there. The server works on all major platforms including single-board devices like Raspberry Pi (4).&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
* WebSocket server ([https://fastapi.tiangolo.com/ Python Fast-API]) that can receive audio streams and send transcribed text at the same time&lt;br /&gt;
* Modular architecture to support multiple ASR engines like [[Vosk]] (reference implementation), [[Coqui]], [[Deepspeech]], [[Scribosermo]] and more (under construction)&lt;br /&gt;
* Optional post-processing of results (e.g. via [https://github.com/allo-media/text2num text2num] and custom modules)&lt;br /&gt;
* Standardized API for all engines and support for individual engine features ([[speaker identification]], grammar, confidence score, word timestamps, alternative results, etc.)&lt;br /&gt;
* On-the-fly server and engine configuration via HTTP REST API and WebSocket &#039;welcome&#039; event (including custom grammar, if supported by engine and model)&lt;br /&gt;
* User authentication via simple common token or individual tokens for multiple users&lt;br /&gt;
* Docker containers with support for all major platform architectures: x86 64Bit (amd64), ARM 32Bit (armv7l) and ARM 64Bit (aarch64)&lt;br /&gt;
* Fast enough to run even on Raspberry Pi 4 (2GB) in real-time (depending on engine and model configuration)&lt;br /&gt;
* Compatible with the [[SEPIA]] Framework client (v0.24+)&lt;br /&gt;
[[Category:STT]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=Vosk&amp;diff=2005</id>
		<title>Vosk</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=Vosk&amp;diff=2005"/>
		<updated>2021-12-03T10:32:10Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created Vosk page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[https://github.com/alphacep/vosk-api Vosk] is an open-source speech recognition toolkit by Alphacephei&amp;lt;ref&amp;gt;https://alphacephei.com/vosk/&amp;lt;/ref&amp;gt;. Key features are:&lt;br /&gt;
&lt;br /&gt;
# Supports 20+ languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish. More to come.&lt;br /&gt;
# Works offline, even on lightweight devices - Raspberry Pi, Android, iOS&lt;br /&gt;
# Installs with simple &amp;lt;code&amp;gt;pip3 install vosk&amp;lt;/code&amp;gt;&lt;br /&gt;
# Portable per-language models are only about 50 MB each, but much bigger server models are available as well.&lt;br /&gt;
# Provides a streaming API for the best user experience (unlike popular Python speech-recognition packages)&lt;br /&gt;
# There are bindings for other programming languages, too: Java, C#, JavaScript, etc.&lt;br /&gt;
# Allows quick reconfiguration of vocabulary for best accuracy.&lt;br /&gt;
# Supports speaker identification beside simple speech recognition.&lt;br /&gt;
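The streaming API (point 5) follows a simple accept-loop pattern: feed audio chunks to a recognizer and collect results as they become available. The sketch below illustrates that loop with a stand-in recognizer, so it runs without a model download; with the real library you would pass &amp;lt;code&amp;gt;vosk.KaldiRecognizer(vosk.Model(model_path), 16000)&amp;lt;/code&amp;gt; instead.&lt;br /&gt;

```python
import json

# Typical Vosk-style streaming loop: feed audio chunks to a recognizer and
# collect finalized utterances as they appear. Works with any object that
# exposes AcceptWaveform()/Result()/FinalResult() like vosk.KaldiRecognizer.
def transcribe_chunks(recognizer, chunks):
    results = []
    for chunk in chunks:
        # AcceptWaveform() returns True whenever a final result is ready
        if recognizer.AcceptWaveform(chunk):
            results.append(json.loads(recognizer.Result())["text"])
    results.append(json.loads(recognizer.FinalResult())["text"])
    return results

# Stand-in recognizer for illustration only (no model needed here)
class FakeRecognizer:
    def AcceptWaveform(self, chunk):
        return chunk == b"end"  # pretend an utterance just ended
    def Result(self):
        return json.dumps({"text": "hello world"})
    def FinalResult(self):
        return json.dumps({"text": ""})

print(transcribe_chunks(FakeRecognizer(), [b"audio", b"end"]))
```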
&lt;br /&gt;
[[Category:STT]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA_Speech-To-Text_Server&amp;diff=2004</id>
		<title>SEPIA Speech-To-Text Server</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA_Speech-To-Text_Server&amp;diff=2004"/>
		<updated>2021-12-03T10:26:40Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created SEPIA STT Server page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;SEPIA Speech-To-Text (STT) Server is a [[wikipedia:WebSocket|WebSocket]] based, full-duplex Python server for real-time automatic speech recognition (ASR) supporting multiple open-source ASR engines. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results.&lt;br /&gt;
&lt;br /&gt;
One goal of this project is to offer a standardized, secure, real-time interface for all the great open-source ASR tools out there. The server works on all major platforms including single-board devices like Raspberry Pi (4).&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Features ==&lt;br /&gt;
&lt;br /&gt;
* WebSocket server ([https://fastapi.tiangolo.com/ Python Fast-API]) that can receive audio streams and send transcribed text at the same time&lt;br /&gt;
* Modular architecture to support multiple ASR engines like [[Vosk]] (reference implementation), [[Coqui]], [[Deepspeech]], [[Scribosermo]] and more (under construction)&lt;br /&gt;
* Optional post-processing of results (e.g. via [https://github.com/allo-media/text2num text2num] and custom modules)&lt;br /&gt;
* Standardized API for all engines and support for individual engine features ([[speaker identification]], grammar, confidence score, word timestamps, alternative results, etc.)&lt;br /&gt;
* On-the-fly server and engine configuration via HTTP REST API and WebSocket &#039;welcome&#039; event (including custom grammar, if supported by engine and model)&lt;br /&gt;
* User authentication via simple common token or individual tokens for multiple users&lt;br /&gt;
* Docker containers with support for all major platform architectures: x86 64Bit (amd64), ARM 32Bit (armv7l) and ARM 64Bit (aarch64)&lt;br /&gt;
* Fast enough to run even on Raspberry Pi 4 (2GB) in real-time (depending on engine and model configuration)&lt;br /&gt;
* Compatible with the [[SEPIA]] Framework client (v0.24+)&lt;br /&gt;
&lt;br /&gt;
[[Category:STT]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2003</id>
		<title>SEPIA</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=2003"/>
		<updated>2021-12-03T10:17:52Z</updated>

		<summary type="html">&lt;p&gt;Florian: Added components section. More references.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction: S.E.P.I.A. Open Assistant Framework ==&lt;br /&gt;
[https://sepia-framework.github.io/ S.E.P.I.A.] is an acronym for: &#039;&#039;&#039;s&#039;&#039;&#039;elf-hosted, &#039;&#039;&#039;e&#039;&#039;&#039;xtendable, &#039;&#039;&#039;p&#039;&#039;&#039;ersonal, &#039;&#039;&#039;i&#039;&#039;&#039;ntelligent &#039;&#039;&#039;a&#039;&#039;&#039;ssistant. It is a modular, open-source framework equipped with all the required tools to build your own, full-fledged digital voice-assistant, including [[:Category:STT|speech recognition]] ([[:Category:STT|STT]]), [[:Category:Wake words|wake-word]] detection, [[:Category:TTS|text-to-speech]] ([[:Category:TTS|TTS]]), [[Natural language understanding|natural-language-understanding (NLU)]], dialog-management, SDK(s), a cross-platform client app and much more.&lt;br /&gt;
&lt;br /&gt;
The framework consists of several, highly customizable micro-services that work together seamlessly to form the SEPIA Open Assistant. It follows the &#039;&#039;&#039;client-server principle using a lightweight Java server and Elasticsearch DB as &amp;quot;brain&amp;quot;&#039;&#039;&#039; and a JavaScript based client that works for example as &#039;&#039;&#039;smart-speaker, smart-display or full-fledged digital assistant app&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All components work on Linux, Windows and Mac and have been optimized to even &#039;&#039;&#039;run smoothly on a Raspberry Pi&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Out-of-the-box SEPIA currently has smart-services for: news, music (radio), timers, alarms, reminders, to-do and shopping lists, smart home (e.g. using open-source tools like openHAB), navigation, places, weather, Wikipedia, web-search, soccer-results (Bundesliga), a bit of small-talk and more. To build custom services there is a Java SDK and a code editor integrated into the SEPIA Control HUB web-app. The client can be extended with custom HTML widgets.&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-docs&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Components==&lt;br /&gt;
Some of the core SEPIA framework components:&lt;br /&gt;
&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-assist-server SEPIA Assist-Server] - The &amp;quot;brain&amp;quot; of SEPIA responsible for: user-accounts, database integration, [[Natural language understanding|NLU]], dialog-management, open-source [[:Category:TTS|TTS]], smart-services (weather, navigation, alarms, news etc.), remote actions and more&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-assist-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-websocket-server-java SEPIA Chat-Server] - A [[wikipedia:WebSocket|WebSocket]] based Chat-Server that takes care of real-time, asynchronous user-to-user and user-to-assistant communication&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-websocket-server-java&amp;lt;/ref&amp;gt;.&lt;br /&gt;
*[https://github.com/SEPIA-Framework/sepia-html-client-app SEPIA Cross-Platform Client] - The primary SEPIA client based on HTML5 web technology that runs on all modern browsers (mobile and desktop) and communicates with other SEPIA components via HTTP or WebSocket connections&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-html-client-app&amp;lt;/ref&amp;gt;. It supports headless- and kiosk-mode via [https://github.com/bytemind-de/nodejs-client-extension-interface CLEXI server] to build independent smart-speaker or smart-display-like devices&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-installation-and-setup/tree/master/sepia-client-installation&amp;lt;/ref&amp;gt;. The client is also available as an Android app via [https://play.google.com/store/apps/details?id=de.bytemind.sepia.app.web Play Store] or [https://github.com/SEPIA-Framework/sepia-installation-and-setup/releases direct download].&lt;br /&gt;
*[[SEPIA Speech-To-Text Server|SEPIA STT-Server]] - A WebSocket based, full-duplex Python server for real-time [[:Category:STT|automatic speech recognition]] (ASR) supporting [https://github.com/SEPIA-Framework/sepia-stt-server multiple open-source ASR engines]. It can receive a stream of audio chunks via the secure WebSocket connection and return transcribed text almost immediately as partial and final results&amp;lt;ref&amp;gt;https://github.com/SEPIA-Framework/sepia-stt-server&amp;lt;/ref&amp;gt;.&lt;br /&gt;
There are many more SEPIA components and tools to discover on the [https://github.com/SEPIA-Framework official GitHub project page].&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
The [https://github.com/SEPIA-Framework/sepia-docs official documentation page] has the most recent installation guides.&lt;br /&gt;
&lt;br /&gt;
[[Category:Open Voice Assistants]]&lt;br /&gt;
[[Category:Project]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=1998</id>
		<title>SEPIA</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=SEPIA&amp;diff=1998"/>
		<updated>2021-12-01T18:17:44Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created basic SEPIA page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction: S.E.P.I.A. Open Assistant Framework ==&lt;br /&gt;
S.E.P.I.A. is an acronym for: &#039;&#039;&#039;s&#039;&#039;&#039;elf-hosted, &#039;&#039;&#039;e&#039;&#039;&#039;xtendable, &#039;&#039;&#039;p&#039;&#039;&#039;ersonal, &#039;&#039;&#039;i&#039;&#039;&#039;ntelligent &#039;&#039;&#039;a&#039;&#039;&#039;ssistant. It is a modular, open-source framework equipped with all the required tools to build your own, full-fledged digital voice-assistant, including speech recognition (STT), wake-word detection, text-to-speech (TTS), natural-language-understanding, dialog-management, SDK(s), a cross-platform client app and much more.&lt;br /&gt;
&lt;br /&gt;
The framework consists of several, highly customizable micro-services that work together seamlessly to form the SEPIA Open Assistant. It follows the &#039;&#039;&#039;client-server principle using a lightweight Java server and Elasticsearch DB as &amp;quot;brain&amp;quot;&#039;&#039;&#039; and a JavaScript based client that works for example as &#039;&#039;&#039;smart-speaker, smart-display or full-fledged digital assistant app&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
All components work on Linux, Windows and Mac and have been optimized to even &#039;&#039;&#039;run smoothly on a Raspberry Pi&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Out-of-the-box SEPIA currently has smart-services for: news, music (radio), timers, alarms, reminders, to-do and shopping lists, smart home (e.g. using open-source tools like openHAB), navigation, places, weather, Wikipedia, web-search, soccer-results (Bundesliga), a bit of small-talk and more. To build custom services there is a Java SDK and a code editor integrated into the SEPIA Control HUB web-app. The client can be extended with custom HTML widgets.&lt;br /&gt;
&lt;br /&gt;
== Components ==&lt;br /&gt;
Under construction&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Under construction&lt;br /&gt;
[[Category:Open Voice Assistants]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1842</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1842"/>
		<updated>2021-11-13T13:34:00Z</updated>

		<summary type="html">&lt;p&gt;Florian: Added HTTP REST API note&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time. Version 3.0 even dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today probably called AI, back then called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page and internally there has been some major code refactoring, but it is currently unclear whether there will ever be another release. There has been an unofficial snapshot release for the SEPIA Framework which runs stably on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good, not state-of-the-art but much better than a synthetic voice.&lt;br /&gt;
* Audio generation is very fast, ranging from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt;, meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
*Pronunciation of times, dates, temperatures etc. can be very good. MaryTTS uses an extensive, handcrafted set of rules (and statistics) to handle this.&amp;lt;ref&amp;gt;http://mary.dfki.de/documentation/overview.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
*MaryTTS is server-based, meaning it offers an HTTP REST API out-of-the-box to synthesize text&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
Voice models can be downloaded via script from inside the release version or using these links:&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/marytts/voice-bits1-hsmm/releases bits1-hsmm] - A female German hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-bits3-hsmm/releases/ bits3-hsmm] - A male German hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-cmu-slt-hsmm/ voice-cmu-slt-hsmm] (included) - A female US English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-cmu-bdl-hsmm/releases/ cmu-bdl-hsmm] - A male US English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-dfki-spike-hsmm/releases/ dfki-spike-hsmm] - A male British English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-dfki-prudence-hsmm/releases/ dfki-prudence-hsmm] - A female British English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-upmc-jessica-hsmm/releases/ upmc-jessica-hsmm] - A female French hidden semi-Markov model voice&lt;br /&gt;
* ...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11 (Debian: &#039;sudo apt-get install openjdk-11-jdk-headless&#039;)&lt;br /&gt;
* Download the release ZIP file: [https://github.com/marytts/marytts/releases v5.2 official] or [https://github.com/fquirin/marytts/releases v6.0 snapshot]&lt;br /&gt;
* Extract the ZIP and start the server (run scripts are in &#039;marytts\bin&#039;)&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
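For example, a WAV file can be requested from the &amp;lt;code&amp;gt;/process&amp;lt;/code&amp;gt; endpoint with a plain HTTP GET. The snippet below only builds the request URL (parameter names as used by the MaryTTS 5.x HTTP interface; double-check them against your server&#039;s built-in documentation page):&lt;br /&gt;

```python
from urllib.parse import urlencode

# Build a request URL for the MaryTTS /process endpoint; the query
# parameters select plain-text input and WAV audio output.
def mary_url(text, voice="cmu-slt-hsmm", locale="en_US",
             host="http://localhost:59125"):
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    }
    return host + "/process?" + urlencode(params)

url = mary_url("Hello this is a test")
```

Opening the resulting URL, e.g. with &amp;lt;code&amp;gt;urllib.request.urlopen(url).read()&amp;lt;/code&amp;gt;, returns the WAV bytes when the server is running.&lt;br /&gt;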
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
==== Voice: dfki-spike-hsmm en_GB ====&lt;br /&gt;
&lt;br /&gt;
* Test system: Raspberry Pi4 4GB&lt;br /&gt;
* Sentence: &amp;quot;Hello this is a test&amp;quot;&lt;br /&gt;
* Run-time: 0.33 s&lt;br /&gt;
* Real-time-factor: 0.19&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=OpenTTS&amp;diff=1841</id>
		<title>OpenTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=OpenTTS&amp;diff=1841"/>
		<updated>2021-11-13T13:31:34Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created OpenTTS draft&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is OpenTTS ==&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
https://github.com/synesthesiam/opentts&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=ESpeak&amp;diff=1840</id>
		<title>ESpeak</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=ESpeak&amp;diff=1840"/>
		<updated>2021-11-13T13:30:32Z</updated>

		<summary type="html">&lt;p&gt;Florian: Format tweaks&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What are eSpeak and eSpeak-NG? ==&lt;br /&gt;
eSpeak is a compact open source (GNU GPL license) software speech synthesizer for English and other languages, for Linux and Windows. It uses a &amp;quot;formant synthesis&amp;quot; method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.&lt;br /&gt;
&lt;br /&gt;
Originally known as &#039;&#039;&#039;speak&#039;&#039;&#039; and written for Acorn/RISC_OS computers starting in 1995, eSpeak is an enhancement and rewrite that relaxes the original memory and processing-power constraints and adds support for additional languages.&amp;lt;ref&amp;gt;http://espeak.sourceforge.net/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In 2010 Reece H. Dunn started maintaining a version of eSpeak on GitHub that was forked in late 2015 and renamed to &#039;&#039;&#039;eSpeak NG&#039;&#039;&#039;. The new eSpeak NG project is a significant departure from the eSpeak project with the intention of cleaning up the existing codebase, adding new features and improving the supported languages.&amp;lt;ref&amp;gt;https://github.com/espeak-ng/espeak-ng&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Advantages ==&lt;br /&gt;
Although it has been around practically since the dawn of time (or at least the personal computer), there are still good reasons to use eSpeak:&lt;br /&gt;
&lt;br /&gt;
* It is blazingly fast: audio generation is almost instant even on SBCs like the Raspberry Pi Zero&lt;br /&gt;
* It is very small and memory consumption is negligible on modern systems (~5MB [source?])&lt;br /&gt;
* It is available on basically any platform, either as a command line program, shared library, SAPI5 version for Windows screen-readers, etc.&lt;br /&gt;
* Although the voice sounds rather robotic it has a certain charm, especially if you are building little robots ;-)&lt;br /&gt;
* It is available in dozens of languages&lt;br /&gt;
* It can translate text into phoneme codes and is often used to generate new dictionary entries for other systems (e.g. STT)&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Debian command line tool (espeak-ng recommended):&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
sudo apt-get install -y espeak-ng espeak-ng-espeak&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;NOTE: eSpeak is often found in other projects like [[OpenTTS]] or [https://github.com/SEPIA-Framework/sepia-assist-server/tree/master/Xtensions/TTS SEPIA Assist Server] and ready to use out-of-the-box.&lt;br /&gt;
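Once installed, the command line tool can also be scripted, e.g. from Python. The snippet below shows the phoneme-translation feature mentioned above (&amp;lt;code&amp;gt;-x&amp;lt;/code&amp;gt; prints phoneme mnemonics, &amp;lt;code&amp;gt;-q&amp;lt;/code&amp;gt; suppresses audio output); it is guarded so it is a no-op on systems where espeak-ng is not installed:&lt;br /&gt;

```python
import shutil
import subprocess

# Return the eSpeak-NG phoneme codes for a word, or None if the
# espeak-ng binary is not available on this system.
def phonemes(word):
    if shutil.which("espeak-ng") is None:
        return None
    # -q: quiet (no audio playback), -x: print phoneme mnemonics
    proc = subprocess.run(["espeak-ng", "-q", "-x", word],
                          capture_output=True, text=True)
    return proc.stdout.strip()

print(phonemes("hello"))
```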
&lt;br /&gt;
== Performance ==&lt;br /&gt;
eSpeak-NG (en):&lt;br /&gt;
&lt;br /&gt;
* Test system: Raspberry Pi4 4GB&lt;br /&gt;
* Sentence: &amp;quot;Hello this is a test&amp;quot;&lt;br /&gt;
* Run-time: 0.08 s&lt;br /&gt;
* Real-time-factor: 0.062&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=ESpeak&amp;diff=1839</id>
		<title>ESpeak</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=ESpeak&amp;diff=1839"/>
		<updated>2021-11-13T13:29:44Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created eSpeak page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What are eSpeak and eSpeak-NG? ==&lt;br /&gt;
eSpeak is a compact open source (GNU GPL license) software speech synthesizer for English and other languages, for Linux and Windows. It uses a &amp;quot;formant synthesis&amp;quot; method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.&lt;br /&gt;
&lt;br /&gt;
Originally known as &#039;&#039;&#039;speak&#039;&#039;&#039; and written for Acorn/RISC_OS computers starting in 1995, eSpeak is an enhancement and rewrite that relaxes the original memory and processing-power constraints and adds support for additional languages.&amp;lt;ref&amp;gt;http://espeak.sourceforge.net/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In 2010 Reece H. Dunn started maintaining a version of eSpeak on GitHub that was forked in late 2015 and renamed to &#039;&#039;&#039;eSpeak NG&#039;&#039;&#039;. The new eSpeak NG project is a significant departure from the eSpeak project with the intention of cleaning up the existing codebase, adding new features and improving the supported languages.&amp;lt;ref&amp;gt;https://github.com/espeak-ng/espeak-ng&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Advantages ==&lt;br /&gt;
Although it has been around practically since the dawn of time (or at least of the personal computer), there are still good reasons to use eSpeak:&lt;br /&gt;
&lt;br /&gt;
* It is blazingly fast: audio generation is almost instant even on SBCs like the Raspberry Pi Zero&lt;br /&gt;
* It is very small and memory consumption is negligible on modern systems (~5MB [source?])&lt;br /&gt;
* It is available on basically any platform, as a command-line program, shared library, SAPI5 version for Windows screen readers, and more&lt;br /&gt;
* Although the voice sounds rather robotic it has a certain charm, especially if you are building little robots ;-)&lt;br /&gt;
* It is available in dozens of languages&lt;br /&gt;
* It can translate text into phoneme codes and is often used to generate new dictionary entries for other systems (e.g. STT)&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Debian command line tool (espeak-ng recommended):&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
sudo apt-get install -y espeak-ng espeak-ng-espeak&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;NOTE: eSpeak is often found in other projects like [[OpenTTS]] or [https://github.com/SEPIA-Framework/sepia-assist-server/tree/master/Xtensions/TTS SEPIA Assist Server], where it is ready to use out of the box.&lt;br /&gt;
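As a quick smoke test, eSpeak-NG can also be driven from a script. The sketch below (voice name and output path are just example values) builds an espeak-ng command line and only executes it when the binary is actually installed; swapping -w for the -q -x options would print phoneme codes instead of producing audio:&lt;br /&gt;

```python
import shutil
import subprocess

def build_espeak_cmd(text, voice="en", wav_path="test.wav"):
    # -v selects the voice/language, -w writes a WAV file instead of playing audio
    return ["espeak-ng", "-v", voice, "-w", wav_path, text]

cmd = build_espeak_cmd("Hello this is a test")

# Only call the binary when it is available on this system
if shutil.which("espeak-ng"):
    subprocess.run(cmd, check=True)
else:
    print("espeak-ng not installed, command would be:", " ".join(cmd))
```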
&lt;br /&gt;
== Performance ==&lt;br /&gt;
eSpeak-NG (en):&lt;br /&gt;
&lt;br /&gt;
* Test system: Raspberry Pi4 4GB&lt;br /&gt;
* Sentence: &amp;quot;Hello this is a test&amp;quot;&lt;br /&gt;
* Run-time: 0.08 s&lt;br /&gt;
* Real-time-factor: 0.062&lt;br /&gt;
&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1838</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1838"/>
		<updated>2021-11-13T12:54:54Z</updated>

		<summary type="html">&lt;p&gt;Florian: Smaller format fix&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time: version 3.0 dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today this would probably be called AI, back then it was called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page, and internally there has been some major code refactoring, but it is currently unclear if there will ever be another release version. There is an unofficial snapshot release for the SEPIA Framework which runs stably on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good: not state-of-the-art, but much better than a purely synthetic voice.&lt;br /&gt;
* Audio generation is very fast, ranging from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt;, meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
* Pronunciation of times, dates, temperatures etc. can be very good. MaryTTS uses an extensive, handcrafted set of rules (and statistics) to handle this.&amp;lt;ref&amp;gt;http://mary.dfki.de/documentation/overview.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
Voice models can be downloaded via script from inside the release version or using these links:&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/marytts/voice-bits1-hsmm/releases bits1-hsmm] - A female German hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-bits3-hsmm/releases/ bits3-hsmm] - A male German hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-cmu-slt-hsmm/ voice-cmu-slt-hsmm] (included) - A female US English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-cmu-bdl-hsmm/releases/ cmu-bdl-hsmm] - A male US English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-dfki-spike-hsmm/releases/ dfki-spike-hsmm] - A male British English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-dfki-prudence-hsmm/releases/ dfki-prudence-hsmm] - A female British English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-upmc-jessica-hsmm/releases/ upmc-jessica-hsmm] - A female French hidden semi-Markov model voice&lt;br /&gt;
* ...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11 (Debian: &#039;sudo apt-get install openjdk-11-jdk-headless&#039;)&lt;br /&gt;
* Download the release ZIP file: [https://github.com/marytts/marytts/releases v5.2 official] or [https://github.com/fquirin/marytts/releases v6.0 snapshot]&lt;br /&gt;
* Extract the ZIP and start the server (run scripts are in &#039;marytts\bin&#039;)&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
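Besides the browser interface, the server can be queried over plain HTTP. The sketch below builds a synthesis request against the &#039;process&#039; endpoint; the host, locale and voice are example values and may need adjusting for your installation:&lt;br /&gt;

```python
from urllib.parse import urlencode

def build_marytts_url(text, host="http://localhost:59125",
                      locale="en_GB", voice="dfki-spike-hsmm"):
    # MaryTTS answers synthesis requests on the /process endpoint
    params = urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    })
    return host + "/process?" + params

url = build_marytts_url("Hello this is a test")
print(url)
# Fetch the WAV data with e.g. urllib.request.urlopen(url).read()
# while the server is running
```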
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
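A minimal Nginx sketch of such a setup (the /marytts/ path and the permissive CORS header are illustrative assumptions, not taken from the linked guide):&lt;br /&gt;

```nginx
location /marytts/ {
    proxy_pass http://localhost:59125/;
    add_header Access-Control-Allow-Origin "*";
}
```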
&lt;br /&gt;
== Performance ==&lt;br /&gt;
==== Voice: dfki-spike-hsmm en_GB ====&lt;br /&gt;
&lt;br /&gt;
* Test system: Raspberry Pi4 4GB&lt;br /&gt;
* Sentence: &amp;quot;Hello this is a test&amp;quot;&lt;br /&gt;
* Run-time: 0.33 s&lt;br /&gt;
* Real-time-factor: 0.19&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1837</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1837"/>
		<updated>2021-11-13T12:45:40Z</updated>

		<summary type="html">&lt;p&gt;Florian: Model links, releases, more info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time: version 3.0 dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today this would probably be called AI, back then it was called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page, and internally there has been some major code refactoring, but it is currently unclear if there will ever be another release version. There is an unofficial snapshot release for the SEPIA Framework which runs stably on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good: not state-of-the-art, but much better than a purely synthetic voice.&lt;br /&gt;
* Audio generation is very fast, ranging from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt;, meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
* Pronunciation of times, dates, temperatures etc. can be very good. MaryTTS uses an extensive, handcrafted set of rules (and statistics) to handle this.&amp;lt;ref&amp;gt;http://mary.dfki.de/documentation/overview.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
Voice models can be downloaded via script from inside the release version or using these links:&lt;br /&gt;
&lt;br /&gt;
* [https://github.com/marytts/voice-bits1-hsmm/releases bits1-hsmm] - A female German hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-bits3-hsmm/releases/ bits3-hsmm] - A male German hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-cmu-slt-hsmm/ voice-cmu-slt-hsmm] (included) - A female US English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-cmu-bdl-hsmm/releases/ cmu-bdl-hsmm] - A male US English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-dfki-spike-hsmm/releases/ dfki-spike-hsmm] - A male British English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-dfki-prudence-hsmm/releases/ dfki-prudence-hsmm] - A female British English hidden semi-Markov model voice&lt;br /&gt;
* [https://github.com/marytts/voice-upmc-jessica-hsmm/releases/ upmc-jessica-hsmm] - A female French hidden semi-Markov model voice&lt;br /&gt;
* ...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11 (Debian: &#039;sudo apt-get install openjdk-11-jdk-headless&#039;)&lt;br /&gt;
* Download the release ZIP file: [https://github.com/marytts/marytts/releases v5.2 official] or [https://github.com/fquirin/marytts/releases v6.0 snapshot]&lt;br /&gt;
* Extract the ZIP and start the server (run scripts are in &#039;marytts\bin&#039;)&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
==== Voice: dfki-spike-hsmm en_GB ====&lt;br /&gt;
&lt;br /&gt;
* Test system: Raspberry Pi4 4GB&lt;br /&gt;
* Sentence: &amp;quot;Hello this is a test&amp;quot;&lt;br /&gt;
* Run-time: 0.33 s&lt;br /&gt;
* Real-time-factor: 0.19&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
[UNDER CONSTRUCTION]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1836</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1836"/>
		<updated>2021-11-13T12:14:47Z</updated>

		<summary type="html">&lt;p&gt;Florian: Performance metrik&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time: version 3.0 dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today this would probably be called AI, back then it was called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page and internally there has been some major code refactoring but it is currently unclear if there will ever be another release version. There has been an unofficial snapshot release for the SEPIA Framework which runs stable on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good, not state-of-the-art but much better than a synthetic voice.&lt;br /&gt;
* Audio generation is very fast and ranges from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt; meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
* Pronunciation of times, dates, temperatures etc. can be very good. MaryTTS uses an extensive, handcrafted set of rules (and statistics) to handle this.&amp;lt;ref&amp;gt;http://mary.dfki.de/documentation/overview.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11&lt;br /&gt;
* Download the release ZIP file&lt;br /&gt;
* Extract and start the server&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&lt;br /&gt;
==== Voice: dfki-spike-hsmm en_GB ====&lt;br /&gt;
Test system: Raspberry Pi4 4GB&lt;br /&gt;
&lt;br /&gt;
Sentence: &amp;quot;Hello this is a test&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Run-time: 0.33 s&lt;br /&gt;
&lt;br /&gt;
Real-time-factor: 0.19&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1835</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1835"/>
		<updated>2021-11-13T12:12:43Z</updated>

		<summary type="html">&lt;p&gt;Florian: Added pronunciation advantage&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time: version 3.0 dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today this would probably be called AI, back then it was called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page and internally there has been some major code refactoring but it is currently unclear if there will ever be another release version. There has been an unofficial snapshot release for the SEPIA Framework which runs stable on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good, not state-of-the-art but much better than a synthetic voice.&lt;br /&gt;
* Audio generation is very fast and ranges from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt; meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
* Pronunciation of times, dates, temperatures etc. can be very good. MaryTTS uses an extensive, handcrafted set of rules (and statistics) to handle this.&amp;lt;ref&amp;gt;http://mary.dfki.de/documentation/overview.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11&lt;br /&gt;
* Download the release ZIP file&lt;br /&gt;
* Extract and start the server&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1834</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1834"/>
		<updated>2021-11-13T12:06:02Z</updated>

		<summary type="html">&lt;p&gt;Florian: Set category&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time: version 3.0 dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today this would probably be called AI, back then it was called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page and internally there has been some major code refactoring but it is currently unclear if there will ever be another release version. There has been an unofficial snapshot release for the SEPIA Framework which runs stable on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good, not state-of-the-art but much better than a synthetic voice.&lt;br /&gt;
* Audio generation is very fast and ranges from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt; meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11&lt;br /&gt;
* Download the release ZIP file&lt;br /&gt;
* Extract and start the server&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
[[Category:TTS]]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
	<entry>
		<id>https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1833</id>
		<title>MaryTTS</title>
		<link rel="alternate" type="text/html" href="https://openvoice-tech.net/index.php?title=MaryTTS&amp;diff=1833"/>
		<updated>2021-11-13T12:02:05Z</updated>

		<summary type="html">&lt;p&gt;Florian: Created MaryTTS page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is MaryTTS? ==&lt;br /&gt;
[http://mary.dfki.de/ Mary (&#039;&#039;&#039;M&#039;&#039;&#039;odular &#039;&#039;&#039;A&#039;&#039;&#039;rchitecture for &#039;&#039;&#039;R&#039;&#039;&#039;esearch in s&#039;&#039;&#039;Y&#039;&#039;&#039;ynthesis) Text-to-Speech] is an open-source (GNU LGPL license&amp;lt;ref&amp;gt;https://github.com/marytts/marytts/blob/master/LICENSE.md&amp;lt;/ref&amp;gt;), multilingual Text-to-Speech Synthesis platform &#039;&#039;&#039;written in Java&#039;&#039;&#039;. It was originally developed as a collaborative project of [http://www.dfki.de/web DFKI’s] Language Technology Lab and the [http://www.coli.uni-saarland.de/groups/WB/Phonetics/ Institute of Phonetics] at Saarland University, Germany. It is now maintained by the Multimodal Speech Processing Group in the [https://www.mmci.uni-saarland.de/ Cluster of Excellence MMCI] and DFKI.&amp;lt;ref&amp;gt;http://mary.dfki.de/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
MaryTTS has been around for a very long time: version 3.0 dates back to 2006, long before Deep Learning was a broadly known term, and the last official release was version 5.2 in 2016&amp;lt;ref&amp;gt;http://mary.dfki.de/download/index.html&amp;lt;/ref&amp;gt;. The system uses [[wikipedia:Speech_synthesis#Unit_selection_synthesis|unit selection]] and [[wikipedia:Hidden_Markov_model|HMM]]-based techniques to build voices (today this would probably be called AI, back then it was called statistics ^^). If you want to learn more about the architecture, check out the [http://mary.dfki.de/documentation/index.html official documentation].&lt;br /&gt;
&lt;br /&gt;
There is still activity on the GitHub page and internally there has been some major code refactoring but it is currently unclear if there will ever be another release version. There has been an unofficial snapshot release for the SEPIA Framework which runs stable on Java 11 but should be considered experimental: [https://github.com/fquirin/marytts/releases MaryTTS 6.0 snapshot] ([https://hub.docker.com/r/sepia/marytts Docker container]).&lt;br /&gt;
&lt;br /&gt;
== Advantages of MaryTTS ==&lt;br /&gt;
MaryTTS has certain advantages compared to modern Deep Learning systems or classical, synthetic engines like [[eSpeak]]:&lt;br /&gt;
&lt;br /&gt;
* The quality of the voice depends strongly on the model but can be surprisingly good, not state-of-the-art but much better than a synthetic voice.&lt;br /&gt;
* Audio generation is very fast and ranges from 0.2 to 0.5 [[Real-time-factor|RTF]] on a Raspberry Pi 4 depending on the selected voice&amp;lt;ref&amp;gt;https://b07z.net/downloads/tts.html&amp;lt;/ref&amp;gt; meaning it can actually be used on edge devices.&lt;br /&gt;
* RAM consumption is moderate but you should probably reserve around 256-512 MB.&lt;br /&gt;
* Installation is super easy and it runs on Windows, Mac and Linux (every system that can install Java 8)&lt;br /&gt;
* Language support is very good: German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, Turkish and more&lt;br /&gt;
&lt;br /&gt;
== Models ==&lt;br /&gt;
[TBD]&lt;br /&gt;
&lt;br /&gt;
== Installation ==&lt;br /&gt;
Installation is super easy:&lt;br /&gt;
&lt;br /&gt;
* Install Java 8 or 11&lt;br /&gt;
* Download the release ZIP file&lt;br /&gt;
* Extract and start the server&lt;br /&gt;
&lt;br /&gt;
By default you can access the server in your browser via: http://localhost:59125/ &lt;br /&gt;
&lt;br /&gt;
In a production system you might want to run MaryTTS behind a [[wikipedia:Reverse_proxy|reverse proxy]] (like Nginx or Apache) to [https://github.com/SEPIA-Framework/sepia-assist-server/blob/master/Xtensions/TTS/marytts/INSTALL.md#solving-cors-problems avoid CORS issues].&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
[TBD]&lt;/div&gt;</summary>
		<author><name>Florian</name></author>
	</entry>
</feed>