Open Voice Technology Wiki - User contributions [en]

A good wake word

2021-11-10T09:38:06Z

72.128.154.254:

What makes a good wake word?

Several factors contribute to the overall quality and usability of a wake word. Ideally it is a short, memorable, easily pronounced phrase with an uncommon phoneme arrangement. Most people are familiar with "Hey Google", "Hey Siri" or "Hey Mycroft", for instance. "Alexa" is a single-word wake word, albeit one with an uncommon phoneme arrangement. None of these are more than three syllables. In "Star Trek" the ship computers are frequently addressed as "Computer". This is a reasonable choice, but not without drawbacks. The word "computer" comes up frequently in modern daily conversation, and its phoneme arrangement is not uncommon, potentially leading to false activations. "OK Computer" or "Hey Computer" adds a bit more complexity to the wake word. A wake word longer than four or five syllables starts to become clunky to use.

While this document is primarily about English, the concepts can be applied to many other languages.

Modern English contains 44 phonemes: 24 consonants and 20 vowels. The vowel sounds fall into two groups based on mouth position, monophthongs and diphthongs. The simplest way to tell them apart: the e in "thee" is a monophthong, and the -oy in "boy" is a diphthong. The most common American English phoneme is the schwa, which has an "uh" sound. Any written vowel can be pronounced as a schwa, like the first a in "Alexa" or the e in "the". A [https://journals.sagepub.com/doi/pdf/10.1207/s15548430jlr3601_5 2004 paper] contains a helpful list of the most common phonemes. CMU also has a handy [http://www.speech.cs.cmu.edu/cgi-bin/cmudict pronouncing dictionary] to help decipher phonemes. The CMU dictionary file can also be found [https://github.com/Alexir/CMUdict/blob/master/cmudict-0.7b here]. Using those and other resources, you can start to determine the viability of your wake word's phoneme structure.
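As a rough illustration of using the CMU dictionary for this, the sketch below parses a few entries written in the dictionary's "WORD  PHONEME PHONEME ..." layout and counts syllables by counting the phonemes that carry a stress digit (in this notation only vowels do). The three entries are hand-copied for the example and should be checked against the real file; the function names are made up for this page.

```python
# Hand-copied sample entries in the CMUdict layout (assumption: verify
# against the real cmudict-0.7b file before relying on them).
SAMPLE = """\
;;; comment lines in the dictionary file start with three semicolons
HEY  HH EY1
COMPUTER  K AH0 M P Y UW1 T ER0
ALEXA  AH0 L EH1 K S AH0
"""


def parse_cmudict(text):
    """Return a dict mapping each word to its list of phonemes."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        entries[word] = phones
    return entries


def phrase_phonemes(phrase, entries):
    """Concatenate the phonemes of every word in a phrase."""
    phones = []
    for word in phrase.upper().split():
        phones.extend(entries[word])  # raises KeyError for unknown words
    return phones


def syllable_count(phones):
    """Vowel phonemes end in a stress digit (0, 1 or 2); count those."""
    return sum(1 for p in phones if p[-1].isdigit())


entries = parse_cmudict(SAMPLE)
for phrase in ("alexa", "hey computer"):
    phones = phrase_phonemes(phrase, entries)
    print(phrase, phones, syllable_count(phones))
```

To run this against the full dictionary, read the downloaded file into a string and pass it to `parse_cmudict` in place of `SAMPLE`.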

If your wake word has a lot of rhyme potential, it is probably going to produce a lot of false activations. This can be partially mitigated with a quality dataset. The cadence of the wake word should also be as unique as possible. "Hey Mycroft" frequently has false activations when someone says "Microsoft". For cases like these, find other words and phrases with a matching cadence and add them to the dataset. The pronunciation of your wake word should ideally be smooth. Adding "hey" to the front is an easy choice, as the -ey phoneme leaves the mouth fairly neutral and ready to say another phoneme. Words like "axolotl" are definitely uncommon and unique, but not as smooth or easy to pronounce.
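One way to hunt for sound-alike words such as "Microsoft" is to compare phoneme sequences by edit distance and treat the closest words as candidates for the dataset. The sketch below is only an illustration: the phoneme lists are hand-written approximations with stress markers left out, not dictionary lookups, and phoneme edit distance is a stand-in for "sounds similar", not what any particular wake word engine uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def rank_confusables(wake_phones, candidates):
    """Sort candidate words from most to least similar to the wake word."""
    scored = [(edit_distance(wake_phones, phones), word)
              for word, phones in candidates.items()]
    return sorted(scored)


# Hand-written approximate transcriptions (assumption, stress omitted).
MYCROFT = ["M", "AY", "K", "R", "AO", "F", "T"]
CANDIDATES = {
    "microsoft": ["M", "AY", "K", "R", "OW", "S", "AO", "F", "T"],
    "minecraft": ["M", "AY", "N", "K", "R", "AE", "F", "T"],
    "computer": ["K", "AH", "M", "P", "Y", "UW", "T", "ER"],
}

for distance, word in rank_confusables(MYCROFT, CANDIDATES):
    print(distance, word)
```

Words at the top of the ranking are the ones worth recording as negative ("not the wake word") examples.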

The accuracy of your listening engine in conjunction with your wake word should minimize false negatives for your target audience first, and minimize false positives second.
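That ordering of priorities can be made concrete when choosing a detection threshold. The sketch below assumes a listener that emits a confidence score per audio clip and a small labelled test set. The scores, the threshold list and the function names are invented for illustration. It counts false negatives and false positives at each threshold, then picks the threshold with the fewest false negatives, using false positives only to break ties.

```python
def count_errors(samples, threshold):
    """samples: list of (score, is_wake_word). Returns (false_neg, false_pos)."""
    false_neg = sum(1 for score, is_wake in samples
                    if is_wake and score < threshold)
    false_pos = sum(1 for score, is_wake in samples
                    if not is_wake and score >= threshold)
    return false_neg, false_pos


def pick_threshold(samples, thresholds):
    """Minimize false negatives first, false positives second."""
    return min(thresholds, key=lambda t: count_errors(samples, t))


# Made-up scores for six test clips (assumption, for illustration only).
SAMPLES = [
    (0.95, True), (0.80, True), (0.60, True),
    (0.70, False), (0.30, False), (0.10, False),
]
THRESHOLDS = [0.2, 0.5, 0.65, 0.9]

for t in THRESHOLDS:
    print(t, count_errors(SAMPLES, t))
print("chosen:", pick_threshold(SAMPLES, THRESHOLDS))
```

Because Python compares tuples element by element, using the `(false_neg, false_pos)` pair as the sort key is what encodes "false negatives first".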

Mycroft Precise

2021-11-10T09:35:39Z

72.128.154.254:

[[Category:Mycroft]]
[[Category:Open Voice Assistants]]
[[Category:Wake words]]

Mycroft Precise is a simple-to-use [[:Category:Wake words|wake word listener]] created by [[Mycroft]].

Precise can be used to model custom wake words. Mycroft has a good [https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word#how-to-train-your-own-wake-word documentation page] on that, and other [[Mycroft]] community members have collected additional tips and tricks [https://github.com/el-tocino/localcroft/blob/master/precise/Precise.md here] and [https://github.com/sparky-vision/mycroft-precise-tips here].

A good wake word

2021-11-10T09:29:37Z

72.128.154.254:

A good wake word

2021-11-10T09:28:23Z

72.128.154.254: Created page with "What makes a good wake word? Several factors contribute to the overall quality and usability of a wake word. Ideally it is a short, memorable, easily pronounced phrase, with..."

Mycroft Mimic

2021-11-10T08:12:01Z

72.128.154.254: /* Build Mimic on device */

[[Category:Open Voice Assistants]]
[[Category:Mycroft]]
[[Category:TTS]]

==What's Mycroft Mimic==
It is a TTS system delivered with Mycroft (''as of Nov 2021 it is English only''). It is available in two installation types:

*Mimic (''version 1'') runs locally on the device and has understandable but not great quality. It runs on small computers like a Raspberry Pi, and is by default the "fallback" if other voices are not available.
*Mimic (''version 2'') is provided by the Mycroft AI server/cloud backend infrastructure and offers better voice quality (''English only'').

==Tips & Tricks==

===Build Mimic 1 on device===
While setting up a Mycroft device, the installer will ask whether to build Mimic 1 locally. If you initially skipped this, you can re-run dev_setup.sh with the -fm flag for "force mimic":<blockquote>./dev_setup.sh -fm</blockquote>You can do the opposite with the -sm flag for "skip mimic":<blockquote>./dev_setup.sh -sm</blockquote>

Category:Wake words

2021-11-10T08:03:12Z

72.128.154.254:

[[Category:STT]]
[[Category:Open Voice Assistants]]

Wake words, sometimes called key words, are a special category of Speech-To-Text. Wake words are used to "wake" a listening device and start its functions. In most cases these wake words are detected locally on the device, while the actual speech recognition is mostly done by cloud services. Mycroft defaults to "Hey, Mycroft" for its wake word, for instance. Some platforms allow multiple wake words to be used. The Coqui STT engine can even be configured as a wake word listener.
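The split between local wake word detection and remote speech recognition can be pictured as a simple gate. The sketch below uses plain strings in place of audio chunks, and `is_wake_word` and `transcribe` are hypothetical callables supplied by the caller, not the API of any listener named on this page. Nothing is passed to the (stubbed) recognizer until the local check fires.

```python
def run_pipeline(chunks, is_wake_word, transcribe):
    """Forward only the chunk that follows a wake word to the recognizer."""
    awake = False
    transcripts = []
    for chunk in chunks:
        if not awake:
            # Cheap local check; the chunk never leaves the device.
            awake = is_wake_word(chunk)
        else:
            # Device is awake: hand this chunk to full speech recognition.
            transcripts.append(transcribe(chunk))
            awake = False  # go back to sleep after one utterance
    return transcripts


# Stand-ins for illustration: text instead of audio, upper() instead of STT.
WAKE_WORDS = {"hey mycroft", "hey computer"}  # multiple wake words allowed
heard = ["background chatter", "hey mycroft", "what time is it", "more chatter"]

print(run_pipeline(heard, lambda c: c in WAKE_WORDS, str.upper))
```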

'''Wake word listeners''':

* [[Mycroft Precise]]
* [[Porcupine]]
* [[Snowboy]]
* [[Howl]]
* [[Coqui]] STT
* Google TensorFlow Lite speech recognition

'''Customizing wake words'''

* What makes [[a good wake word]]?
* Building a wake word dataset

Category:Wake words

2021-11-09T08:17:53Z

72.128.154.254:

Wake words, sometimes called key words, are a special category of Speech-To-Text. Wake words are used to "wake" a listening device and start its functions. Mycroft defaults to "Hey, Mycroft" for its wake word, for instance. Some platforms allow multiple wake words to be used. The Coqui STT engine can even be configured as a wake word listener.

Wake word listeners:

* Mycroft Precise
* Porcupine
* Snowboy
* Howl
* Coqui STT
* Google TensorFlow Lite speech recognition

Customizing wake words

* What makes a good wake word?
* Building a wake word dataset

Category:Wake words

2021-11-09T08:14:27Z

72.128.154.254: Add wake words.

Wake words are a special category of Speech-To-Text. Wake words are used to "wake" a listening device and start its functions. Mycroft defaults to "Hey, Mycroft" for its wake word.

Wake word listeners:

* Mycroft Precise
* Porcupine
* Snowboy
* Howl
* ?

Customizing wake words

* What makes a good wake word?
* Building a wake word dataset

Recording tipps

2021-11-09T08:09:02Z

72.128.154.254:

[[Category:Recording tipps]]
[[Category:Lessons learned]]

When you plan to record a voice dataset for training a TTS model, check these tips and tricks:

* '''Use a good microphone and a quiet recording room setup''' (no computer fans, air conditioning, ...)
* Use a text corpus with cleaned numbers/abbreviations and good phoneme coverage
* Read in a neutral tone, but with a natural speech flow, and do not swallow letters
* Adjust tone and pitch with punctuation
* Use a constant recording speed
* Check your recordings regularly at high volume for background noise
* Take breaks regularly and do not record more than four hours a day
* Record error-free
* Investing in a quality interface and mic can make a big difference in quality. A 24-bit, 96 kHz interface with a large-diaphragm condenser can be had for about $200 USD.
* Record at the highest quality level practical. You can convert to lesser formats later, but you can't upconvert cleanly
* Review your work at regular intervals and compare with previous recording to ensure consistent quality
* Do not be afraid to ask for help! Getting feedback on your data early on can help prevent wasted effort.
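Part of the regular review can be automated. The sketch below uses Python's standard `wave` module to read the sample rate, bit depth and channel count of each recording and list the files that differ from the format you intended to record in. The file names and the silent test clips are generated in memory purely for illustration. With real recordings you would pass file paths to `describe_wav` instead.

```python
import io
import wave


def describe_wav(source):
    """Read basic format facts from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
        }


def find_inconsistent(descriptions, expected):
    """Return the names whose format differs from the expected settings."""
    return [name for name, desc in descriptions.items()
            if any(desc[key] != value for key, value in expected.items())]


def make_silent_wav(sample_rate, bytes_per_sample, seconds=1):
    """Build a mono WAV of silence in memory (test helper for this sketch)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (sample_rate * bytes_per_sample * seconds))
    buffer.seek(0)
    return buffer


EXPECTED = {"channels": 1, "bit_depth": 24, "sample_rate": 96000}
recordings = {
    "take_001.wav": describe_wav(make_silent_wav(96000, 3)),
    "take_002.wav": describe_wav(make_silent_wav(44100, 2)),  # wrong settings
}
print(find_inconsistent(recordings, EXPECTED))
```

This only checks the container format. Listening for background noise and uneven delivery still has to be done by ear.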