A good wake word
Latest revision as of 22:40, 17 November 2021
What makes a good wake word?
Several factors contribute to the overall quality and usability of a wake word. Ideally, it is a phrase that is:
- short (anything over four syllables becomes clunky in use)
- memorable
- easy to pronounce
- built from an uncommon arrangement of phonemes
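You can check the "short" criterion mechanically if you have a phonetic transcription of each candidate. The sketch below assumes the pronunciations are written in ARPAbet, the notation the CMU Pronouncing Dictionary uses. In ARPAbet every vowel phoneme carries a stress digit (0, 1 or 2), so counting those vowels gives the syllable count. The transcriptions, the `MAX_SYLLABLES` limit and the function names are illustrative assumptions and do not come from any existing tool.

```python
# Hand-written ARPAbet transcriptions, for illustration only.
CANDIDATES = {
    "hey mycroft": "HH EY1 M AY1 K R AO2 F T",
    "alexa": "AH0 L EH1 K S AH0",
    "hey computer": "HH EY1 K AH0 M P Y UW1 T ER0",
    "hey computer assistant": "HH EY1 K AH0 M P Y UW1 T ER0 AH0 S IH1 S T AH0 N T",
}

MAX_SYLLABLES = 4  # assumed cut-off: longer phrases start to feel clunky


def syllable_count(arpabet: str) -> int:
    """Count vowel phonemes, i.e. phonemes that end in a stress digit."""
    return sum(1 for phoneme in arpabet.split() if phoneme[-1].isdigit())


def is_short_enough(arpabet: str, limit: int = MAX_SYLLABLES) -> bool:
    """True if the transcription stays within the syllable limit."""
    return syllable_count(arpabet) <= limit


if __name__ == "__main__":
    for phrase, pronunciation in CANDIDATES.items():
        count = syllable_count(pronunciation)
        verdict = "ok" if is_short_enough(pronunciation) else "too long"
        print(f"{phrase!r}: {count} syllables ({verdict})")
```

Counting from a transcription rather than from spelling avoids guessing how silent or doubled vowel letters are pronounced.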
Most people are familiar with "Hey Google", "Hey Siri", or "Hey Mycroft", for instance. "Alexa" is a single-word wake word, but one with an uncommon phoneme arrangement. None of these is longer than three syllables. In "Star Trek", the ships' computers are frequently addressed simply as "Computer". This is a reasonable choice, but it has drawbacks. The word "computer" comes up often in everyday conversation, and its phoneme arrangement is not uncommon, which can lead to false activations. "OK Computer" or "Hey Computer" adds a bit more complexity to the wake word.
Wake word tips
While this document is primarily about English, the concepts can be applied to many other languages.
English
Modern English is commonly described as having 44 phonemes: 24 consonants and 20 vowels. The vowels fall into two groups, depending on whether the mouth holds one position or glides from one position to another during the sound. These groups are monophthongs and diphthongs. The simplest way to tell them apart is by example: the "ee" in "thee" is a monophthong, while the "oy" in "boy" is a diphthong. The most common phoneme in American English is the schwa, an unstressed "uh" sound. Any vowel letter can be pronounced as a schwa, like the first "a" in "Alexa" or the "e" in "the". A 2004 paper[1] contains a helpful list of the most common phonemes. CMU also maintains a handy pronouncing dictionary[2] for deciphering phonemes, and the dictionary file itself is available on GitHub[3]. Using these and other resources, you can start to assess the viability of your wake word's phoneme structure.
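The sketch below makes the vowel distinctions concrete. It reads entries laid out like the cmudict-0.7b file, where each line holds a word, whitespace, and then ARPAbet phonemes. Comment lines start with `;;;`, and alternate pronunciations are suffixed `(1)`, `(2)` and so on. For each pronunciation it reports the syllables, monophthongs, diphthongs and schwas. It assumes ARPAbet's diphthongs are AW, AY, EY, OW and OY, and it treats unstressed AH0 as the schwa. The sample lines are hand-written for illustration rather than copied from the file. `load_cmudict` and `vowel_profile` are hypothetical helper names.

```python
import re
from collections import defaultdict

# ARPAbet vowels as used by the CMU Pronouncing Dictionary (stress digits removed).
MONOPHTHONGS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "UH", "UW"}
DIPHTHONGS = {"AW", "AY", "EY", "OW", "OY"}
SCHWA = "AH0"  # unstressed AH

# Hand-written lines in the cmudict-0.7b layout, for illustration only.
SAMPLE_LINES = [
    ";;; comment lines start with three semicolons",
    "ALEXA  AH0 L EH1 K S AH0",
    "BOY  B OY1",
    "THE  DH AH0",
    "THE(1)  DH AH1",
    "THEE  DH IY1",
]


def load_cmudict(lines):
    """Map each word to a list of pronunciations, each a list of phonemes."""
    pronunciations = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)  # fold THE(1) into THE
        pronunciations[word].append(phones)
    return dict(pronunciations)


def vowel_profile(phones):
    """Summarize the vowels of one pronunciation."""
    vowels = [p[:-1] for p in phones if p[-1].isdigit()]  # strip the stress digit
    return {
        "syllables": len(vowels),
        "monophthongs": [v for v in vowels if v in MONOPHTHONGS],
        "diphthongs": [v for v in vowels if v in DIPHTHONGS],
        "schwas": phones.count(SCHWA),
    }


if __name__ == "__main__":
    for word, prons in load_cmudict(SAMPLE_LINES).items():
        for phones in prons:
            print(word, " ".join(phones), vowel_profile(phones))
```

To run it against the real dictionary, pass the lines of the downloaded file in place of `SAMPLE_LINES`.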
If your wake word has a lot of rhyme potential, that is a sign it will probably cause many false activations. A quality dataset can partially mitigate this. The cadence of the wake word should also be as unique as possible: "Hey Mycroft" frequently activates falsely when someone says "Microsoft". To counter this, find words and phrases that share the wake word's cadence and add them to the dataset, so the model learns to tell them apart from the wake word. Ideally, the wake word is also smooth to pronounce. Adding "hey" to the front is an easy choice, because the -ey sound leaves the mouth in a fairly neutral position, ready for the next phoneme. A word like "axolotl" is certainly uncommon and unique, but it is neither smooth nor easy to pronounce.
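Rhyme and sound-alike checks can be partly automated before you record anything. The sketch below flags words that rhyme with the wake word. As an assumption, two words count as rhyming here if they share every phoneme from the last vowel with primary or secondary stress onward. It also flags words that sit within a small Levenshtein distance of the wake word when compared phoneme by phoneme. Edit distance is only a rough stand-in for "similar cadence" because it ignores rhythm and stress timing. The pronunciations are hand-written ARPAbet for illustration, and the function names and the `max_distance` default are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hand-written ARPAbet pronunciations, for illustration only.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "MYCROFT": "M AY1 K R AO2 F T".split(),
    "MICROSOFT": "M AY1 K R OW0 S AO2 F T".split(),
    "COMPUTER": "K AH0 M P Y UW1 T ER0".split(),
    "COMMUTER": "K AH0 M Y UW1 T ER0".split(),
    "TUTOR": "T UW1 T ER0".split(),
    "BANANA": "B AH0 N AE1 N AH0".split(),
}


def strip_stress(phones: Sequence[str]) -> List[str]:
    """Drop ARPAbet stress digits so AO2 and AO0 compare equal."""
    return [p.rstrip("012") for p in phones]


def rhyme_part(phones: Sequence[str]) -> Tuple[str, ...]:
    """Phonemes from the last vowel with primary (1) or secondary (2) stress to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(strip_stress(phones[i:]))
    return tuple(strip_stress(phones))  # no stressed vowel: use the whole word


def phoneme_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance counted in phonemes rather than letters."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (pa != pb)))  # substitution
        previous = current
    return previous[-1]


def find_confusers(wake_word: str, pronunciations: Dict[str, List[str]],
                   max_distance: int = 2) -> Dict[str, List[str]]:
    """Return words that rhyme with, or sound close to, the wake word, with reasons."""
    target = pronunciations[wake_word]
    confusers: Dict[str, List[str]] = {}
    for word, phones in pronunciations.items():
        if word == wake_word:
            continue
        reasons = []
        if rhyme_part(phones) == rhyme_part(target):
            reasons.append("rhymes")
        distance = phoneme_distance(strip_stress(phones), strip_stress(target))
        if distance <= max_distance:
            reasons.append(f"{distance} phoneme edit(s) away")
        if reasons:
            confusers[word] = reasons
    return confusers


if __name__ == "__main__":
    for wake_word in ("MYCROFT", "COMPUTER"):
        print(wake_word, find_confusers(wake_word, PRONUNCIATIONS))
```

With these sample pronunciations, "MICROSOFT" is flagged for "MYCROFT", and "COMMUTER" and "TUTOR" are flagged for "COMPUTER". Words flagged this way are candidates to record and add to the dataset.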
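Together, your listening engine and your wake word should first minimize false negatives (missed activations) for your target audience, and only then minimize false positives (unwanted activations).
References
1. https://journals.sagepub.com/doi/pdf/10.1207/s15548430jlr3601_5
2. http://www.speech.cs.cmu.edu/cgi-bin/cmudict
3. https://github.com/Alexir/CMUdict/blob/master/cmudict-0.7b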