A good wake word

From Voice Technology Wiki
Revision as of 10:38, 10 November 2021 by 72.128.154.254 (talk)

What makes a good wake word?

Several factors contribute to the overall quality and usability of a wake word. Ideally it is a short, memorable, easily pronounced phrase with an uncommon phoneme arrangement. Most people are familiar with "Hey Google", "Hey Siri", or "Hey Mycroft", for instance. "Alexa" is a single-word wake word, albeit one with an uncommon phoneme arrangement. None of these is more than three syllables. In "Star Trek" the ship computers are frequently addressed as "Computer". This is a reasonable choice, but not without drawbacks. The word "computer" comes up frequently in modern daily conversation, and its phoneme arrangement is not uncommon, potentially leading to false activations. "OK Computer" or "Hey Computer" adds a bit more complexity to the wake word. A wake word of more than four, or perhaps five, syllables starts to become clunky to use.

While this document is primarily about English, the concepts can be applied to many other languages.

Modern English contains 44 phonemes: 24 consonants and 20 vowels. The vowel sounds fall into two groups based on mouth position: monophthongs, where the mouth holds one position, and diphthongs, where it glides from one position to another. The simplest way to tell them apart is by example: the e in "thee" is a monophthong, and the -oy in "boy" is a diphthong. The most common American English phoneme is the schwa, which has an "uh" sound. Any vowel letter can be pronounced as a schwa, like the first a in "Alexa" or the e in "the". A 2004 paper contains a helpful list of the most common phonemes. CMU also has a handy [pronouncing dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict) to help decipher phonemes. The CMU dictionary file can also be found here. Using those and other resources, you can start to determine the viability of your wake word's phoneme structure.
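
As a rough sketch of how the CMU dictionary can help here: its entries are written in ARPAbet symbols, and every vowel phoneme carries a stress digit (0, 1, or 2), so counting those digits gives a syllable count. The three entries below are a small sample written in the dictionary's format, and the helper names `parse_entries`, `syllable_count`, and `phrase_syllables` are invented for illustration; in practice you would read the full dictionary file instead.

```python
# Sample entries in CMU Pronouncing Dictionary format (word, then ARPAbet phonemes).
# Assumption: a tiny hand-written sample; real use would read the full dictionary file.
SAMPLE_ENTRIES = """\
ALEXA  AH0 L EH1 K S AH0
COMPUTER  K AH0 M P Y UW1 T ER0
HEY  HH EY1
"""


def parse_entries(text):
    """Map each word to its list of phonemes."""
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and dictionary comment lines
        word, *phones = line.split()
        entries[word] = phones
    return entries


def syllable_count(phones):
    """Vowel phonemes end in a stress digit, so each one marks a syllable."""
    return sum(1 for p in phones if p[-1].isdigit())


def phrase_syllables(phrase, entries):
    """Total syllables for a (possibly multi-word) wake phrase."""
    return sum(syllable_count(entries[w]) for w in phrase.upper().split())


pronunciations = parse_entries(SAMPLE_ENTRIES)
for phrase in ("Alexa", "Computer", "Hey Computer"):
    print(phrase, phrase_syllables(phrase, pronunciations))
```

In this notation the unstressed `AH0` is the schwa, which is why it shows up at both ends of "Alexa".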

If your wake word has a lot of rhyme potential, it is probably going to have a lot of potential false activations. This can be partially mitigated with a quality dataset. The cadence of the wake word should also be as unique as possible. "Hey Mycroft" frequently has false activations when someone says "Microsoft". For cases like these, find other phrases and words whose cadence matches the wake word and use them to build the dataset. The pronunciation of your wake word should ideally be smooth. Adding "hey" to the front is an easy choice, as the -ey phoneme leaves the mouth fairly neutral and ready to say another phoneme. Words like "axolotl" are definitely uncommon and unique, but not as smooth or easy to pronounce.
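
One hedged way to look for confusable words like these is to compare phoneme sequences rather than spellings. The sketch below computes a plain Levenshtein edit distance over ARPAbet phonemes with the stress digits removed. The pronunciation given for "Mycroft" is a guess written for illustration, and the candidate list is a made-up sample in CMU dictionary format. A low distance only suggests a word worth adding to the dataset; it does not measure how a listening engine will actually behave.

```python
# ARPAbet pronunciations. Assumptions: MYCROFT is a hand-written guess, and the
# candidates are a small illustrative sample in CMU Pronouncing Dictionary format.
WAKE_WORD = "M AY1 K R AO2 F T".split()  # "Mycroft" (hypothetical entry)
CANDIDATES = {
    "MICROSOFT": "M AY1 K R OW2 S AO2 F T".split(),
    "COMPUTER": "K AH0 M P Y UW1 T ER0".split(),
    "ALEXA": "AH0 L EH1 K S AH0".split(),
}


def strip_stress(phones):
    """Drop stress digits so AO1 and AO2 compare as the same sound."""
    return [p.rstrip("012") for p in phones]


def phoneme_distance(a, b):
    """Levenshtein distance counted in whole phonemes, not letters."""
    a, b = strip_stress(a), strip_stress(b)
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(previous[j] + 1,          # delete pa
                               current[j - 1] + 1,       # insert pb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def rank_confusables(wake, candidates):
    """Sort candidate words from most to least similar to the wake word."""
    scored = [(phoneme_distance(wake, phones), word)
              for word, phones in candidates.items()]
    return sorted(scored)


for distance, word in rank_confusables(WAKE_WORD, CANDIDATES):
    print(word, distance)
```

With these sample entries, "Microsoft" comes out only two phoneme edits away from "Mycroft", far closer than the other candidates, which matches the false activations described above.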

The accuracy of your listening engine in conjunction with your wake word should minimize false negatives for your target audience first, and minimize false positives second.
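
To make that priority concrete, here is a minimal sketch of the two error rates computed from labelled test clips. The `Trial` record, the `error_rates` helper, and the sample results are all invented for illustration; no particular metric or test set is prescribed here.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One test clip: did it contain the wake word, and did the engine fire?"""
    is_wake_word: bool
    detected: bool


def error_rates(trials):
    """Return (false_negative_rate, false_positive_rate) for a list of trials."""
    positives = [t for t in trials if t.is_wake_word]
    negatives = [t for t in trials if not t.is_wake_word]
    misses = sum(1 for t in positives if not t.detected)
    false_alarms = sum(1 for t in negatives if t.detected)
    fnr = misses / len(positives) if positives else 0.0
    fpr = false_alarms / len(negatives) if negatives else 0.0
    return fnr, fpr


# Made-up results: 4 wake-word clips (1 missed), 5 other clips (1 false activation).
trials = (
    [Trial(True, True)] * 3 + [Trial(True, False)]
    + [Trial(False, False)] * 4 + [Trial(False, True)]
)

fnr, fpr = error_rates(trials)
print(f"false negative rate: {fnr:.2f}")
print(f"false positive rate: {fpr:.2f}")
```

Following the ordering above, you would compare candidate wake words on the false negative rate for clips from your target audience first, and only then look at the false positive rate.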