A good wake word

Revision as of 22:35, 17 November 2021 by Thorsten (Style and category adjustments.)


What makes a good wake word?

Several factors contribute to the overall quality and usability of a wake word. Ideally it is:

  • short (more than four syllables becomes clunky in use)
  • memorable
  • easy to pronounce
  • built from an uncommon arrangement of phonemes

Most people are familiar with "Hey Google", "Hey Siri" or "Hey Mycroft", for instance. "Alexa" is a single-word wake word, albeit one with an uncommon phoneme arrangement. None of these is more than three syllables. In "Star Trek" the ship computers are frequently addressed as "Computer". This is a reasonable choice, but not without drawbacks. The word "computer" comes up frequently in modern daily conversation, and its phoneme arrangement is not uncommon, potentially leading to false activations. "OK Computer" or "Hey Computer" adds a bit more complexity to the wake word.

Wake word tips

While this document is primarily about English, the concepts can be applied to many other languages.

English

Modern English contains 44 phonemes: 24 consonants and 20 vowels. The vowel sounds fall into two groups, monophthongs and diphthongs, depending on whether the mouth holds one position or glides from one position to another. The simplest way to tell them apart is by example: the e in "thee" is a monophthong, and the -oy in "boy" is a diphthong. The most common American English phoneme is the schwa, which has an "uh" sound. Any vowel letter can be pronounced as a schwa, like the first a in "Alexa" or the e in "the". A 2004 paper[1] contains a helpful list of the most common phonemes. CMU also has a handy pronouncing dictionary[2] to help decipher phonemes. The CMU dictionary file can also be found here[3]. Using those and other resources, you can start to determine the viability of your wake word's phoneme structure.
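As a rough illustration of how the CMU dictionary can be used for this, the sketch below parses a few lines in the dictionary's plain-text format (a word followed by its ARPAbet phonemes) and counts syllables by counting the phonemes that carry a stress digit. The sample entries are hand-written for this example and are an assumption; check them against the real dictionary file before relying on them. The function names are hypothetical as well.

```python
# Illustrative entries in the CMU dictionary line format: WORD  PH1 PH2 ...
# ASSUMPTION: hand-written for this sketch, not copied from the real file.
SAMPLE_ENTRIES = """\
;;; comment lines in the dictionary start with three semicolons
HEY  HH EY1
COMPUTER  K AH0 M P Y UW1 T ER0
ALEXA  AH0 L EH1 K S AH0
"""


def parse_cmudict(text):
    """Map each lower-cased word to its list of ARPAbet phonemes."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        entries[word.lower()] = phones
    return entries


def phrase_phonemes(phrase, entries):
    """Concatenate the phonemes of every word in a phrase."""
    phones = []
    for word in phrase.lower().split():
        phones.extend(entries[word])  # KeyError if the word is not listed
    return phones


def syllable_count(phones):
    """Vowel phonemes carry a stress digit (0, 1 or 2): one per syllable."""
    return sum(1 for p in phones if p[-1].isdigit())


entries = parse_cmudict(SAMPLE_ENTRIES)
for phrase in ("alexa", "hey computer"):
    phones = phrase_phonemes(phrase, entries)
    print(phrase, phones, syllable_count(phones))
```

With these sample entries, "alexa" counts as three syllables and "hey computer" as four, which matches the guideline that a wake word should stay at or below four syllables.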

If your wake word has a lot of rhyme potential, this is a sign it is probably going to have a lot of potential false activations. This can be partially mitigated with a quality dataset. The cadence of the wake word should also be as unique as possible. "Hey Mycroft" frequently has false activations when someone says "Microsoft". For cases like these, it helps to find other phrases and words with a matching cadence and use them to build the dataset. The pronunciation of your wake word would ideally be smooth. Adding "hey" to the front is an easy choice, as the -ey phoneme leaves the mouth fairly neutral and ready to say another phoneme. Words like "axolotl" are definitely uncommon and unique, but not as smooth or easy to pronounce.
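One possible way to look for words that sit close to a wake word, such as "Microsoft" next to "Hey Mycroft", is to compare phoneme sequences with an edit distance. The sketch below is only an assumption about how such a check could look, not a method described on this page. The phoneme transcriptions are hand-written in ARPAbet for illustration (in particular, "Mycroft" is a guess), and all names are hypothetical.

```python
def strip_stress(phones):
    """Drop ARPAbet stress digits so that AO1 and AO2 compare as equal."""
    return [p.rstrip("012") for p in phones]


def edit_distance(a, b):
    """Levenshtein distance between two phoneme lists."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# ASSUMPTION: hand-written transcriptions, verify against a real dictionary.
WAKE_WORD = "HH EY1 M AY1 K R AO2 F T".split()  # "hey mycroft"
CANDIDATES = {
    "microsoft": "M AY1 K R OW2 S AO1 F T".split(),
    "computer": "K AH0 M P Y UW1 T ER0".split(),
}

wake = strip_stress(WAKE_WORD)
distances = {
    word: edit_distance(wake, strip_stress(phones))
    for word, phones in CANDIDATES.items()
}
# Smaller distance means closer in sound, so more likely to need
# coverage in the dataset.
for word, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(word, dist)
```

Words that come out with a small distance are candidates to record and include when building the dataset. Edit distance ignores cadence and stress, so it is at best a first filter.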

The accuracy of your listening engine in conjunction with your wake word should minimize false negatives for your target audience first, and minimize false positives second.
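The priority above can be sketched as a simple ranking: compare candidates by false negative rate first and use the false positive rate only to break ties. The counts below are made-up placeholders, not measurements, and the candidate names and function names are hypothetical. A strict "first, then second" ordering is one literal reading of the guideline; in practice you may prefer to weigh the two rates against each other.

```python
def error_rates(missed, wake_samples, false_activations, other_samples):
    """Return (false negative rate, false positive rate)."""
    return missed / wake_samples, false_activations / other_samples


# ASSUMPTION: placeholder counts for illustration only.
# (missed wake words, wake word samples,
#  false activations, non-wake-word samples)
RESULTS = {
    "candidate A": (4, 200, 3, 1000),
    "candidate B": (4, 200, 9, 1000),
    "candidate C": (10, 200, 1, 1000),
}

# Tuples compare element by element, so sorting on (fnr, fpr) minimizes
# false negatives first and false positives second.
ranking = sorted(RESULTS, key=lambda name: error_rates(*RESULTS[name]))

for name in ranking:
    fnr, fpr = error_rates(*RESULTS[name])
    print(f"{name}: false negatives {fnr:.1%}, false positives {fpr:.1%}")
```

Here candidate C has the fewest false activations but still ranks last, because it misses the wake word more often than the other two.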

TODO: Other languages