Vosk

From Voice Technology Wiki
Latest revision as of 20:31, 3 December 2021

Vosk[1] is an open-source speech recognition toolkit by Alphacephei[2]. Key features are:

  1. Supports 20+ languages and dialects - English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish. More to come.
  2. Works offline, even on lightweight devices - Raspberry Pi, Android, iOS
  3. Installs with a simple pip3 install vosk.
  4. Portable per-language models are only about 50 MB each, and much bigger server models are also available.
  5. Provides a streaming API for the best user experience (unlike popular speech-recognition Python packages).
  6. Offers bindings for other programming languages as well: Java, C#, JavaScript, and others.
  7. Allows quick reconfiguration of vocabulary for best accuracy.
  8. Supports speaker identification in addition to plain speech recognition.
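
The streaming API and the vocabulary reconfiguration mentioned above can be sketched in Python roughly as follows. This is a minimal sketch, not an official example. It assumes the vosk package is installed, that a model has been downloaded and unpacked locally, and that the input is a 16-bit mono PCM WAV file. The helper names build_grammar, pcm_chunks and transcribe are hypothetical and chosen only for illustration.

```python
import json
import wave


def build_grammar(phrases):
    """Encode a phrase list as the JSON string passed to the recognizer.

    "[unk]" is appended so that speech outside the phrase list can be
    reported as an unknown token instead of being forced onto a phrase.
    """
    return json.dumps(list(phrases) + ["[unk]"])


def pcm_chunks(wf, frames_per_chunk=4000):
    """Yield raw PCM byte chunks from an open wave reader until it is exhausted."""
    while True:
        data = wf.readframes(frames_per_chunk)
        if not data:
            return
        yield data


def transcribe(wav_path, model_path, phrases=None):
    """Stream a WAV file through Vosk chunk by chunk and return the text segments.

    wav_path and model_path are caller-supplied (hypothetical) locations.
    """
    # Imported lazily so the helpers above can be used without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)
    segments = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        if phrases:
            # Restrict recognition to a small vocabulary.
            rec = KaldiRecognizer(model, wf.getframerate(), build_grammar(phrases))
        else:
            rec = KaldiRecognizer(model, wf.getframerate())
        for chunk in pcm_chunks(wf):
            # AcceptWaveform returns true once an utterance has been finalized.
            if rec.AcceptWaveform(chunk):
                segments.append(json.loads(rec.Result())["text"])
        # Flush whatever audio is still buffered in the recognizer.
        segments.append(json.loads(rec.FinalResult())["text"])
    return [s for s in segments if s]
```

A call such as transcribe("test.wav", "model", phrases=["turn on", "turn off"]) would then return the recognized segments, where "test.wav" and "model" are placeholder paths. Restricting the vocabulary this way may work only with models that support runtime graph construction, typically the small portable ones, so treat that part as an assumption to verify against the model in use.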

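Related projects

  * voskjs (https://github.com/solyarisoftware/voskjs): Vosk ASR offline engine API for Node.js developers, with a simple HTTP ASR server.

References

  1. https://github.com/alphacep/vosk-api
  2. https://alphacephei.com/vosk/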