SEPIA Speech-To-Text Server

From Voice Technology Wiki

Latest revision as of 11:33, 3 December 2021

SEPIA Speech-To-Text (STT) Server is a WebSocket-based, full-duplex Python server for real-time automatic speech recognition (ASR) that supports multiple open-source ASR engines. It receives a stream of audio chunks over a secure WebSocket connection and returns transcribed text almost immediately as partial and final results.

One goal of this project is to offer a standardized, secure, real-time interface for the many open-source ASR tools available. The server runs on all major platforms, including single-board devices such as the Raspberry Pi 4.[1]

Features

  • WebSocket server (Python FastAPI) that can receive audio streams and send transcribed text at the same time
  • Modular architecture to support multiple ASR engines like Vosk (reference implementation), Coqui, DeepSpeech, Scribosermo and more (under construction)
  • Optional post-processing of results (e.g. via text2num and custom modules)
  • Standardized API for all engines and support for individual engine features (speaker identification, grammar, confidence score, word timestamps, alternative results, etc.)
  • On-the-fly server and engine configuration via HTTP REST API and WebSocket 'welcome' event (including custom grammar, if supported by engine and model)
  • User authentication via simple common token or individual tokens for multiple users
  • Docker containers with support for all major platform architectures: x86 64-bit (amd64), ARM 32-bit (armv7l) and ARM 64-bit (aarch64)
  • Fast enough to run even on Raspberry Pi 4 (2GB) in real-time (depending on engine and model configuration)
  • Compatible with SEPIA Framework client (v0.24+)
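
The streaming flow described in the features above (a 'welcome' event carrying configuration and a token, audio sent in chunks, text returned as partial and final results) can be sketched from the client side as follows. This is only an illustration and does not open a connection: the function names, the JSON field names (`type`, `client_id`, `access_token`, `data`, `isFinal`, `transcript`) and the default chunk size are assumptions made for this example, not the server's documented message format.

```python
import json
from typing import Iterable, Iterator


def build_welcome_message(client_id: str, access_token: str, options: dict) -> str:
    """Serialize a 'welcome' event with credentials and engine options.

    The field names are hypothetical; check the server's API documentation
    for the real message layout.
    """
    return json.dumps({
        "type": "welcome",
        "client_id": client_id,
        "access_token": access_token,
        "data": options,
    })


def iter_audio_chunks(pcm: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Split raw audio bytes into chunks that could be sent one by one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def final_transcript(messages: Iterable[dict]) -> str:
    """Join the text of final results and ignore partial ones.

    Assumes each result message is a dict with hypothetical keys
    'type', 'isFinal' and 'transcript'.
    """
    finals = [
        m["transcript"]
        for m in messages
        if m.get("type") == "result" and m.get("isFinal")
    ]
    return " ".join(finals)
```

A client built on these helpers would send the welcome message first, then each chunk as a binary WebSocket frame while reading result messages concurrently, which is what the full-duplex design allows. Partial results are useful for live display, while only final results would normally be kept as the transcript.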