Real-time-factor
Latest revision as of 19:57, 18 January 2024
The real time factor (RTF) is a common metric for measuring the speed of an automatic speech recognition (ASR) system in the decoding phase ("at run-time"). It can also be used in other contexts where an audio or video signal is processed (usually automatically) at a nearly constant rate. More generally, RTF characterizes the processing speed, and thus the latency, of any (audio) processing system: not only a speech recognition engine, but also a text-to-speech engine, a transcoding engine, etc.
If it takes time f(d) to process an input of duration d, with both measured in the same unit, the real time factor is defined as: RTF = f(d) / d
If, for example, it takes 8 hours of computation time to process a recording of duration 2 hours, the real time factor is 8 / 2 = 4. When the real time factor is 1, the processing is done "in real time"; values below 1 mean the system is faster than real time, and values above 1 mean it is slower. The RTF depends on the hardware and, when processing is done by a cloud-based service, also on the network bandwidth.
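The worked example above can be reproduced with a short sketch that computes the ratio and labels it relative to real time. The function name `classify_rtf` and its labels are illustrative assumptions, not part of any standard API; both durations are assumed to be given in seconds.

```python
def classify_rtf(processing_time: float, audio_length: float) -> tuple[float, str]:
    """Return the RTF and whether processing is faster than, equal to,
    or slower than real time.

    Hypothetical helper: both durations must use the same unit (here: seconds).
    """
    if audio_length <= 0:
        raise ValueError("audio_length must be positive")
    rtf = processing_time / audio_length
    if rtf < 1:
        label = "faster than real time"
    elif rtf == 1:
        label = "real time"
    else:
        label = "slower than real time"
    return rtf, label


# Example from the text: 8 hours of computation for a 2-hour recording.
print(classify_rtf(8 * 3600, 2 * 3600))  # (4.0, 'slower than real time')
```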
A state-of-the-art cloud-based speech-to-text service, such as those offered by Google, Azure or AWS, usually has values between 0.2 and 0.6. These values depend heavily on many factors, including the network/internet bandwidth and the speech content. For an on-premises ASR, the main factors are the algorithm and the hardware resources (CPU/RAM).
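Because RTF is a ratio, it can be inverted to estimate roughly how long a job will take: processing time ≈ RTF × audio length. In the sketch below, the function name `expected_processing_time` is hypothetical, and the inputs 0.2 and 0.6 are only the bounds of the range quoted above, not measurements of any particular service.

```python
def expected_processing_time(rtf: float, audio_length: float) -> float:
    """Invert RTF = processing_time / audio_length (result uses audio_length's unit)."""
    return rtf * audio_length


audio_minutes = 60.0  # a one-hour recording
for rtf in (0.2, 0.6):
    minutes = expected_processing_time(rtf, audio_minutes)
    print(f"RTF {rtf}: about {minutes:.0f} min")
# RTF 0.2: about 12 min
# RTF 0.6: about 36 min
```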
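def real_time_factor(processingTime, audioLength, decimals=2):
    ''' Real-Time Factor (RTF) is defined as processing-time / length-of-audio. '''
    # Both durations must use the same unit (e.g. seconds).
    if audioLength <= 0:
        raise ValueError("audioLength must be positive")
    rtf = processingTime / audioLength
    return round(rtf, decimals)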
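In practice the processing time is measured around the call to the recognizer. The following is a minimal sketch under stated assumptions: the input is a WAV file, its length is read with the standard `wave` module, and processing time is taken with `time.perf_counter`. `dummy_transcribe` is a hypothetical placeholder for a real ASR call (local engine or cloud API), and the silent WAV is generated in memory only so that the example runs on its own.

```python
import io
import time
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Audio length = number of frames / sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (test input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def dummy_transcribe(wav_bytes: bytes) -> str:
    """Hypothetical stand-in for a real ASR engine or cloud API call."""
    return ""


def measure_rtf(transcribe, wav_bytes: bytes) -> float:
    """Time one call to `transcribe` and divide by the audio length."""
    audio_length = wav_duration_seconds(wav_bytes)
    if audio_length <= 0:
        raise ValueError("audio must not be empty")
    start = time.perf_counter()
    transcribe(wav_bytes)
    processing_time = time.perf_counter() - start
    return processing_time / audio_length


audio = make_silent_wav(2.0)
print(wav_duration_seconds(audio))  # 2.0
print(measure_rtf(dummy_transcribe, audio))  # close to 0 for the dummy
```

If `transcribe` calls a cloud-based service, the timed interval also includes the network round trip, which is why bandwidth affects the measured RTF. Averaging over several recordings usually gives a more stable figure than a single run.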