Real-Time Latency
Transcription: Real-Time | Deployments: All
When transcribing in real-time, you can control the maximum time to wait for the final transcript using the max_delay and max_delay_mode transcription config options. You can also use enable_partials to receive Partial transcripts in just a few hundred milliseconds.
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "max_delay": 3.5,
    "max_delay_mode": "fixed",
    "enable_partials": true
  }
}
The max_delay parameter controls the maximum latency of Finals in the real-time transcription engine. This is the delay in seconds between receiving input audio and returning Final transcription results. The default is 10 seconds; the minimum and maximum values are 2 and 20 seconds. Note that max_delay has no impact on how Partials are returned.
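As a minimal sketch of the limits described above, a client could check max_delay before sending its config. The helper name validate_max_delay and the constants are hypothetical and not part of any SDK; only the default (10) and the range (2 to 20) come from the text.

```python
# Hypothetical client-side check; names are illustrative, not an official API.
MAX_DELAY_MIN = 2.0       # lower limit stated in the text (seconds)
MAX_DELAY_MAX = 20.0      # upper limit stated in the text (seconds)
MAX_DELAY_DEFAULT = 10.0  # default stated in the text (seconds)


def validate_max_delay(value=None):
    """Return a usable max_delay, falling back to the documented default."""
    if value is None:
        return MAX_DELAY_DEFAULT
    value = float(value)
    if not MAX_DELAY_MIN <= value <= MAX_DELAY_MAX:
        raise ValueError(
            f"max_delay must be between {MAX_DELAY_MIN} and {MAX_DELAY_MAX} seconds, got {value}"
        )
    return value
```

Rejecting an out-of-range value locally gives a clearer error than waiting for the service to refuse the config.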
Max delay mode
Using a fixed value of max_delay can increase the potential for inaccuracies in the transcript, especially around entities such as numerals, currencies, and dates.
With max_delay_mode set to flexible, the maximum latency is relaxed only when a potential entity has been detected. Entities are common concepts such as numbers, currencies and dates, and are discussed in more detail here.
There are two options for max_delay_mode: fixed and flexible. The default is flexible.
- flexible improves accuracy in entity recognition by allowing the latency to exceed the max_delay threshold when a potential entity is detected.
- fixed ensures that Final transcripts never take longer than the max_delay threshold, even if this results in less accurate transcription of entities.
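The trade-off between the two modes can be sketched as a small helper that picks a mode from one requirement: whether the application needs a hard latency bound. The function names and the strict_latency_bound flag are hypothetical; the config keys mirror the JSON example earlier on this page.

```python
import json


def choose_max_delay_mode(strict_latency_bound):
    """Pick "fixed" for a hard latency ceiling, otherwise "flexible" (the documented default)."""
    return "fixed" if strict_latency_bound else "flexible"


def build_config(max_delay, strict_latency_bound, language="en"):
    """Assemble a config in the same shape as the JSON example on this page."""
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "max_delay": max_delay,
            "max_delay_mode": choose_max_delay_mode(strict_latency_bound),
        },
    }


if __name__ == "__main__":
    # Live captions that must never lag more than 3.5 seconds: accept weaker entity formatting.
    print(json.dumps(build_config(3.5, strict_latency_bound=True), indent=2))
```

An application that displays amounts, dates, or other entities and can tolerate occasional extra delay would pass strict_latency_bound=False instead.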
Partial transcripts
Partial transcripts are enabled using the enable_partials config option. Partials allow your users to see updates sooner than the 2-second lower limit of max_delay. Typically these are returned in 500-800 milliseconds.
When Partial transcripts are enabled, Final transcripts will still be returned. Partials will be updated as more audio is received and further context is understood. This improves the accuracy up until a Final transcript is generated for that section of audio. Once a Final is received, the partials are reset to empty.
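The interplay described above can be sketched on the client side as a small buffer: each Partial replaces the previous provisional text, and a Final commits text and clears the provisional part. The message shape used here ({"kind": ..., "text": ...}) is a hypothetical simplification, not the wire format of the API, and the sketch assumes each Partial supersedes the one before it.

```python
class TranscriptView:
    """Keeps committed Final text plus the latest provisional Partial text."""

    def __init__(self):
        self.finals = []   # text confirmed by Final transcripts
        self.partial = ""  # latest Partial; cleared when a Final arrives

    def handle(self, message):
        # `message` is a hypothetical dict: {"kind": "partial" | "final", "text": str}
        kind, text = message["kind"], message["text"]
        if kind == "partial":
            self.partial = text        # newer Partial supersedes the old one
        elif kind == "final":
            self.finals.append(text)   # commit the Final
            self.partial = ""          # Partials reset to empty after a Final
        else:
            raise ValueError(f"unknown message kind: {kind!r}")

    def render(self):
        """Return committed text followed by any provisional text."""
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


view = TranscriptView()
for msg in [
    {"kind": "partial", "text": "hello"},
    {"kind": "partial", "text": "hello wor"},
    {"kind": "final", "text": "Hello world."},
    {"kind": "partial", "text": "how"},
]:
    view.handle(msg)
print(view.render())
```

Rendering the provisional text in a different style (for example, greyed out) lets users see that it may still change.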
Note that Partial Transcripts have some limitations:
- Accuracy is usually 10-25% lower than the Final Transcript. This includes lower accuracy of punctuation and capitalisation of words.
- Numeral Formatting is not returned in Partial Transcripts.
- Diarization is not returned in Partial Transcripts.
- The confidence field for Partial transcripts has no meaning and should not be relied on.
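One way to respect the last limitation in client code is to treat confidence as absent whenever a result comes from a Partial. This is a minimal sketch; the result dict and the is_partial flag are hypothetical, and only the confidence field name comes from the text.

```python
def usable_confidence(result, is_partial):
    """Return the confidence for Final results only; None for Partials."""
    if is_partial:
        return None  # confidence on Partials has no meaning, so hide it
    return result.get("confidence")
```

Downstream logic such as highlighting low-confidence words can then skip any result for which this returns None.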