
Real-Time Latency

Transcription: Real-Time | Deployments: All

When transcribing in real time, you can control the maximum time to wait for the Final transcript using the max_delay and max_delay_mode transcription config options. You can also set enable_partials to receive Partial transcripts within a few hundred milliseconds.

{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "max_delay": 3.5,
    "max_delay_mode": "fixed",
    "enable_partials": true
  }
}

The max_delay parameter controls the maximum latency of Finals in the real-time transcription engine. This is the delay, in seconds, between receiving input audio and returning Final transcription results. The default is 10 seconds, and the allowed range is 2 to 20 seconds. Note that max_delay has no impact on how Partials are returned.
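The range check described above can be done on the client before a session starts. The following is a minimal sketch, not part of any SDK: build_config is a hypothetical helper, and the only facts it relies on are the default (10) and the limits (2 and 20) stated in the text. Any other options are passed through unchecked.

```python
import json

# Limits and default for max_delay, in seconds, as stated in the text.
MAX_DELAY_MIN = 2
MAX_DELAY_MAX = 20
MAX_DELAY_DEFAULT = 10


def build_config(language, max_delay=MAX_DELAY_DEFAULT, **extra_options):
    """Hypothetical helper: return a config dict shaped like the JSON example.

    extra_options (for example max_delay_mode or enable_partials) are copied
    into transcription_config without validation.
    """
    if isinstance(max_delay, bool) or not isinstance(max_delay, (int, float)):
        raise TypeError("max_delay must be a number of seconds")
    if not MAX_DELAY_MIN <= max_delay <= MAX_DELAY_MAX:
        raise ValueError(
            f"max_delay must be between {MAX_DELAY_MIN} and "
            f"{MAX_DELAY_MAX} seconds, got {max_delay}"
        )
    transcription_config = {"language": language, "max_delay": max_delay}
    transcription_config.update(extra_options)
    return {"type": "transcription", "transcription_config": transcription_config}


if __name__ == "__main__":
    config = build_config(
        "en", max_delay=3.5, max_delay_mode="fixed", enable_partials=True
    )
    print(json.dumps(config, indent=2))
```

Validating locally gives an immediate, readable error instead of waiting for the service to reject an out-of-range value.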

Max delay mode

Using a fixed value of max_delay can increase the potential for inaccuracies in the transcript, especially around entities such as numerals, currencies, and dates.

Setting max_delay_mode to flexible lets the latency exceed max_delay, but only when a potential entity has been detected. Entities are common concepts such as numbers, currencies and dates, and are discussed in more detail here.

There are two options for max_delay_mode: fixed and flexible. The default is flexible.

  • flexible improves accuracy in entity recognition by allowing the latency to exceed the max_delay threshold when a potential entity is detected
  • fixed ensures that final transcripts never take longer than the max_delay threshold, even if this results in less accurate transcription of entities
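One practical consequence of the two modes is whether a client can rely on a hard deadline for Finals. The sketch below expresses that rule in code. final_latency_bound is a hypothetical name, and the returned value covers only the engine-side delay described above, not network transit time.

```python
VALID_MODES = ("fixed", "flexible")


def final_latency_bound(max_delay, max_delay_mode="flexible"):
    """Return the hard upper bound (seconds) on Final latency, or None.

    fixed:    Finals never take longer than max_delay, so that is the bound.
    flexible: latency may exceed max_delay when a potential entity is
              detected, so no hard bound can be assumed.
    """
    if max_delay_mode not in VALID_MODES:
        raise ValueError(
            f"max_delay_mode must be one of {VALID_MODES}, got {max_delay_mode!r}"
        )
    return max_delay if max_delay_mode == "fixed" else None


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(mode, final_latency_bound(3.5, mode))
```

A caller might use the returned value to configure a watchdog timer in fixed mode, and skip the timer (or use a more generous one) when the result is None.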

Partial transcripts

Partial transcripts are enabled using the enable_partials config option. Partials let your users see updates sooner than the 2-second lower limit of max_delay allows. They are typically returned within 500-800 milliseconds.

When Partial transcripts are enabled, Final transcripts are still returned. Partials are updated as more audio is received and further context is understood, so their accuracy improves until a Final transcript is generated for that section of audio. Once a Final is received, the Partials are reset to empty.
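The update-then-reset behaviour described above can be mirrored on the client with two pieces of state: the text already confirmed by Finals, and the latest Partial. This is a minimal sketch under stated assumptions: TranscriptView and its method names are hypothetical, each Partial is assumed to replace the previous one, and how transcript text is extracted from incoming messages is left to the caller.

```python
class TranscriptView:
    """Hypothetical client-side view combining Finals and the latest Partial."""

    def __init__(self):
        self.finals = []   # confirmed text, in arrival order
        self.partial = ""  # provisional text for audio not yet finalised

    def on_partial(self, text):
        # Assumption: a newer Partial supersedes the previous one.
        self.partial = text.strip()

    def on_final(self, text):
        # A Final confirms a section of audio; the Partial is reset to empty.
        text = text.strip()
        if text:
            self.finals.append(text)
        self.partial = ""

    def display(self):
        # Confirmed text first, then whatever is still provisional.
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


if __name__ == "__main__":
    view = TranscriptView()
    view.on_partial("hello")
    view.on_partial("hello world")
    view.on_final("Hello world.")
    view.on_partial("how are")
    print(view.display())
```

Keeping the two kinds of text separate makes it easy to style provisional words differently in a user interface, and to discard them cleanly when the Final arrives.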

Note that Partial transcripts have some limitations:

  • Accuracy is usually 10-25% lower than that of the Final transcript. This includes lower accuracy of punctuation and capitalisation.
  • Numeral formatting is not returned in Partial transcripts.
  • Diarization is not returned in Partial transcripts.
  • The confidence field for Partial transcripts has no meaning and should not be relied on.