
Real-Time Latency


When transcribing in real time, you can control the maximum time to wait for Final transcripts using the max_delay and max_delay_mode options in the transcription config. For example:

```json
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "max_delay": 3.5,
    "max_delay_mode": "fixed"
  }
}
```

The max_delay parameter sets the maximum latency of Finals in the real-time transcription engine: the delay, in seconds, between receiving input audio and returning Final transcription results for it. The default is 10 seconds, and the allowed range is 2 to 20 seconds. Note that max_delay has no impact on how Partials are returned.
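As an illustration, a client could validate these options before sending them. The sketch below assumes a hypothetical helper, build_transcription_config, that is not part of any Speechmatics SDK; it only checks the range (2 to 20 seconds) and the two mode values described on this page, and leaves out any option you don't set so the service applies its own defaults.

```python
import json

# Limits as documented above; the helper itself is a hypothetical client-side sketch.
MAX_DELAY_MIN = 2.0   # seconds
MAX_DELAY_MAX = 20.0  # seconds
MAX_DELAY_MODES = ("fixed", "flexible")


def build_transcription_config(language, max_delay=None, max_delay_mode=None):
    """Build a transcription config dict, validating the latency options.

    Options left as None are omitted, so the service falls back to its
    defaults (max_delay of 10 seconds, max_delay_mode "flexible").
    """
    config = {"language": language}
    if max_delay is not None:
        # Reject values outside the documented inclusive range.
        if not MAX_DELAY_MIN <= max_delay <= MAX_DELAY_MAX:
            raise ValueError(
                f"max_delay must be between {MAX_DELAY_MIN} and "
                f"{MAX_DELAY_MAX} seconds, got {max_delay}"
            )
        config["max_delay"] = max_delay
    if max_delay_mode is not None:
        if max_delay_mode not in MAX_DELAY_MODES:
            raise ValueError(
                f"max_delay_mode must be one of {MAX_DELAY_MODES}, got {max_delay_mode!r}"
            )
        config["max_delay_mode"] = max_delay_mode
    return {"type": "transcription", "transcription_config": config}


if __name__ == "__main__":
    # Reproduces the example config shown above.
    print(json.dumps(build_transcription_config("en", 3.5, "fixed"), indent=2))
```

Omitting unset options, rather than filling in the documented defaults, keeps the client from pinning values the service might otherwise choose.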

Entities and Flexible max_delay_mode

Enforcing max_delay strictly can reduce transcript accuracy, especially around entities such as numerals, currencies, and dates.

In flexible mode, the engine relaxes the maximum latency only when it detects a potential entity. Entities are common concepts such as numbers, currencies, and dates, and are described in more detail in the Entities documentation.

There are two options for max_delay_mode: fixed and flexible. The default is flexible.

  • flexible improves accuracy in entity recognition by allowing the latency to exceed the max_delay threshold when a potential entity is detected
  • fixed ensures that final transcripts never take longer than the max_delay threshold, even if this results in less accurate transcription of entities
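To see how these modes behave in practice, a client might log when each Final arrives and compare that against max_delay. The sketch below is a hypothetical client-side check, not a Speechmatics API: it assumes you record the wall-clock time at which the last audio covered by a Final was sent and the time the Final was received. In flexible mode, some Finals exceeding max_delay is expected when potential entities are detected. Observed delays also include network and client overhead, so they can exceed max_delay slightly even in fixed mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinalTiming:
    # Hypothetical client-side record, not a Speechmatics type.
    audio_sent_at: float  # wall-clock time (s) the last audio covered by this Final was sent
    received_at: float    # wall-clock time (s) this Final was received by the client

    @property
    def observed_delay(self) -> float:
        """Delay seen by the client, including network overhead."""
        return self.received_at - self.audio_sent_at


def finals_over_budget(timings, max_delay):
    """Return the Finals whose observed delay exceeded max_delay seconds."""
    return [t for t in timings if t.observed_delay > max_delay]


if __name__ == "__main__":
    # Made-up timings for illustration only.
    timings = [FinalTiming(0.0, 2.0), FinalTiming(1.0, 6.0)]
    for t in finals_over_budget(timings, max_delay=3.5):
        print(f"Final exceeded max_delay: {t.observed_delay:.1f}s")
```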