End of Turn Detection

Learn how Speechmatics detects end of utterances

To improve the user experience in responsive real-time scenarios, it is important to know when a person has finished speaking. This is especially important for voice AI, translation, and dictation use cases. Detecting an 'End of Turn' can be used to trigger actions such as generating a response in a Voice AI agent.

To get started, check out the Configuration Example below.

Use Cases

Voice AI & Conversational Systems: Enable voice assistants and chatbots to detect when the user has finished speaking, allowing the system to respond promptly without awkward delays.

Real-time Translation: Critical for live interpretation services where translations need to be delivered as soon as the speaker completes their thought, maintaining the flow of conversation.

Dictation & Transcription: Helps dictation software determine when users have completed their input, improving speed of final transcription and user experience.

End of Utterance Configuration

Speechmatics' Speech-To-Text allows you to use a period of silence to determine when a user has finished speaking. This is known as End of Utterance detection and is one way to detect End of Turn.

To enable End of Utterance detection, include the following in the StartRecognition message:

{
  "type": "transcription",
  "transcription_config": {
    "conversation_config": {
      "end_of_utterance_silence_trigger": 0.5
    },
    "language": "en"
  }
}
  • end_of_utterance_silence_trigger (Number): Allowed between 0 and 2 seconds; setting it to 0 disables detection. This is the number of seconds of non-speech (silence) to wait before an End of Utterance is identified. When this happens, Speechmatics will send a Final transcript message, followed by an extra EndOfUtterance message.
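
If you are configuring the session through the Speechmatics Python SDK rather than raw JSON, the same setting is expressed with the ConversationConfig model, as in the full example at the end of this page:

import speechmatics

# 0.5 s of trailing silence triggers a Final transcript followed by an
# EndOfUtterance message
conversation_config = speechmatics.models.ConversationConfig(
    end_of_utterance_silence_trigger=0.5
)

conf = speechmatics.models.TranscriptionConfig(
    language="en",
    conversation_config=conversation_config,
)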

Notes

  • We recommend 0.5-0.8 seconds for most voice AI applications. Longer values (0.8-1.2s) may be better for dictation applications.
  • Keep the end_of_utterance_silence_trigger lower than the max_delay value (see the sketch after these notes).
  • EndOfUtterance messages are only sent after some speech is recognised, and duplicate EndOfUtterance messages will never be sent for the same period of silence.
  • The EndOfUtterance message is not related to any specific individual identified by Diarization and will not contain speaker information.
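
As a quick sketch of the max_delay constraint above, the configuration below keeps the 0.75 s silence trigger safely below a 1 s max_delay:

# end_of_utterance_silence_trigger (0.75 s) stays below max_delay (1 s),
# per the note above
conf = speechmatics.models.TranscriptionConfig(
    language="en",
    max_delay=1.0,
    conversation_config=speechmatics.models.ConversationConfig(
        end_of_utterance_silence_trigger=0.75
    ),
)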

Example End of Utterance Message

{
  "message": "EndOfUtterance",
  "format": "2.9",
  "metadata": {
    "start_time": 1.07,
    "end_time": 1.07
  }
}
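
A common voice AI pattern - sketched here with hypothetical handler names, not part of the API - is to buffer Final transcripts and hand the complete utterance to your agent when EndOfUtterance arrives:

utterance_parts = []

def handle_final_transcript(msg):
    # Final transcripts carry the confirmed text for each segment
    utterance_parts.append(msg["metadata"]["transcript"])

def handle_end_of_utterance(msg):
    # The user has stopped speaking: flush the buffered utterance
    full_utterance = " ".join(utterance_parts).strip()
    utterance_parts.clear()
    if full_utterance:
        print(f"User said: {full_utterance}")
        # hand off to your agent / LLM here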

Semantic End of Turn

While silence-based End of Utterance detection is enough for many use cases, it is often improved by combining it with the context of the conversation. This is known as 'Semantic End of Turn Detection'. You can try Semantic End of Turn right away with our free Flow service demo!

Semantic End of Turn is included in Flow out of the box to provide the best experience for your users. You can also check out our Semantic End-of-Turn detection "how to" guide for more details on how to implement it in your own application.
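
Purely as an illustration of the concept - not Flow's implementation - a silence-triggered EndOfUtterance can be gated by a semantic completeness check on the transcript so far. is_complete_thought below is a hypothetical placeholder:

def is_complete_thought(transcript: str) -> bool:
    # Hypothetical placeholder: a real system might prompt a fast LLM or a
    # lightweight classifier; here we just look for sentence-final punctuation
    return transcript.rstrip().endswith((".", "?", "!"))

def on_end_of_utterance(transcript: str) -> None:
    if is_complete_thought(transcript):
        print(f"Turn complete - respond to: {transcript}")
    else:
        print("Pause mid-thought - keep listening...")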

Code Examples

Real-time streaming from a microphone - ideal for voice AI applications.

import asyncio

import pyaudio
import speechmatics

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Audio recording parameters
SAMPLE_RATE = 16000
CHUNK_SIZE = 1024
FORMAT = pyaudio.paFloat32


class AudioProcessor:
    """Buffers microphone audio and exposes the async read() interface
    that the Speechmatics client streams from."""

    def __init__(self):
        self.wave_data = bytearray()
        self.read_offset = 0

    async def read(self, chunk_size):
        # Wait until the PyAudio callback has buffered enough audio
        while self.read_offset + chunk_size > len(self.wave_data):
            await asyncio.sleep(0.001)

        new_offset = self.read_offset + chunk_size
        data = self.wave_data[self.read_offset : new_offset]
        self.read_offset = new_offset
        return data

    def write_audio(self, data):
        self.wave_data.extend(data)


class VoiceAITranscriber:
    def __init__(self):
        self.ws = speechmatics.client.WebsocketClient(
            speechmatics.models.ConnectionSettings(
                url=CONNECTION_URL,
                auth_token=API_KEY,
            )
        )
        self.audio = pyaudio.PyAudio()
        self.stream = None
        self.audio_processor = AudioProcessor()

        # Set up event handlers
        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
            event_handler=self.handle_partial_transcript,
        )

        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddTranscript,
            event_handler=self.handle_final_transcript,
        )

        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
            event_handler=self.handle_end_of_utterance,
        )

    def handle_partial_transcript(self, msg):
        transcript = msg["metadata"]["transcript"]
        print(f"[Listening...] {transcript}")

    def handle_final_transcript(self, msg):
        transcript = msg["metadata"]["transcript"]
        print(f"[Complete] {transcript}")

    def handle_end_of_utterance(self, msg):
        print("🔚 End of utterance detected - ready for AI response!")
        # This is where your voice AI would process the complete utterance
        # and generate a response

    def stream_callback(self, in_data, frame_count, time_info, status):
        self.audio_processor.write_audio(in_data)
        return in_data, pyaudio.paContinue

    def start_streaming(self):
        try:
            # Set up the PyAudio stream; the callback feeds the audio buffer
            self.stream = self.audio.open(
                format=FORMAT,
                channels=1,
                rate=SAMPLE_RATE,
                input=True,
                frames_per_buffer=CHUNK_SIZE,
                stream_callback=self.stream_callback,
            )

            # Audio settings must match the PyAudio stream above
            settings = speechmatics.models.AudioSettings()
            settings.encoding = "pcm_f32le"
            settings.sample_rate = SAMPLE_RATE
            settings.chunk_size = CHUNK_SIZE

            # Configure transcription with End of Utterance detection
            conversation_config = speechmatics.models.ConversationConfig(
                end_of_utterance_silence_trigger=0.75  # Adjust as needed
            )

            conf = speechmatics.models.TranscriptionConfig(
                operating_point="enhanced",
                language=LANGUAGE,
                enable_partials=True,
                max_delay=1,
                conversation_config=conversation_config,
            )

            print("🎤 Voice AI ready - start speaking!")
            print("Press Ctrl+C to stop...")

            # Start transcription; this blocks until the session ends
            self.ws.run_synchronously(
                transcription_config=conf,
                stream=self.audio_processor,
                audio_settings=settings,
            )

        except KeyboardInterrupt:
            print("\n🛑 Stopping voice AI transcriber...")
        except Exception as e:
            print(f"Error in transcription: {e}")
        finally:
            self.stop_streaming()

    def stop_streaming(self):
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
        self.audio.terminate()


# Usage
if __name__ == "__main__":
    transcriber = VoiceAITranscriber()
    transcriber.start_streaming()