End of Turn Detection
Transcription: Real-Time | Deployments: All
To improve the user experience in responsive real-time scenarios, it is important to know when a person has finished speaking. This is especially important for voice AI, translation, and dictation use cases. Detecting an 'End of Turn' can be used to trigger actions such as generating a response in a voice AI agent.
To get started, check out the Configuration Example below.
Use Cases
- Voice AI & Conversational Systems: Enable voice assistants and chatbots to detect when the user has finished speaking, allowing the system to respond promptly without awkward delays.
- Real-time Translation: Critical for live interpretation services where translations need to be delivered as soon as the speaker completes their thought, maintaining the flow of conversation.
- Dictation & Transcription: Helps dictation software determine when users have completed their input, improving the speed of final transcription and the user experience.
End of Utterance Configuration
Speechmatics' Speech-To-Text allows you to use a period of silence to determine when a user has finished speaking. This is known as End of Utterance detection and is one way to detect End of Turn.
To enable End of Utterance detection, include the following in the StartRecognition message:
{
  "type": "transcription",
  "transcription_config": {
    "conversation_config": {
      "end_of_utterance_silence_trigger": 0.5
    },
    "language": "en"
  }
}
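If you are using the speechmatics-python SDK rather than sending raw WebSocket messages, the equivalent configuration (as used in the full examples below) looks like this:

conversation_config = speechmatics.models.ConversationConfig(
    end_of_utterance_silence_trigger=0.5
)

conf = speechmatics.models.TranscriptionConfig(
    language="en",
    conversation_config=conversation_config,
)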
end_of_utterance_silence_trigger
(Number): Allowed between 0 and 2 seconds. Setting to 0 seconds disables detection. This is the number of seconds of non-speech (silence) to wait before an End of Utterance is identified. When this happens, Speechmatics will send a Final transcript message, followed by an extra EndOfUtterance message.
Notes
- We recommend 0.5-0.8 seconds for most voice AI applications. Longer values (0.8-1.2s) may be better for dictation applications.
- Keep the end_of_utterance_silence_trigger lower than the max_delay value (see the sketch after these notes).
- EndOfUtterance messages are only sent after some speech is recognised, and duplicate EndOfUtterance messages will never be sent for the same period of silence.
- The EndOfUtterance message is not related to any specific individual identified by Diarization and will not contain speaker information.
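As a quick illustration of these constraints, here is a minimal sketch (not part of the SDK; the helper name is our own) that checks an End of Utterance setting before use:

def validate_eou_trigger(silence_trigger: float, max_delay: float) -> None:
    # Allowed range is 0-2 seconds; 0 disables End of Utterance detection
    if not 0 <= silence_trigger <= 2:
        raise ValueError("end_of_utterance_silence_trigger must be between 0 and 2 seconds")
    # Keep the trigger below max_delay so the EndOfUtterance message can
    # follow the Final transcript promptly
    if silence_trigger >= max_delay:
        raise ValueError("end_of_utterance_silence_trigger should be lower than max_delay")

validate_eou_trigger(0.75, max_delay=1)    # OK
# validate_eou_trigger(1.5, max_delay=1)   # raises ValueError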
Example End of Utterance Message
{
  "message": "EndOfUtterance",
  "format": "2.9",
  "metadata": {
    "start_time": 1.07,
    "end_time": 1.07
  }
}
Semantic End of Turn
While silence-based End of Utterance detection is enough for many use cases, it can often be improved by combining it with the context of the conversation. This is known as 'Semantic End of Turn' detection. You can try Semantic End of Turn right away with our free Flow service demo!
Semantic End of Turn is already included in Flow to provide the best experience for your users. You can also check out our Semantic End-of-Turn detection "how to" guide for more details on how to implement it in your own application.
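For a rough idea of the pattern, the sketch below layers a semantic check on top of the silence-based EndOfUtterance event. This is purely illustrative and not Flow's implementation: is_complete_utterance is a hypothetical placeholder that you would back with your own classifier or LLM call.

def is_complete_utterance(text: str) -> bool:
    # Hypothetical placeholder: a real system would use a classifier or an
    # LLM to judge whether the utterance is semantically complete.
    words = text.strip().lower().split()
    if not words:
        return False
    # Trailing fillers or conjunctions suggest the speaker will continue
    return words[-1] not in {"and", "but", "so", "because", "um", "uh"}

def handle_end_of_utterance_semantic(latest_transcript: str) -> None:
    # Called when the EndOfUtterance message arrives
    if is_complete_utterance(latest_transcript):
        print("End of turn - generate the agent's response")
    else:
        print("Utterance looks unfinished - keep listening")

In a real agent you would accumulate the running transcript from AddTranscript messages and only respond once both the silence trigger and the semantic check agree.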
Code Examples
- CLI
- Python - Live Streaming
- Python - File
Copy in your API key and file name to get started.
pip3 install speechmatics-python
speechmatics config set --auth-token $API_KEY
speechmatics rt transcribe example.wav \
--operating-point enhanced \
--enable-partials \
--max-delay 1 \
--end-of-utterance-silence-trigger 0.75
Real-time streaming from microphone - ideal for voice AI applications.
import speechmatics
import pyaudio
import asyncio

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Audio recording parameters
SAMPLE_RATE = 16000
CHUNK_SIZE = 1024
FORMAT = pyaudio.paFloat32

class AudioProcessor:
    """Buffers microphone audio and exposes an async read() for the SDK."""

    def __init__(self):
        self.wave_data = bytearray()
        self.read_offset = 0

    async def read(self, chunk_size):
        # Wait until the PyAudio callback has buffered enough audio
        while self.read_offset + chunk_size > len(self.wave_data):
            await asyncio.sleep(0.001)

        new_offset = self.read_offset + chunk_size
        data = self.wave_data[self.read_offset:new_offset]
        self.read_offset = new_offset
        return data

    def write_audio(self, data):
        self.wave_data.extend(data)

class VoiceAITranscriber:
    def __init__(self):
        self.ws = speechmatics.client.WebsocketClient(
            speechmatics.models.ConnectionSettings(
                url=CONNECTION_URL,
                auth_token=API_KEY,
            )
        )
        self.audio = pyaudio.PyAudio()
        self.stream = None
        self.audio_processor = AudioProcessor()

        # Set up event handlers
        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
            event_handler=self.handle_partial_transcript,
        )

        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddTranscript,
            event_handler=self.handle_final_transcript,
        )

        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
            event_handler=self.handle_end_of_utterance,
        )

    def handle_partial_transcript(self, msg):
        transcript = msg['metadata']['transcript']
        print(f"[Listening...] {transcript}")

    def handle_final_transcript(self, msg):
        transcript = msg['metadata']['transcript']
        print(f"[Complete] {transcript}")

    def handle_end_of_utterance(self, msg):
        print("🔚 End of utterance detected - ready for AI response!")
        # This is where your voice AI would process the complete utterance
        # and generate a response

    def stream_callback(self, in_data, frame_count, time_info, status):
        self.audio_processor.write_audio(in_data)
        return in_data, pyaudio.paContinue

    def start_streaming(self):
        try:
            # Set up the PyAudio input stream with a callback
            self.stream = self.audio.open(
                format=FORMAT,
                channels=1,
                rate=SAMPLE_RATE,
                input=True,
                frames_per_buffer=CHUNK_SIZE,
                stream_callback=self.stream_callback,
            )

            # Audio settings must match the microphone stream
            settings = speechmatics.models.AudioSettings()
            settings.encoding = "pcm_f32le"
            settings.sample_rate = SAMPLE_RATE
            settings.chunk_size = CHUNK_SIZE

            # Configure transcription with End of Utterance detection
            conversation_config = speechmatics.models.ConversationConfig(
                end_of_utterance_silence_trigger=0.75  # Adjust as needed
            )

            conf = speechmatics.models.TranscriptionConfig(
                operating_point="enhanced",
                language=LANGUAGE,
                enable_partials=True,
                max_delay=1,
                conversation_config=conversation_config,
            )

            print("🎤 Voice AI ready - start speaking!")
            print("Press Ctrl+C to stop...")

            # Start transcription
            self.ws.run_synchronously(
                transcription_config=conf,
                stream=self.audio_processor,
                audio_settings=settings,
            )

        except KeyboardInterrupt:
            print("\n🛑 Stopping voice AI transcriber...")
        except Exception as e:
            print(f"Error in transcription: {e}")
        finally:
            self.stop_streaming()

    def stop_streaming(self):
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
        self.audio.terminate()

# Usage
if __name__ == "__main__":
    transcriber = VoiceAITranscriber()
    transcriber.start_streaming()
Copy in your API key and file name to get started.
import speechmatics

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the final transcript
def print_transcript(msg):
    print(f"[ final ] {msg['metadata']['transcript']}")

# Define an event handler for the End of Utterance event
def print_eou(msg):
    print("EndOfUtterance")

# Register the event handler for partial transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for final transcripts
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

# Register the event handler for End of Utterance
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
    event_handler=print_eou,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conversation_config = speechmatics.models.ConversationConfig(
    end_of_utterance_silence_trigger=0.75  # Adjust as needed
)

conf = speechmatics.models.TranscriptionConfig(
    operating_point="enhanced",
    language=LANGUAGE,
    enable_partials=True,
    max_delay=1,
    conversation_config=conversation_config,
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, 'rb') as fd:
    try:
        ws.run_synchronously(fd, conf, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")