Voice agent overview
Learn how to build voice-enabled applications with the Speechmatics voice SDK.
The voice SDK builds on our real-time API to provide features optimized for conversational AI:
- Intelligent segmentation: groups words into meaningful speech segments per speaker.
- Turn detection: automatically detects when speakers finish talking.
- Speaker management: focus on or ignore specific speakers in multi-speaker scenarios.
- Preset configurations: offers ready-to-use settings for conversations, note-taking, and captions.
- Simplified event handling: delivers clean, structured segments instead of raw word-level events.
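To make the difference between raw word-level events and speaker segments concrete, here is a minimal sketch in plain Python. It does not call the SDK and is not a description of how the SDK segments speech internally; the `Word` type and `group_by_speaker` function are hypothetical names invented for illustration. It simply merges consecutive words from the same speaker into one segment, using the same `speaker_id` and `text` field names that segments carry in the quickstart.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level event: who spoke, and what they said."""
    speaker_id: str
    text: str


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into one segment."""
    segments = []
    for word in words:
        if segments and segments[-1]["speaker_id"] == word.speaker_id:
            # Same speaker is still talking: extend the current segment.
            segments[-1]["text"] += " " + word.text
        else:
            # Speaker changed (or this is the first word): start a new segment.
            segments.append({"speaker_id": word.speaker_id, "text": word.text})
    return segments


words = [
    Word("S1", "Hello"),
    Word("S1", "there"),
    Word("S2", "Hi"),
    Word("S1", "How"),
    Word("S1", "are"),
    Word("S1", "you"),
]
for segment in group_by_speaker(words):
    print(f"{segment['speaker_id']}: {segment['text']}")
```

A real segmenter would also need timing and punctuation to decide where a segment ends; this sketch only shows the shape of the output a segment-based handler works with.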
When to use the voice SDK vs real-time SDK
Use the voice SDK when:
- Building conversational AI or voice agents
- You need automatic turn detection
- You want speaker-focused transcription
- You need ready-to-use presets for common scenarios
Use the real-time SDK when:
- You need the raw stream of word-by-word transcription data
- Building custom segmentation logic
- You want fine-grained control over every event
- Processing batch files or custom workflows
Getting started
1. Get your API key
Create an API key in the portal to access the voice SDK. Store your key securely as a managed secret.
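One way to keep the key out of source code is to read it from an environment variable that your secret manager populates. The sketch below assumes the variable is named `SPEECHMATICS_API_KEY`, as in the quickstart; `load_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def load_api_key(env_var="SPEECHMATICS_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early instead of sending an empty key to the service.
        raise RuntimeError(
            f"{env_var} is not set. Export it or inject it from your secret manager."
        )
    return api_key
```

Calling `load_api_key()` at startup surfaces a missing key immediately, rather than as an authentication error after the connection attempt.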
2. Install dependencies
# Standard installation
pip install speechmatics-voice
# With SMART_TURN (ML-based turn detection)
pip install "speechmatics-voice[smart]"
3. Quickstart
Set the SPEECHMATICS_API_KEY environment variable to your API key from the portal, then run:
import asyncio
import os

from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType


async def main():
    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset="scribe"
    )

    # Handle final segments
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Setup microphone
    mic = Microphone(sample_rate=16000, chunk_size=320)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect and stream
    await client.connect()
    try:
        while True:
            audio_chunk = await mic.read(320)
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()


if __name__ == "__main__":
    asyncio.run(main())
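The segment handler in the quickstart can be factored into a pure function, which makes it testable without a microphone or a network connection. The sketch below assumes only the message shape shown in the quickstart (a dict with a "segments" list whose items have "speaker_id" and "text"); `format_segments` and its `ignore_speakers` parameter are hypothetical, and this client-side filter is not the SDK's built-in speaker management.

```python
def format_segments(message, ignore_speakers=()):
    """Turn one ADD_SEGMENT-style message into printable transcript lines.

    Assumes the shape used in the quickstart: a dict with a "segments" list
    whose items carry "speaker_id" and "text". Other fields are ignored.
    """
    lines = []
    for segment in message.get("segments", []):
        speaker = segment.get("speaker_id", "unknown")
        text = segment.get("text", "").strip()
        # Skip empty segments and speakers the caller asked to ignore.
        if not text or speaker in ignore_speakers:
            continue
        lines.append(f"{speaker}: {text}")
    return lines


# Example message, hand-written for illustration (not captured SDK output).
sample = {
    "segments": [
        {"speaker_id": "S1", "text": "Can you take notes?"},
        {"speaker_id": "S2", "text": "Sure."},
    ]
}
for line in format_segments(sample, ignore_speakers={"S2"}):
    print(line)
```

Inside `on_segment`, the body then reduces to printing each line returned by `format_segments(message)`.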
Presets provide optimized configurations for common use cases:
# External end of turn preset - endpointing handled by the client
client = VoiceAgentClient(api_key=api_key, preset="external")
# Scribe preset - for note-taking
client = VoiceAgentClient(api_key=api_key, preset="scribe")
# Low latency preset - for fast responses
client = VoiceAgentClient(api_key=api_key, preset="low_latency")
# Conversation preset - for natural dialogue
client = VoiceAgentClient(api_key=api_key, preset="conversation_adaptive")
# Advanced conversation with ML turn detection
client = VoiceAgentClient(api_key=api_key, preset="conversation_smart_turn")
# Captions preset - for live captioning
client = VoiceAgentClient(api_key=api_key, preset="captions")
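If the preset is chosen at runtime (for example from application settings), a small lookup can catch typos before a client is created. This is a sketch only: the preset names are copied from the examples above, while the use-case keys and the `choose_preset` helper are hypothetical and not part of the SDK.

```python
# Preset names copied from the examples above; the use-case keys are hypothetical.
PRESET_BY_USE_CASE = {
    "external_turns": "external",
    "note_taking": "scribe",
    "fast_responses": "low_latency",
    "dialogue": "conversation_adaptive",
    "dialogue_ml_turns": "conversation_smart_turn",
    "live_captions": "captions",
}


def choose_preset(use_case):
    """Return the preset name for a use case, or raise listing the valid ones."""
    try:
        return PRESET_BY_USE_CASE[use_case]
    except KeyError:
        valid = ", ".join(sorted(PRESET_BY_USE_CASE))
        raise ValueError(
            f"Unknown use case {use_case!r}; expected one of: {valid}"
        ) from None


print(choose_preset("note_taking"))
```

The returned string can then be passed as the `preset` argument shown in the examples above.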
For settings beyond the presets, pass a VoiceAgentConfig instead of a preset name:
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, EndOfUtteranceMode

config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    max_delay=0.7,
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
)
client = VoiceAgentClient(api_key=api_key, config=config)
FAQ
Implementation and deployment
Can I deploy this in my own environment?
Yes! The voice agent SDK can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, speak to sales.
Support
Where can I provide feedback or get help?
You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.
Next steps
For more information, see the voice agent Python SDK on GitHub.
To learn more, check out the Speechmatics Academy.
Ready to build something amazing with our voice agent SDK? We'd love to hear about your project and help you succeed.
Get in touch with us:
- Share your feedback and feature requests
- Ask questions about implementation
- Discuss enterprise pricing and custom voices
- Report any issues or bugs you encounter
Contact our team or join our developer community (https://www.reddit.com/r/Speechmatics) to connect with other builders using the voice SDK.