Voice agents

Voice agent overview

Learn how to build voice-enabled applications with the Speechmatics voice SDK

The voice SDK builds on our real-time API to provide features optimized for conversational AI:

  • Intelligent segmentation: groups words into meaningful speech segments per speaker.
  • Turn detection: automatically detects when speakers finish talking.
  • Speaker management: focuses on or ignores specific speakers in multi-speaker scenarios.
  • Preset configurations: ready-to-use settings for conversations, note-taking, and captions.
  • Simplified event handling: delivers clean, structured segments instead of raw word-level events.
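To illustrate what "clean, structured segments" means in practice, here is a sketch of the shape of an ADD_SEGMENT message, based on the fields used in the quickstart below (`segments`, `speaker_id`, `text`); the real payload may carry additional keys.

```python
# Illustrative ADD_SEGMENT payload: one message can carry several
# finalized segments, each attributed to a speaker.
message = {
    "segments": [
        {"speaker_id": "S1", "text": "Hello, how can I help?"},
        {"speaker_id": "S2", "text": "I'd like to book a demo."},
    ]
}

# Consuming it is a simple loop over structured records, not raw words.
for segment in message["segments"]:
    print(f'{segment["speaker_id"]}: {segment["text"]}')
```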

When to use the voice SDK vs real-time SDK

Use the voice SDK when:

  • Building conversational AI or voice agents
  • You need automatic turn detection
  • You want speaker-focused transcription
  • You need ready-to-use presets for common scenarios

Use the real-time SDK when:

  • You need the raw stream of word-by-word transcription data
  • Building custom segmentation logic
  • You want fine-grained control over every event
  • Processing batch files or custom workflows

Getting started

1. Get your API key

Create an API key in the portal to access the voice SDK. Store your key securely as a managed secret.
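One simple way to keep the key out of source code is to read it from an environment variable, as the quickstart below does with `SPEECHMATICS_API_KEY`:

```python
import os

# Read the API key from the environment rather than hard-coding it.
# An empty default lets the script warn instead of crashing at import.
api_key = os.getenv("SPEECHMATICS_API_KEY", "")
if not api_key:
    print("Warning: SPEECHMATICS_API_KEY is not set")
```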

2. Install dependencies

# Standard installation
pip install speechmatics-voice

# With SMART_TURN (ML-based turn detection)
pip install speechmatics-voice[smart]

3. Quickstart

Replace YOUR_API_KEY with your actual API key from the portal:

import asyncio
import os
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType

async def main():
    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset="scribe",
    )

    # Handle final segments
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Set up the microphone
    mic = Microphone(sample_rate=16000, chunk_size=320)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect and stream
    await client.connect()

    try:
        while True:
            audio_chunk = await mic.read(320)
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
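A common next step is to accumulate final segments into a running per-speaker transcript. The sketch below is plain Python with no SDK dependency, using the same message shape as the quickstart; a function like `collect` could be called from the ADD_SEGMENT handler.

```python
from collections import defaultdict

# Running transcript, keyed by speaker.
transcripts = defaultdict(list)

def collect(message):
    """Append each finalized segment's text under its speaker."""
    for segment in message["segments"]:
        transcripts[segment["speaker_id"]].append(segment["text"])

# Simulate two ADD_SEGMENT messages arriving over time.
collect({"segments": [{"speaker_id": "S1", "text": "Hello."}]})
collect({"segments": [{"speaker_id": "S1", "text": "Anyone there?"}]})

print(" ".join(transcripts["S1"]))  # → Hello. Anyone there?
```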

FAQ

Implementation and deployment

Can I deploy this in my own environment?

Yes! The voice agent SDK can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, speak to sales.

Support

Where can I provide feedback or get help?

You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.

Next steps

For more information, see the voice agent Python SDK on GitHub.

To learn more, check out the Speechmatics Academy.

Ready to build something amazing with our voice agent SDK? We'd love to hear about your project and help you succeed.

Get in touch with us:

  • Share your feedback and feature requests
  • Ask questions about implementation
  • Discuss enterprise pricing and custom voices
  • Report any issues or bugs you encounter

Contact our team or join our developer community (https://www.reddit.com/r/Speechmatics) to connect with other builders working with voice AI.