Voice agents

Voice agent overview

Learn how to build voice-enabled applications with the Speechmatics voice SDK

The voice SDK builds on our real-time API to provide features optimized for conversational AI:

  • Intelligent segmentation: groups words into meaningful speech segments per speaker.
  • Turn detection: automatically detects when speakers finish talking.
  • Speaker management: focuses on or ignores specific speakers in multi-speaker scenarios.
  • Preset configurations: ready-to-use settings for conversations, note-taking, and captions.
  • Simplified event handling: delivers clean, structured segments instead of raw word-level events.
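To illustrate what "clean, structured segments" means in practice, here is a sketch of the shape of an ADD_SEGMENT message, based on the fields used in the quickstart below (`segments`, `speaker_id`, `text`); the real payload may carry additional keys.

```python
# Illustrative ADD_SEGMENT payload: one message can carry several
# finalized segments, each attributed to a speaker.
message = {
    "segments": [
        {"speaker_id": "S1", "text": "Hello, how can I help?"},
        {"speaker_id": "S2", "text": "I'd like to book a demo."},
    ]
}

# Consuming it is a simple loop over structured records, not raw words.
for segment in message["segments"]:
    print(f'{segment["speaker_id"]}: {segment["text"]}')
```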

When to use the voice SDK vs real-time SDK

Use the voice SDK when:

  • Building conversational AI or voice agents
  • You need automatic turn detection
  • You want speaker-focused transcription
  • You need ready-to-use presets for common scenarios

Use the real-time SDK when:

  • You need the raw stream of word-by-word transcription data
  • Building custom segmentation logic
  • You want fine-grained control over every event
  • Processing batch files or custom workflows

Getting started

1. Get your API key

Create an API key in the portal to access the voice SDK. Store your key securely as a managed secret.
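One simple way to keep the key out of source code is to read it from an environment variable, as the quickstart below does with `SPEECHMATICS_API_KEY`:

```python
import os

# Read the API key from the environment rather than hard-coding it.
# An empty default lets the script warn instead of crashing at import.
api_key = os.getenv("SPEECHMATICS_API_KEY", "")
if not api_key:
    print("Warning: SPEECHMATICS_API_KEY is not set")
```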

2. Install dependencies

# Standard installation
pip install speechmatics-voice

# With SMART_TURN (ML-based turn detection)
pip install speechmatics-voice[smart]

3. Quickstart

Replace YOUR_API_KEY with your actual API key from the portal:

import asyncio
import os
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, AgentServerMessageType

async def main():
    # Create client with preset
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        preset="scribe",
    )

    # Handle final segments
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message["segments"]:
            speaker = segment["speaker_id"]
            text = segment["text"]
            print(f"{speaker}: {text}")

    # Set up the microphone
    mic = Microphone(sample_rate=16000, chunk_size=320)
    if not mic.start():
        print("Error: Microphone not available")
        return

    # Connect and stream
    await client.connect()

    try:
        while True:
            audio_chunk = await mic.read(320)
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        pass
    finally:
        await client.disconnect()

if __name__ == "__main__":
    asyncio.run(main())
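A common next step is to accumulate final segments into a running per-speaker transcript. The sketch below is plain Python with no SDK dependency, using the same message shape as the quickstart; a function like `collect` could be called from the ADD_SEGMENT handler.

```python
from collections import defaultdict

# Running transcript, keyed by speaker.
transcripts = defaultdict(list)

def collect(message):
    """Append each finalized segment's text under its speaker."""
    for segment in message["segments"]:
        transcripts[segment["speaker_id"]].append(segment["text"])

# Simulate two ADD_SEGMENT messages arriving over time.
collect({"segments": [{"speaker_id": "S1", "text": "Hello."}]})
collect({"segments": [{"speaker_id": "S1", "text": "Anyone there?"}]})

print(" ".join(transcripts["S1"]))  # → Hello. Anyone there?
```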

FAQ

Implementation and deployment

Can I deploy this in my own environment?

Yes! The voice agent SDK can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, speak to sales.

Support

Where can I provide feedback or get help?

You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.

Next steps

For more information, see the voice agent Python SDK on GitHub.

To learn more, check out the Speechmatics Academy.

Ready to build something amazing with our voice agent SDK? We'd love to hear about your project and help you succeed.

Get in touch with us:

  • Share your feedback and feature requests
  • Ask questions about implementation
  • Discuss enterprise pricing and custom voices
  • Report any issues or bugs you encounter

Contact our team or join our developer community (https://www.reddit.com/r/Speechmatics) to connect with other builders working with voice AI.