Configuration Parameters

Learn about configuration parameters for the voice SDK

Basic Parameters

language (str, default: "en") Language code for transcription (e.g., "en", "es", "fr"). See supported languages.

operating_point (OperatingPoint, default: ENHANCED) Balance accuracy vs latency. Options: STANDARD or ENHANCED.

domain (str, default: None) Domain-specific model (e.g., "finance", "medical"). See supported languages and domains.

output_locale (str, default: None) Output locale for formatting (e.g., "en-GB", "en-US"). See supported languages and locales.

enable_diarization (bool, default: False) Enable speaker diarization to identify and label different speakers.

Turn Detection Parameters

end_of_utterance_mode (EndOfUtteranceMode, default: FIXED) Controls how turn endings are detected:

FIXED - Uses fixed silence threshold. Fast but may split slow speech.

ADAPTIVE - Adjusts delay based on speech rate, pauses, and disfluencies. Best for natural conversation.

SMART_TURN - Uses ML model to detect acoustic turn-taking cues. Requires [smart] extras.

EXTERNAL - Manual control via client.finalize(). For custom turn logic.

end_of_utterance_silence_trigger (float, default: 0.2) Silence duration in seconds to trigger turn end.

end_of_utterance_max_delay (float, default: 10.0) Maximum delay in seconds before forcing turn end.

max_delay (float, default: 0.7) Maximum transcription delay in seconds for word emission.
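
To make the interaction between the silence trigger and the maximum delay concrete, here is a minimal sketch in plain Python. It does not call the SDK, and it is not the SDK's actual turn-detection logic, which this page does not describe. The function names, the `eou_max_delay` argument name, and the `scale` multiplier (a stand-in for whatever an adaptive mode might derive from speech rate) are all assumptions made for illustration.

```python
def required_silence(
    silence_trigger: float = 0.2,
    eou_max_delay: float = 10.0,
    scale: float = 1.0,
) -> float:
    """Silence in seconds to wait before ending the turn.

    `scale` is a hypothetical multiplier; 1.0 mimics a fixed threshold,
    larger values mimic a mode that waits longer for slow speakers.
    The result is never allowed to exceed `eou_max_delay`.
    """
    return min(silence_trigger * scale, eou_max_delay)


def turn_ended(last_word_end: float, now: float, **kwargs: float) -> bool:
    """True once the silence since the last word reaches the required wait."""
    return (now - last_word_end) >= required_silence(**kwargs)


print(turn_ended(last_word_end=1.0, now=1.25))             # True: 0.25 s >= 0.2 s
print(turn_ended(last_word_end=1.0, now=1.25, scale=3.0))  # False: needs about 0.6 s
print(required_silence(scale=100.0))                       # 10.0: capped by the maximum
```

The sketch shows why a small silence trigger feels fast but can cut off a slow speaker, and why a cap is useful once the wait is allowed to stretch.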

Speaker Configuration

speaker_sensitivity (float, default: 0.5) Diarization sensitivity between 0.0 and 1.0. Higher values detect more speakers.

max_speakers (int, default: None) Limit maximum number of speakers to detect.

prefer_current_speaker (bool, default: False) Give extra weight to current speaker for word grouping.

speaker_config (SpeakerFocusConfig, default: SpeakerFocusConfig()) Configure speaker focus/ignore rules.

from speechmatics.voice import SpeakerFocusConfig, SpeakerFocusMode, VoiceAgentConfig

# Focus only on specific speakers
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1", "S2"],
        focus_mode=SpeakerFocusMode.RETAIN,
    ),
)

# Ignore specific speakers
config = VoiceAgentConfig(
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        ignore_speakers=["S3"],
        focus_mode=SpeakerFocusMode.IGNORE,
    ),
)

known_speakers (list[SpeakerIdentifier], default: []) Pre-enrolled speaker identifiers for speaker identification.

from speechmatics.voice import SpeakerIdentifier, VoiceAgentConfig

# Map pre-enrolled speaker identifiers to human-readable labels
config = VoiceAgentConfig(
    enable_diarization=True,
    known_speakers=[
        SpeakerIdentifier(label="Alice", speaker_identifiers=["XX...XX"]),
        SpeakerIdentifier(label="Bob", speaker_identifiers=["YY...YY"]),
    ],
)

Language & Vocabulary

additional_vocab (list[AdditionalVocabEntry], default: []) Custom vocabulary for domain-specific terms.

from speechmatics.voice import AdditionalVocabEntry, VoiceAgentConfig

# Add custom terms, optionally with pronunciation hints
config = VoiceAgentConfig(
    language="en",
    additional_vocab=[
        AdditionalVocabEntry(
            content="Speechmatics",
            sounds_like=["speech matters", "speech matics"],
        ),
        AdditionalVocabEntry(content="API"),
    ],
)

punctuation_overrides (dict, default: None) Custom punctuation rules.

Audio Parameters

sample_rate (int, default: 16000) Audio sample rate in Hz.

audio_encoding (AudioEncoding, default: PCM_S16LE) Audio encoding format.
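
As a quick sanity check on these defaults, the sketch below works out how many bytes an audio chunk occupies. PCM_S16LE stores each sample as a 16-bit (2-byte) signed little-endian integer, so the size follows directly from the sample rate and chunk duration. The `chunk_bytes` helper, the 20 ms chunk duration, and the mono channel count are assumptions chosen for illustration; they are not SDK settings.

```python
BYTES_PER_SAMPLE_S16LE = 2  # 16-bit signed little-endian PCM


def chunk_bytes(sample_rate: int = 16000, chunk_ms: int = 20, channels: int = 1) -> int:
    """Size in bytes of one audio chunk of `chunk_ms` milliseconds."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE_S16LE * channels


print(chunk_bytes())           # 640 bytes for 20 ms at 16 kHz mono
print(chunk_bytes(8000, 100))  # 1600 bytes for 100 ms at 8 kHz mono
```

If the audio source delivers a different sample rate or encoding than the configuration states, the byte counts no longer line up with the expected duration, so the two should be kept in agreement.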

Advanced Parameters

transcription_update_preset (TranscriptionUpdatePreset, default: COMPLETE) Controls when to emit updates: COMPLETE, COMPLETE_PLUS_TIMING, WORDS, WORDS_PLUS_TIMING, or TIMING.

speech_segment_config (SpeechSegmentConfig, default: SpeechSegmentConfig()) Fine-tune segment generation and post-processing.

smart_turn_config (SmartTurnConfig, default: None) Configure SMART_TURN behavior (buffer length, threshold).

include_results (bool, default: False) Include word-level timing data in segments.

include_partials (bool, default: True) Emit partial segments. Set to False for final-only output.

Configuration with Overlays

Use presets as a starting point and customize with overlays:

from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Use preset with custom overrides
config = VoiceAgentConfigPreset.SCRIBE(
    VoiceAgentConfig(
        language="es",
        max_delay=0.8,
    )
)
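
One way to picture an overlay is as a set of field overrides applied on top of a preset. The sketch below illustrates that idea with a standalone dataclass; it does not use the SDK, and the SDK's real merge rules are not described on this page. `DemoConfig`, `apply_overlay`, and the preset values are hypothetical.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in for a configuration object."""
    language: str = "en"
    max_delay: float = 0.7
    enable_diarization: bool = False


def apply_overlay(preset: DemoConfig, overlay: DemoConfig) -> DemoConfig:
    """Copy `preset`, overriding every field the overlay changed from its default."""
    defaults = DemoConfig()
    changes = {
        f.name: getattr(overlay, f.name)
        for f in fields(DemoConfig)
        if getattr(overlay, f.name) != getattr(defaults, f.name)
    }
    return replace(preset, **changes)


# A made-up preset that enables diarization and relaxes the delay
preset = DemoConfig(max_delay=1.0, enable_diarization=True)
merged = apply_overlay(preset, DemoConfig(language="es", max_delay=0.8))
print(merged)
```

In this sketch the overlay wins for `language` and `max_delay`, while `enable_diarization` keeps the preset's value. A known limitation of comparing against defaults is that an overlay cannot reset a field back to its default value.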

Available presets

presets = VoiceAgentConfigPreset.list_presets()
# Output: ['low_latency', 'conversation_adaptive', 'conversation_smart_turn', 'scribe', 'captions']

Configuration Serialization

Export and import configurations as JSON:

from speechmatics.voice import VoiceAgentConfigPreset, VoiceAgentConfig

# Export preset to JSON
config_json = VoiceAgentConfigPreset.SCRIBE().to_json()

# Load from JSON
config = VoiceAgentConfig.from_json(config_json)

# Or create from JSON string
config = VoiceAgentConfig.from_json('{"language": "en", "enable_diarization": true}')
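
When configuration JSON comes from a file or user input, it can help to check it before loading. The sketch below uses only the standard library to parse a JSON string and report keys that do not match any parameter name listed on this page. The `unknown_keys` helper is hypothetical, and it assumes JSON keys mirror the parameter names, as they do in the example above; the SDK may perform its own validation.

```python
import json

# Parameter names taken from this page
KNOWN_KEYS = {
    "language", "operating_point", "domain", "output_locale",
    "enable_diarization", "end_of_utterance_mode",
    "end_of_utterance_silence_trigger", "end_of_utterance_max_delay",
    "max_delay", "speaker_sensitivity", "max_speakers",
    "prefer_current_speaker", "speaker_config", "known_speakers",
    "additional_vocab", "punctuation_overrides", "sample_rate",
    "audio_encoding", "transcription_update_preset",
    "speech_segment_config", "smart_turn_config",
    "include_results", "include_partials",
}


def unknown_keys(config_json: str) -> set:
    """Return top-level keys in `config_json` that are not known parameters."""
    data = json.loads(config_json)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("configuration JSON must be an object")
    return set(data) - KNOWN_KEYS


print(unknown_keys('{"language": "en", "enable_diarization": true}'))  # set()
print(unknown_keys('{"langauge": "en"}'))  # {'langauge'}
```

A check like this catches misspelled keys early, before a configuration silently falls back to defaults.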

For more information, see the voice agent Python SDK on GitHub.