Pipecat speech to text

Transcribe live audio in your Pipecat voice bots with Speechmatics STT.

Features

  • Real-time transcription — Low-latency streaming with partial (interim) results
  • Turn detection — Adaptive, fixed, ML-based, or external control modes
  • Speaker diarization — Identify and attribute speech to different speakers
  • Speaker filtering — Focus on specific speakers or ignore others (like the assistant)
  • Custom vocabulary — Boost recognition for domain-specific terms and proper nouns
  • Output formatting — Configurable templates for multi-speaker transcripts

Installation

pip install "pipecat-ai[speechmatics]"

Basic configuration

Authentication

By default, the service reads your API key from the SPEECHMATICS_API_KEY environment variable.

Service options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | string | env var | Speechmatics API key (defaults to SPEECHMATICS_API_KEY) |
| base_url | string | env var | Realtime base URL (defaults to SPEECHMATICS_RT_URL, or wss://eu2.rt.speechmatics.com/v2) |
| sample_rate | number | pipeline default | Audio sample rate in Hz |
| should_interrupt | boolean | true | Enable interruption on detected speech |

Input parameters

These are passed via params=SpeechmaticsSTTService.InputParams(...):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | Language \| string | Language.EN | Language code for transcription |
| domain | string \| null | null | Domain-specific model (for example "finance") |
| operating_point | OperatingPoint \| null | null | Transcription accuracy. Use OperatingPoint.ENHANCED (higher accuracy) or OperatingPoint.STANDARD (lower latency) |
| audio_encoding | AudioEncoding | PCM_S16LE | Audio encoding format: AudioEncoding.PCM_S16LE, AudioEncoding.PCM_F32LE, or AudioEncoding.MULAW |
| punctuation_overrides | object \| null | null | Custom punctuation rules |
| extra_params | object \| null | null | Additional parameters to pass to the API |

Example

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        language="en",
        operating_point=SpeechmaticsSTTService.OperatingPoint.ENHANCED,
    ),
)
```

Advanced configuration

Turn detection

Turn detection determines when a user has finished their complete thought, while the Realtime API's EndOfUtterance message indicates a pause in speech. The service handles this distinction automatically.

Modes

Set turn_detection_mode to control how end of speech is detected:

| Mode | When to use |
| --- | --- |
| TurnDetectionMode.EXTERNAL | Default and recommended. Delegates turn detection to Pipecat's pipeline (VAD, Smart Turn, etc.). Try this first |
| TurnDetectionMode.ADAPTIVE | Speechmatics analyzes speech content and acoustic patterns for end-of-turn detection |
| TurnDetectionMode.FIXED | Fixed silence threshold using end_of_utterance_silence_trigger |
| TurnDetectionMode.SMART_TURN | Speechmatics Smart Turn for ML-based turn detection |

Start with EXTERNAL mode. This lets you use Pipecat's turn detection features (like LocalSmartTurnAnalyzerV3) which are well-integrated with the pipeline. Only switch to other modes if you need Speechmatics to handle turn detection directly.

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

# External mode (default, recommended) - use Pipecat's turn detection
stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.EXTERNAL,
    ),
)

# Adaptive mode - Speechmatics determines end-of-turn
stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
    ),
)

# Fixed mode - consistent silence threshold
stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.FIXED,
        end_of_utterance_silence_trigger=0.8,  # 800ms of silence
        end_of_utterance_max_delay=5.0,  # Force end after 5s
    ),
)

# Smart turn mode - Speechmatics ML-based turn detection
stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.SMART_TURN,
    ),
)
```

When using ADAPTIVE or SMART_TURN modes, remove any competing VAD or turn-detection features from your pipeline to avoid conflicts.

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| end_of_utterance_silence_trigger | number \| null | null | Silence duration (seconds) that triggers end of utterance. Used primarily in FIXED mode. Valid range: >0 to <2 seconds (exclusive) |
| end_of_utterance_max_delay | number \| null | null | Maximum delay (seconds) before forcing an end of utterance. Must be greater than end_of_utterance_silence_trigger |
| max_delay | number \| null | null | Maximum transcription delay (seconds). Lower values reduce latency at the cost of accuracy. Valid range: 0.7–4.0 seconds |
| include_partials | boolean \| null | null | Enable partial (interim) transcription results |
| split_sentences | boolean \| null | null | Split transcription into sentences |

Advanced diarization

The service can attribute words to speakers and lets you decide which speakers are treated as active (foreground) vs passive (background).

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enable_diarization | boolean \| null | null | Enable speaker diarization |
| speaker_sensitivity | number \| null | null | Speaker detection sensitivity. Valid range: >0.0 to <1.0 (exclusive) |
| max_speakers | number \| null | null | Maximum number of speakers to detect. Valid range: 2–100 |
| prefer_current_speaker | boolean \| null | null | Reduce speaker switching for similar voices |
| known_speakers | array \| null | null | Pre-define speaker identifiers with labels (SpeakerIdentifier objects) |
| additional_vocab | array \| null | null | Custom vocabulary entries (AdditionalVocabEntry objects) for improved recognition |

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        enable_diarization=True,
        speaker_sensitivity=0.7,
        max_speakers=3,
        prefer_current_speaker=True,
        additional_vocab=[
            SpeechmaticsSTTService.AdditionalVocabEntry(content="Speechmatics"),
            SpeechmaticsSTTService.AdditionalVocabEntry(content="API", sounds_like=["A P I"]),
        ],
    ),
)
```

Known speakers

Use known_speakers to attribute words to specific speakers across sessions. This is useful when you want consistent speaker identification for known participants.

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        enable_diarization=True,
        known_speakers=[
            SpeechmaticsSTTService.SpeakerIdentifier(label="Alice", speaker_identifiers=["speaker_abc123"]),
            SpeechmaticsSTTService.SpeakerIdentifier(label="Bob", speaker_identifiers=["speaker_def456"]),
        ],
    ),
)
```

Speaker identifiers are unique to each Speechmatics account and can be obtained from a previous transcription session.

Speaker focus

Control which speakers are treated as active (foreground) vs passive (background):

  • Active speakers are the speakers you care about in your application. They generate FINAL_TRANSCRIPT events.
  • Passive speakers are still transcribed, but their words are buffered and only included in the output alongside new words from active speakers.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| focus_speakers | array | [] | Speaker IDs to treat as active |
| ignore_speakers | array | [] | Speaker IDs to exclude entirely |
| focus_mode | SpeakerFocusMode | RETAIN | How to handle non-focused speakers |

Focus modes

  • SpeakerFocusMode.RETAIN keeps non-focused speakers as passive.
  • SpeakerFocusMode.IGNORE discards non-focused speaker words entirely.

ignore_speakers always excludes those speakers from transcription, and their speech will not trigger VAD or end-of-utterance detection.
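The active/passive behaviour described above can be illustrated with a toy model (purely illustrative; the service's actual implementation differs):

```python
def merge_words(words, focus_speakers, ignore_speakers):
    """Toy model of RETAIN focus mode: words from ignored speakers are
    dropped; passive words are buffered and only emitted together with
    the next word from an active (focused) speaker."""
    buffered, output = [], []
    for speaker, text in words:
        if speaker in ignore_speakers:
            continue  # excluded entirely, never transcribed
        if speaker in focus_speakers:
            output.extend(buffered)  # flush passive words alongside active speech
            buffered = []
            output.append((speaker, text))
        else:
            buffered.append((speaker, text))  # passive: held back for now
    return output

words = [("S2", "uh-huh"), ("S1", "hello"), ("S3", "noise"), ("S1", "world")]
print(merge_words(words, focus_speakers={"S1"}, ignore_speakers={"S3"}))
# [('S2', 'uh-huh'), ('S1', 'hello'), ('S1', 'world')]
```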

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        focus_speakers=["S1"],
        focus_mode=SpeechmaticsSTTService.SpeakerFocusMode.RETAIN,
        ignore_speakers=["S3"],
    ),
)
```

Speaker formatting

Use speaker_active_format and speaker_passive_format to format transcripts for your LLM. The templates support {speaker_id}, {text}, {ts}, {start_time}, {end_time}, and {lang}.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| speaker_active_format | string \| null | null | Format template for active speaker output |
| speaker_passive_format | string \| null | null | Format template for passive speaker output |

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        speaker_passive_format="<{speaker_id} background>{text}</{speaker_id}>",
    ),
)
```

When you use a custom format, include it in your bot's system prompt so the LLM can interpret speaker tags consistently.
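To see what the LLM will receive, you can substitute the placeholder names listed above with plain Python string formatting, and pair the result with a matching prompt fragment (the prompt wording below is an example only, not prescribed by the service):

```python
active_format = "<{speaker_id}>{text}</{speaker_id}>"
passive_format = "<{speaker_id} background>{text}</{speaker_id}>"

# What a formatted active-speaker line looks like:
line = active_format.format(speaker_id="S1", text="Hello there!")
print(line)  # <S1>Hello there!</S1>

# A system-prompt fragment that explains the tags to the LLM:
system_prompt = (
    "Transcribed speech is wrapped in speaker tags such as <S1>...</S1>. "
    "Tags marked 'background' are passive speakers; respond only to the "
    "active speaker."
)
```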

Updating speakers during transcription

You can dynamically change which speakers to focus on or ignore during an active transcription session using the update_params() method.

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(enable_diarization=True),
)

# Later, during transcription:
stt.update_params(
    SpeechmaticsSTTService.UpdateParams(
        focus_speakers=["S1", "S2"],
        ignore_speakers=["S3"],
        focus_mode=SpeechmaticsSTTService.SpeakerFocusMode.RETAIN,
    )
)
```

This is useful when you need to adjust speaker filtering based on runtime conditions, such as when a new participant joins or leaves a conversation.

Example

```python
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService

stt = SpeechmaticsSTTService(
    params=SpeechmaticsSTTService.InputParams(
        # Service options
        language="en",
        operating_point=SpeechmaticsSTTService.OperatingPoint.ENHANCED,

        # Turn detection
        turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.EXTERNAL,
        max_delay=1.5,
        include_partials=True,

        # Diarization
        enable_diarization=True,
        speaker_sensitivity=0.6,
        max_speakers=4,
        prefer_current_speaker=True,

        # Speaker focus
        focus_speakers=["S1", "S2"],
        focus_mode=SpeechmaticsSTTService.SpeakerFocusMode.RETAIN,
        ignore_speakers=[],

        # Output formatting
        speaker_active_format="[{speaker_id}]: {text}",
        speaker_passive_format="[{speaker_id} (background)]: {text}",

        # Custom vocabulary
        additional_vocab=[
            SpeechmaticsSTTService.AdditionalVocabEntry(content="Speechmatics"),
            SpeechmaticsSTTService.AdditionalVocabEntry(content="Pipecat", sounds_like=["pipe cat"]),
        ],
    ),
)
```

Next steps