
Realtime speaker identification

Learn how to use the Speechmatics API to identify speakers in real time

For an overview of the feature, see the speaker identification page.

Enrollment

To generate identifiers for a desired speaker, run a transcription with speaker diarization enabled on an audio sample in which, ideally, that speaker is speaking alone. You can then request the identifiers back from the engine by sending a GetSpeakers request.

By default, the engine returns identifiers created up to the time of the request, but you can also wait until the end of the stream by setting the optional final flag in the GetSpeakers request (recommended for enrollment):

{
  "message": "GetSpeakers",
  "final": true
}
  • final: false (default) — returns identifiers generated up to the point of the request. To avoid empty results, wait until the server has issued at least one AddTranscript message before sending the request; a sketch of this wait-then-request pattern appears after this list.
  • final: true — waits until the end of the stream and returns identifiers based on all audio.
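
For the default final: false case, the sketch below shows one way to wait for the first AddTranscript before requesting identifiers. It is a minimal sketch, assuming the same SDK surface as the full enrollment example at the end of this page (WebsocketClient, add_event_handler, and send_message); the SpeakersResult reply it prints is described just below:

import asyncio

import speechmatics.client
import speechmatics.models
from speechmatics.models import ClientMessageType, ServerMessageType

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

async def get_speakers_mid_stream():
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )
    requested = False

    def on_transcript(_message):
        nonlocal requested
        # Send GetSpeakers only once, after the first AddTranscript,
        # so the engine has identifiers to return.
        if not requested:
            requested = True
            asyncio.create_task(
                client.send_message(ClientMessageType.GetSpeakers, {"final": False})
            )

    client.add_event_handler(ServerMessageType.AddTranscript, on_transcript)
    client.add_event_handler(
        ServerMessageType.SpeakersResult,
        lambda message: print(f"[speaker identifiers] {message['speakers']}"),
    )

    transcription_config = speechmatics.models.TranscriptionConfig(
        language="en",
        diarization="speaker",
    )
    with open(PATH_TO_FILE, "rb") as fh:
        await client.run(fh, transcription_config)

if __name__ == "__main__":
    asyncio.run(get_speakers_mid_stream())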

When the request is processed, the server replies with a SpeakersResult message that contains the identifiers for each diarized speaker:

{
  "message": "SpeakersResult",
  "speakers": [
    {"label": "S1", "speaker_identifiers": ["<id1>"]},
    {"label": "S2", "speaker_identifiers": ["<id2>"]}
  ]
}
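
The identifiers are opaque strings. A small helper like the one below (plain Python, using only the message shape shown above) can collect them into a label-to-identifiers mapping for reuse during identification:

def collect_identifiers(message: dict) -> dict[str, list[str]]:
    # Map each diarized speaker label to its list of identifiers.
    return {
        speaker["label"]: speaker["speaker_identifiers"]
        for speaker in message["speakers"]
    }

result = {
    "message": "SpeakersResult",
    "speakers": [
        {"label": "S1", "speaker_identifiers": ["<id1>"]},
        {"label": "S2", "speaker_identifiers": ["<id2>"]},
    ],
}
print(collect_identifiers(result))  # {'S1': ['<id1>'], 'S2': ['<id2>']}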

Identification

Once you have generated speaker identifiers, you can provide them in your next transcription job to identify and tag known speakers. This is done through the new speakers option in the speaker diarization configuration. All other speaker diarization options remain supported. Among them, the max_speakers parameter continues to apply only to generic (non-enrolled) speakers: for example, if it is set to 10 and 10 speakers are enrolled, the system can still add up to 10 additional generic speakers. The speakers_sensitivity parameter adjusts how strongly the system prefers enrolled speakers over detecting new generic ones; lower values make it more likely to match existing enrolled speakers.

An example configuration is shown below:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {
      "speakers": [
        {"label": "Alice", "speaker_identifiers": ["<alice_id1>", "<alice_id2>"]},
        {"label": "Bob", "speaker_identifiers": ["<bob_id1>"]}
      ]
    }
  }
}
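
If you are starting the session from the Python SDK rather than sending raw JSON, the same configuration can be passed when the client connects. The sketch below is an outline under stated assumptions: it presumes your SDK version forwards a speaker_diarization_config containing the speakers key shown above, and the Alice/Bob identifiers are placeholders from a previous enrollment run:

import asyncio

import speechmatics.client
import speechmatics.models
from speechmatics.models import ServerMessageType

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Assumption: this SDK version forwards the `speakers` key inside
# speaker_diarization_config as-is; the identifiers are placeholders.
transcription_config = speechmatics.models.TranscriptionConfig(
    language="en",
    diarization="speaker",
    speaker_diarization_config={
        "speakers": [
            {"label": "Alice", "speaker_identifiers": ["<alice_id1>", "<alice_id2>"]},
            {"label": "Bob", "speaker_identifiers": ["<bob_id1>"]},
        ]
    },
)

async def identify_speakers():
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )
    # Print each word with the speaker label assigned by the engine:
    # "Alice", "Bob", or a generic label such as "S1".
    client.add_event_handler(
        ServerMessageType.AddTranscript,
        lambda msg: [
            print(f'{alt["speaker"]}: {alt["content"]}')
            for result in msg.get("results", [])
            for alt in result.get("alternatives", [])
        ],
    )
    with open(PATH_TO_FILE, "rb") as fh:
        await client.run(fh, transcription_config)

if __name__ == "__main__":
    asyncio.run(identify_speakers())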

With the config above, transcript segments should be tagged with "Alice" and "Bob" whenever these speakers are detected, while any other speakers are tagged with internal labels such as "S1":

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Hello",
          "language": "en",
          "speaker": "Alice"
        }
      ]
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Hi",
          "language": "en",
          "speaker": "S1"
        }
      ]
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Nice",
          "language": "en",
          "speaker": "Bob"
        }
      ]
    }
  ]
}
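
To turn such a payload into readable output, you can iterate over the results and alternatives directly. The snippet below is plain Python over the message shape shown above:

payload = {
    "results": [
        {"alternatives": [{"content": "Hello", "speaker": "Alice"}]},
        {"alternatives": [{"content": "Hi", "speaker": "S1"}]},
        {"alternatives": [{"content": "Nice", "speaker": "Bob"}]},
    ]
}
for result in payload["results"]:
    for alt in result["alternatives"]:
        print(f'{alt["speaker"]}: {alt["content"]}')
# Alice: Hello
# S1: Hi
# Bob: Nice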

Code examples

Real-time speaker enrollment example:

import asyncio

import speechmatics.client
import speechmatics.models
from speechmatics.models import (
    ClientMessageType,
    ServerMessageType,
)

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"


async def enroll_speakers():
    handler_tasks: list[asyncio.Task] = []

    transcription_config = speechmatics.models.TranscriptionConfig(
        language=LANGUAGE,
        diarization="speaker",
    )

    # Create a transcription client
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )
    # Register the event handler for RecognitionStarted
    # to send the GetSpeakers(final=True) request
    client.add_event_handler(
        ServerMessageType.RecognitionStarted,
        lambda _: handler_tasks.append(
            asyncio.create_task(
                client.send_message(ClientMessageType.GetSpeakers, {"final": True})
            )
        ),
    )
    # Register the event handler for SpeakersResult
    # to print the speaker identifiers obtained from the server
    client.add_event_handler(
        ServerMessageType.SpeakersResult,
        lambda message: print(f"[speaker identifiers] {message['speakers']}"),
    )

    with open(PATH_TO_FILE, "rb") as fh:
        await client.run(fh, transcription_config)

    # Ensure the GetSpeakers request task has completed
    for task in handler_tasks:
        await task


if __name__ == "__main__":
    asyncio.run(enroll_speakers())
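
After the script finishes, store the printed identifiers and pass them back through the speakers option described in the Identification section to tag those speakers in later sessions.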