Realtime speaker identification
Learn how to use the Speechmatics API to identify speakers in real time. For an overview of the feature, see the speaker identification page.
Enrollment
To generate identifiers for a speaker, run a transcription with speaker diarization enabled on an audio sample in which, ideally, that speaker is talking alone.
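For reference, a minimal sketch of the transcription config for such an enrollment run. It has the same shape as the identification config shown later, only without any pre-enrolled speakers:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker"
  }
}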
You can request the identifiers back from the engine by sending a GetSpeakers request.
By default, the engine returns identifiers created up to the time of the request, but you can also wait until the end of the stream by setting the optional final flag in the GetSpeakers request (recommended for enrollment):
{
  "message": "GetSpeakers",
  "final": true
}
- final: false (default) — returns identifiers generated up to the point of the request. To avoid empty results, wait until the server has issued at least one AddTranscript message before sending the request, as in the sketch after this list.
- final: true — waits until the end of the stream and returns identifiers based on all audio.
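For the final: false case, the waiting logic might look like the sketch below, reusing the client and message types from the enrollment example at the end of this page (the handler and flag names are illustrative, not part of the SDK):
# Sketch: send GetSpeakers (final: false) only once the first AddTranscript
# has arrived, so the result is not empty. Assumes `client` is a
# WebsocketClient set up as in the enrollment example below.
get_speakers_sent = False

def request_speakers_once(_message):
    global get_speakers_sent
    if not get_speakers_sent:
        get_speakers_sent = True
        asyncio.create_task(
            client.send_message(ClientMessageType.GetSpeakers, {"final": False})
        )

client.add_event_handler(ServerMessageType.AddTranscript, request_speakers_once)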
When the request is processed, the server replies with a SpeakersResult message that contains the identifiers for each diarized speaker:
{
  "message": "SpeakersResult",
  "speakers": [
    {"label": "S1", "speaker_identifiers": ["<id1>"]},
    {"label": "S2", "speaker_identifiers": ["<id2>"]}
  ]
}
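The speakers list in this message has the same shape as the speakers option used for identification below, so you can carry it over almost directly. As a purely illustrative helper (the function and variable names are ours, not part of the SDK), you could rename the generic labels to real names like this:
# Sketch: turn a SpeakersResult payload into the `speakers` list expected by
# speaker_diarization_config, renaming "S1"/"S2" to human-readable labels.
def to_enrolled_speakers(speakers_result, name_by_label):
    return [
        {
            "label": name_by_label.get(entry["label"], entry["label"]),
            "speaker_identifiers": entry["speaker_identifiers"],
        }
        for entry in speakers_result["speakers"]
    ]

enrolled = to_enrolled_speakers(
    {
        "message": "SpeakersResult",
        "speakers": [
            {"label": "S1", "speaker_identifiers": ["<id1>"]},
            {"label": "S2", "speaker_identifiers": ["<id2>"]},
        ],
    },
    {"S1": "Alice", "S2": "Bob"},
)
# enrolled == [{"label": "Alice", "speaker_identifiers": ["<id1>"]},
#              {"label": "Bob", "speaker_identifiers": ["<id2>"]}]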
Identification
Once you have generated speaker identifiers, you can provide them in your next transcription job to identify and tag known speakers. This is done through the new speakers option in the speaker diarization configuration. All other speaker diarization options remain supported. Among them, the max_speakers parameter continues to apply only to generic (non-enrolled) speakers — for example, if it is set to 10 and 10 speakers are enrolled, the system can still add up to 10 additional generic speakers. The speakers_sensitivity parameter can also be used to adjust how strongly the system prefers enrolled speakers over detecting new generic ones: lower values make it more likely to match existing enrolled speakers.
An example configuration is shown below:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {
      "speakers": [
        {"label": "Alice", "speaker_identifiers": ["<alice_id1>", "<alice_id2>"]},
        {"label": "Bob", "speaker_identifiers": ["<bob_id1>"]}
      ]
    }
  }
}
With the config above, transcript segments should be tagged with "Alice" and "Bob" whenever these speakers are detected, whereas any other speakers should be tagged with the internal labels:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Hello",
          "language": "en",
          "speaker": "Alice"
        }
      ]
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Hi",
          "language": "en",
          "speaker": "S1"
        }
      ]
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Nice",
          "language": "en",
          "speaker": "Bob"
        }
      ]
    }
  ]
}
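If you also want to tune how the system handles non-enrolled speakers, the max_speakers and speakers_sensitivity options described above can be set alongside the speakers list. A sketch, assuming the same config shape as the example above (the values shown are illustrative):
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {
      "max_speakers": 10,
      "speakers_sensitivity": 0.3,
      "speakers": [
        {"label": "Alice", "speaker_identifiers": ["<alice_id1>"]}
      ]
    }
  }
}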
Code examples
Real-time speaker enrollment example.
import asyncio

import speechmatics
import speechmatics.models
import speechmatics.client
from speechmatics.client import (
    ServerMessageType,
    ClientMessageType,
)

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"


async def enroll_speakers():
    handler_tasks: list[asyncio.Task] = []

    transcription_config = speechmatics.models.TranscriptionConfig(**{
        "language": LANGUAGE,
        "diarization": "speaker",
    })

    # Create a transcription client
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )

    # Register the event handler for RecognitionStarted
    # to send the GetSpeakers(final=True) request
    client.add_event_handler(
        ServerMessageType.RecognitionStarted,
        lambda _: handler_tasks.append(
            asyncio.create_task(
                client.send_message(ClientMessageType.GetSpeakers, {"final": True})
            )
        ),
    )

    # Register the event handler for SpeakersResult
    # to print the speaker identifiers obtained from the server
    client.add_event_handler(
        ServerMessageType.SpeakersResult,
        lambda message: print(f"[speaker identifiers] {message['speakers']}"),
    )

    with open(PATH_TO_FILE, "rb") as fh:
        await asyncio.create_task(client.run(fh, transcription_config))

    for task in handler_tasks:
        await task


if __name__ == "__main__":
    asyncio.run(enroll_speakers())
Real-time speaker identification example.
import asyncio

import speechmatics
import speechmatics.models
import speechmatics.client
from speechmatics.client import ServerMessageType

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"


async def identify_speakers():
    handler_tasks: list[asyncio.Task] = []

    transcription_config = speechmatics.models.TranscriptionConfig(**{
        "language": LANGUAGE,
        "diarization": "speaker",
        "speaker_diarization_config": {
            # add your speaker identifiers
            "speakers": [
                {"label": "Alice", "speaker_identifiers": ["<alice_id1>", "<alice_id2>"]},
                {"label": "Bob", "speaker_identifiers": ["<bob_id1>"]},
            ]
        },
    })

    # Create a transcription client
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )

    # Optionally, add transcript handler
    client.add_event_handler(
        ServerMessageType.AddTranscript,
        lambda message: print(f"[transcript] {message['results']}"),
    )

    with open(PATH_TO_FILE, "rb") as fh:
        await asyncio.create_task(client.run(fh, transcription_config))

    for task in handler_tasks:
        await task


if __name__ == "__main__":
    asyncio.run(identify_speakers())