Realtime speaker identification
Learn how to use the Speechmatics API to identify speakers in real time. For an overview of the feature, see the speaker identification page.
Enrollment
To generate identifiers for a speaker, run a transcription with speaker diarization enabled on an audio sample in which, ideally, that speaker is talking alone.
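For reference, a minimal sketch of the transcription config for such an enrollment run. It has the same shape as the identification config shown later, only without any pre-enrolled speakers:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker"
  }
}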
You can request the identifiers back from the engine by sending a GetSpeakers request.
By default, the engine returns identifiers created up to the time of the request, but you can also wait until the end of the stream by setting the optional final flag in the GetSpeakers request (recommended for enrollment):
{
  "message": "GetSpeakers",
  "final": true
}
- final: false (default) — returns identifiers generated up to the point of the request. To avoid empty results, wait until the server has issued at least one AddTranscript message before sending the request, as in the sketch after this list.
- final: true — waits until the end of the stream and returns identifiers based on all audio.
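For the final: false case, the waiting logic might look like the sketch below, reusing the client and message types from the enrollment example at the end of this page (the handler and flag names are illustrative, not part of the SDK):
# Sketch: send GetSpeakers (final: false) only once the first AddTranscript
# has arrived, so the result is not empty. Assumes `client` is a
# WebsocketClient set up as in the enrollment example below.
get_speakers_sent = False

def request_speakers_once(_message):
    global get_speakers_sent
    if not get_speakers_sent:
        get_speakers_sent = True
        asyncio.create_task(
            client.send_message(ClientMessageType.GetSpeakers, {"final": False})
        )

client.add_event_handler(ServerMessageType.AddTranscript, request_speakers_once)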
When the request is processed, the server replies with a SpeakersResult message that contains the identifiers for each diarized speaker:
{
  "message": "SpeakersResult",
  "speakers": [
    {"label": "S1", "speaker_identifiers": ["<id1>"]},
    {"label": "S2", "speaker_identifiers": ["<id2>"]}
  ]
}
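The speakers list in this message has the same shape as the speakers option used for identification below, so you can carry it over almost directly. As a purely illustrative helper (the function and variable names are ours, not part of the SDK), you could rename the generic labels to real names like this:
# Sketch: turn a SpeakersResult payload into the `speakers` list expected by
# speaker_diarization_config, renaming "S1"/"S2" to human-readable labels.
def to_enrolled_speakers(speakers_result, name_by_label):
    return [
        {
            "label": name_by_label.get(entry["label"], entry["label"]),
            "speaker_identifiers": entry["speaker_identifiers"],
        }
        for entry in speakers_result["speakers"]
    ]

enrolled = to_enrolled_speakers(
    {
        "message": "SpeakersResult",
        "speakers": [
            {"label": "S1", "speaker_identifiers": ["<id1>"]},
            {"label": "S2", "speaker_identifiers": ["<id2>"]},
        ],
    },
    {"S1": "Alice", "S2": "Bob"},
)
# enrolled == [{"label": "Alice", "speaker_identifiers": ["<id1>"]},
#              {"label": "Bob", "speaker_identifiers": ["<id2>"]}]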
Identification
Once you have generated speaker identifiers, you can provide them in your next transcription job to identify and tag known speakers. This is done through the new speakers option in the speaker diarization configuration. All other speaker diarization options remain supported. Among them, the max_speakers parameter continues to apply only to generic (non-enrolled) speakers — for example, if it is set to 10 and 10 speakers are enrolled, the system can still add up to 10 additional generic speakers. The speakers_sensitivity parameter can also be used to adjust how strongly the system prefers enrolled speakers over detecting new generic ones: lower values make it more likely to match existing enrolled speakers.
An example configuration is shown below:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {
      "speakers": [
        {"label": "Alice", "speaker_identifiers": ["<alice_id1>", "<alice_id2>"]},
        {"label": "Bob", "speaker_identifiers": ["<bob_id1>"]}
      ]
    }
  }
}
With the config above, transcript segments should be tagged with "Alice" and "Bob" whenever these speakers are detected, whereas any other speakers should be tagged with the internal labels:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Hello",
          "language": "en",
          "speaker": "Alice"
        }
      ]
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Hi",
          "language": "en",
          "speaker": "S1"
        }
      ]
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "Nice",
          "language": "en",
          "speaker": "Bob"
        }
      ]
    }
  ]
}
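If you also want to tune how the system handles non-enrolled speakers, the max_speakers and speakers_sensitivity options described above can be set alongside the speakers list. A sketch, assuming the same config shape as the example above (the values shown are illustrative):
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker",
    "speaker_diarization_config": {
      "max_speakers": 10,
      "speakers_sensitivity": 0.3,
      "speakers": [
        {"label": "Alice", "speaker_identifiers": ["<alice_id1>"]}
      ]
    }
  }
}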
Code examples
Real-time speaker enrollment example.
import asyncio

import speechmatics
import speechmatics.models
import speechmatics.client
from speechmatics.client import (
    ServerMessageType,
    ClientMessageType,
)

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"


async def enroll_speakers():
    handler_tasks: list[asyncio.Task] = []

    transcription_config = speechmatics.models.TranscriptionConfig(**{
        "language": LANGUAGE,
        "diarization": "speaker",
    })

    # Create a transcription client
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )

    # Register the event handler for RecognitionStarted
    # to send the GetSpeakers(final=True) request
    client.add_event_handler(
        ServerMessageType.RecognitionStarted,
        lambda _: handler_tasks.append(
            asyncio.create_task(
                client.send_message(ClientMessageType.GetSpeakers, {"final": True})
            )
        ),
    )

    # Register the event handler for SpeakersResult
    # to print the speaker identifiers obtained from the server
    client.add_event_handler(
        ServerMessageType.SpeakersResult,
        lambda message: print(f"[speaker identifiers] {message['speakers']}"),
    )

    with open(PATH_TO_FILE, "rb") as fh:
        await asyncio.create_task(client.run(fh, transcription_config))

    for task in handler_tasks:
        await task


if __name__ == "__main__":
    asyncio.run(enroll_speakers())
Real-time speaker identification example.
import asyncio

import speechmatics
import speechmatics.models
import speechmatics.client
from speechmatics.client import ServerMessageType

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"


async def identify_speakers():
    handler_tasks: list[asyncio.Task] = []

    transcription_config = speechmatics.models.TranscriptionConfig(**{
        "language": LANGUAGE,
        "diarization": "speaker",
        "speaker_diarization_config": {
            # add your speaker identifiers
            "speakers": [
                {"label": "Alice", "speaker_identifiers": ["<alice_id1>", "<alice_id2>"]},
                {"label": "Bob", "speaker_identifiers": ["<bob_id1>"]},
            ]
        },
    })

    # Create a transcription client
    client = speechmatics.client.WebsocketClient(
        speechmatics.models.ConnectionSettings(
            url=CONNECTION_URL,
            auth_token=API_KEY,
        )
    )

    # Optionally, add transcript handler
    client.add_event_handler(
        ServerMessageType.AddTranscript,
        lambda message: print(f"[transcript] {message['results']}"),
    )

    with open(PATH_TO_FILE, "rb") as fh:
        await asyncio.create_task(client.run(fh, transcription_config))

    for task in handler_tasks:
        await task


if __name__ == "__main__":
    asyncio.run(identify_speakers())