Diarization
Learn how Speechmatics diarization separates speakers in audio.
Use Speechmatics diarization to separate a transcript into distinct speakers or channels, so you can clearly see who said what.
It’s especially useful in conversations, meetings, interviews, or multi-channel recordings where keeping track of each voice matters. By labeling speakers or channels, diarization makes transcripts easier to read, analyze, and share.
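To make "who said what" concrete, here is a minimal sketch of how labeled words could be turned into readable speaker turns. The input shape (a list of `(label, word)` pairs), the labels `S1` and `S2`, and the helper `to_turns` are assumptions made for illustration. They are not the documented Speechmatics response format.

```python
from itertools import groupby

# Hypothetical diarized output: (speaker_label, word) pairs in time order.
# The labels and the tuple layout are illustrative assumptions only.
words = [
    ("S1", "Hello,"), ("S1", "how"), ("S1", "can"), ("S1", "I"), ("S1", "help?"),
    ("S2", "I"), ("S2", "need"), ("S2", "a"), ("S2", "refund."),
    ("S1", "Sure."),
]


def to_turns(labeled_words):
    """Merge consecutive words that share a label into a single turn."""
    return [
        (label, " ".join(word for _, word in group))
        for label, group in groupby(labeled_words, key=lambda pair: pair[0])
    ]


for label, text in to_turns(words):
    print(f"{label}: {text}")
```

Grouping only consecutive words (rather than all words per speaker) keeps the order of the conversation intact, which is what makes the transcript easy to read.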
Use cases
- Call centers – Identify agents and customers for training, compliance, and quality assurance.
- Video conferences – Track who said what in multi-participant discussions.
- Medical consultations – Capture conversations between doctors and patients with clear speaker labels.
- Media production – Make multi-speaker audio easier to edit, search, and annotate.
Diarization modes
Speechmatics offers the following diarization modes:
- Speaker diarization – Identifies each speaker by their voice. Useful when there are multiple speakers in the same audio recording.
- Channel diarization – Transcribes each audio channel separately. Useful when each speaker is recorded on a separate audio channel.
- Channel & speaker diarization – Transcribes each channel separately and also identifies individual speakers within each channel. Useful when multiple speakers are recorded across multiple channels.
Channel and speaker diarization is only available for Realtime transcription.
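The guidance above can be sketched as a small decision helper that maps a recording's layout to a mode. The function `choose_mode` and the strings it returns (`"speaker"`, `"channel"`, `"channel_and_speaker"`, `"none"`) are hypothetical labels for this example. Check the API reference for the actual configuration values, and note that the `"none"` case is an added assumption for single-speaker, single-channel audio.

```python
def choose_mode(num_channels: int, max_speakers_per_channel: int) -> str:
    """Pick a diarization mode label from the recording layout.

    The returned strings are illustrative labels, not confirmed API values.
    """
    if num_channels < 1 or max_speakers_per_channel < 1:
        raise ValueError("need at least one channel and one speaker")

    several_speakers = max_speakers_per_channel > 1

    if num_channels == 1:
        # One channel: separate speakers by voice, if there is more than one.
        return "speaker" if several_speakers else "none"

    # Several channels: split by channel, and also by voice when a channel
    # carries more than one speaker.
    return "channel_and_speaker" if several_speakers else "channel"


print(choose_mode(1, 3))  # a meeting recorded on a single microphone
print(choose_mode(2, 1))  # agent and customer on separate channels
print(choose_mode(2, 2))  # two rooms, each with several speakers
```

The helper does not model availability, so a caller would still need to confirm that the chosen mode is offered by the API they are using.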
These modes can be used with our Realtime or Batch APIs:
Realtime diarization
Used for live, streaming audio. Ideal for scenarios like video conferencing, real-time captions, and conversational AI.
Batch diarization
Used for pre-recorded audio files. Great for scenarios like call recordings, podcasts, and long-form interviews.