Transcribe in Real-Time

The quickest way to try transcription for free is to create a Speechmatics account and use our Real-Time Demo in your browser.

This page will show you how to use the Speechmatics Real-Time SaaS WebSocket API to transcribe your voice in real-time by speaking into your microphone.

You can also learn about On-Prem deployments by following our guides.

Set Up

  1. Create an account on the Speechmatics On-Demand Portal.
  2. Navigate to the Manage > API Keys page in the Speechmatics On-Demand Portal.
  3. Enter a name for your API key and store your API key somewhere safe.
Info: Enterprise customers should speak to Support to get their API keys.

Real-Time Transcription Examples

The examples below will help you get started by using the official Speechmatics CLI, Python and JavaScript libraries. You can of course integrate using the programming language of your choice by referring to the Real-Time API Reference.

The Speechmatics Python library and CLI can be found on GitHub and installed using pip:

pip3 install speechmatics-python

Transcribe a file in real-time using the Speechmatics CLI, which is installed along with the Python library. Just substitute your API key and file name to get started!

speechmatics config set --auth-token $API_KEY
speechmatics rt transcribe example.wav

Transcript Outputs

The output format from the Speech API is JSON. Two types of transcript are provided: Final transcripts and Partial transcripts. Which one you consume will depend on your use case and on your latency and accuracy requirements.
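As a rough illustration of consuming the JSON output, the sketch below sorts incoming messages into Finals and Partials. The message names `AddTranscript` and `AddPartialTranscript` and the `metadata.transcript` field are assumptions made for this example, and `classify_message` is a hypothetical helper; confirm the exact schema in the Real-Time API Reference.

```python
import json

# Assumed message names for this sketch; verify them against the API reference.
FINAL_TYPE = "AddTranscript"
PARTIAL_TYPE = "AddPartialTranscript"


def classify_message(raw: str):
    """Return (kind, text) for a transcript message, or (None, None) otherwise.

    `kind` is "final" or "partial". The text is read from an assumed
    `metadata.transcript` field and defaults to an empty string.
    """
    msg = json.loads(raw)
    kind = {FINAL_TYPE: "final", PARTIAL_TYPE: "partial"}.get(msg.get("message"))
    if kind is None:
        return None, None
    text = msg.get("metadata", {}).get("transcript", "")
    return kind, text


# Example with a hand-written message (not real API output).
sample = json.dumps({"message": "AddTranscript", "metadata": {"transcript": "hello "}})
print(classify_message(sample))
```

Messages of any other type are ignored here; a real client would also need to handle session and error messages.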

Final Transcripts

A Final is the best available transcription for the words spoken. Once output, a Final will not be updated. Words are returned incrementally with a delay, and this latency can be adjusted using the max_delay property in transcription_config when starting the recognition session. Final transcripts are more accurate than Partial transcripts, and larger values of max_delay increase accuracy at the cost of higher latency.
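A minimal sketch of setting max_delay when starting a session might look like the following. Only `max_delay` and `transcription_config` come from the text above; the `StartRecognition` message name, the `language` field, and the helper `build_transcription_config` are assumptions for illustration. A real start message is likely to need further fields (for example, a description of the audio format), so treat this as a fragment rather than a complete request.

```python
import json


def build_transcription_config(language: str = "en", max_delay: float = 2.0) -> dict:
    """Build a transcription_config dict with an explicit max_delay (seconds).

    Only checks that the value is positive; the API may enforce its own
    allowed range, which this sketch does not attempt to reproduce.
    """
    if max_delay <= 0:
        raise ValueError("max_delay must be a positive number of seconds")
    return {"language": language, "max_delay": max_delay}


# Assumed outer message shape; other required fields are omitted here.
start_message = {
    "message": "StartRecognition",
    "transcription_config": build_transcription_config(max_delay=4.0),
}

print(json.dumps(start_message, indent=2))
```

A larger value such as 4.0 favours accuracy, while a smaller one favours responsiveness.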

Partial Transcripts

A Partial is a transcript that can be updated later as more context arrives. Partial transcripts are not produced by default; they must be explicitly enabled using the enable_partials property in transcription_config for the session. After a Partial is first output, the Speechmatics ASR engine can use additional audio data and context to update it. Partials are therefore available at very low latency but with lower initial accuracy. They typically provide a latency (the time between audio input and initial output) of less than 500ms. Partials can be used in conjunction with Final transcripts to provide low-latency transcripts that are adjusted over time.
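One way to combine the two transcript types in a live display is sketched below. It assumes that each new Partial supersedes the previous one and covers only audio not yet finalised, and that each Final can be appended to the confirmed text. The `TranscriptBuffer` class and its method names are hypothetical and not part of any Speechmatics library.

```python
class TranscriptBuffer:
    """Keep confirmed (Final) text plus the latest tentative (Partial) text."""

    def __init__(self) -> None:
        self.final_text = ""    # text that will no longer change
        self.partial_text = ""  # latest low-latency guess, may be replaced

    def on_partial(self, text: str) -> None:
        # Assumption: a new Partial replaces the previous Partial entirely.
        self.partial_text = text

    def on_final(self, text: str) -> None:
        # A Final is appended to the confirmed text; the pending Partial is
        # cleared because the next Partial will describe newer audio.
        self.final_text += text
        self.partial_text = ""

    def display(self) -> str:
        """Return confirmed text followed by the current tentative text."""
        return self.final_text + self.partial_text


# Partials must be enabled for the session, as described above.
transcription_config = {"enable_partials": True}

buffer = TranscriptBuffer()
buffer.on_partial("hello wor")
buffer.on_final("hello ")
buffer.on_partial("world")
print(buffer.display())
```

Keeping the two strings separate lets a user interface render the tentative text differently (for example, greyed out) until a Final confirms it.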