
Quickstart

Learn how to convert streaming audio to text.

The easiest way to try real-time transcription is via the web portal.

Using the Real-time SaaS WebSocket API

1. Create an API key

Create an API key in the portal; you'll use it to securely access the API. Store the key as a managed secret.

Enterprise customers may need to speak to Support to get their API keys.
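One common way to keep the key out of source code is to read it from an environment variable. A minimal sketch, assuming a variable named SPEECHMATICS_API_KEY (the name is illustrative, not mandated by the API):

```javascript
// Sketch: load the API key from an environment variable rather than
// hard-coding it. SPEECHMATICS_API_KEY is an assumed variable name.
function getApiKey(env = process.env) {
  const key = env.SPEECHMATICS_API_KEY;
  if (!key) {
    throw new Error("Set SPEECHMATICS_API_KEY before running this example.");
  }
  return key;
}
```

Passing the environment in as a parameter keeps the helper easy to test; in the quickstart code below you would call `getApiKey()` instead of pasting the key inline.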

2. Pick and install a library

Check out our JavaScript client or Python client to get started.

npm install @speechmatics/real-time-client @speechmatics/auth

3. Insert your API key

Replace YOUR_API_KEY in the code below with your API key.

import https from "node:https";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY"; // replace with your key (or load it from a secret store)

const client = new RealtimeClient();

// Public internet radio stream used as a live audio source for this demo
const streamURL = "https://media-ice.musicradio.com/LBCUKMP3";

const stream = https.get(streamURL, (response) => {
  // Forward each audio chunk to the transcription session
  response.on("data", (chunk) => {
    client.sendAudio(chunk);
  });

  response.on("end", () => {
    console.log("Stream ended");
    client.stopRecognition({ noTimeout: true });
  });

  response.on("error", (error) => {
    console.error("Stream error:", error);
    client.stopRecognition();
  });
});

stream.on("error", (error) => {
  console.error("Request error:", error);
  client.stopRecognition();
});

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    for (const result of data.results) {
      if (result.type === "word") {
        process.stdout.write(" ");
      }
      process.stdout.write(`${result.alternatives?.[0]?.content ?? ""}`);
      if (result.is_eos) {
        process.stdout.write("\n");
      }
    }
  } else if (data.message === "EndOfTranscript") {
    process.stdout.write("\n");
    process.exit(0);
  } else if (data.message === "Error") {
    process.stdout.write(`\n${JSON.stringify(data)}\n`);
    process.exit(1);
  }
});

// Mint a short-lived JWT from the API key, then start the session
createSpeechmaticsJWT({
  type: "rt",
  apiKey,
  ttl: 60, // 1 minute
}).then((jwt) => {
  client.start(jwt, {
    transcription_config: {
      language: "en",
      operating_point: "enhanced",
      max_delay: 1.0,
      transcript_filtering_config: {
        remove_disfluencies: true,
      },
    },
  });
});

Transcript outputs

The API returns transcripts in JSON format. You can receive two types of output: Final and Partial transcripts. Choose the type based on your latency and accuracy needs.

Final transcripts

Final transcripts are the definitive result.

  • They reflect the best transcription for the spoken audio.
  • Once displayed, they are not updated.
  • Words arrive incrementally, with some delay.

You control the latency and accuracy tradeoff using the max_delay setting in your transcription_config. Larger values of max_delay increase accuracy by giving the system more time to process audio context.

Best for: accurate, completed transcripts where some delay is acceptable.
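To make the tradeoff concrete, here is a sketch of two configs that differ only in max_delay. The specific values are illustrative assumptions; check the API reference for the allowed range:

```javascript
// Illustrative sketch: trading latency for accuracy via max_delay.
// A larger max_delay gives the engine more audio context before it
// finalizes each word, at the cost of finals arriving later.
const lowLatencyConfig = {
  transcription_config: {
    language: "en",
    operating_point: "enhanced",
    max_delay: 0.7, // finals arrive sooner, with less context per word
  },
};

const highAccuracyConfig = {
  transcription_config: {
    language: "en",
    operating_point: "enhanced",
    max_delay: 4.0, // finals arrive later, with more context per word
  },
};
```

Either object would be passed as the second argument to `client.start(jwt, config)` as in the quickstart code above.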

Partial transcripts

Partial transcripts are low-latency and can update later as more conversation context arrives.

  • You must enable them using enable_partials in your transcription_config.
  • Partials are emitted quickly (typically less than 500ms).
  • The engine may revise them as more audio is processed.

You can combine partials with finals for a responsive user experience — show partials first, then replace them with finals as they arrive.


Use partials for: real-time captions, voice interfaces, or any case where speed matters
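The partials-then-finals pattern can be sketched as follows. This assumes partials have been enabled with enable_partials and that partial results arrive in AddPartialTranscript messages alongside the AddTranscript finals shown in the quickstart; the display logic is illustrative, not part of the API:

```javascript
// Sketch: show partials immediately, then replace them with finals.
let finalText = "";   // committed (final) transcript, never revised
let partialText = ""; // latest low-latency guess, may be revised

function handleMessage(data) {
  if (data.message === "AddPartialTranscript") {
    // Each partial replaces the previous partial rather than appending.
    partialText = data.results
      .map((r) => r.alternatives?.[0]?.content ?? "")
      .join(" ");
  } else if (data.message === "AddTranscript") {
    // Finals are definitive: commit them and clear the pending partial.
    finalText +=
      data.results.map((r) => r.alternatives?.[0]?.content ?? "").join(" ") +
      " ";
    partialText = "";
  }
  render();
}

function render() {
  // Display the committed text followed by the current best guess.
  console.log(finalText + partialText);
}
```

In a real UI, `render` would update the caption element in place so viewers see words appear quickly and settle into their final form.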