
Quickstart

Learn how to convert streaming audio to text.

The easiest way to try Realtime transcription is via the web portal.

Using the Realtime SaaS WebSocket API

1. Create an API key

Create an API key in the portal; you'll use it to securely access the API. Store the key as a managed secret.

Enterprise customers may need to speak to Support to get their API keys.

2. Pick and install a library

Check out our JavaScript client or Python client to get started.

npm install @speechmatics/real-time-client @speechmatics/auth

3. Insert your API key

Replace YOUR_API_KEY in the code below with your API key.

import https from "node:https";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";
const client = new RealtimeClient();
const streamURL = "https://media-ice.musicradio.com/LBCUKMP3";

async function transcribe() {
  // Print the transcript as we receive it
  client.addEventListener("receiveMessage", ({ data }) => {
    if (data.message === "AddTranscript") {
      for (const result of data.results) {
        if (result.type === "word") {
          process.stdout.write(" ");
        }
        process.stdout.write(`${result.alternatives?.[0]?.content ?? ""}`);
        if (result.is_eos) {
          process.stdout.write("\n");
        }
      }
    } else if (data.message === "EndOfTranscript") {
      process.stdout.write("\n");
      process.exit(0);
    } else if (data.message === "Error") {
      process.stdout.write(`\n${JSON.stringify(data)}\n`);
      process.exit(1);
    }
  });

  const jwt = await createSpeechmaticsJWT({
    type: "rt",
    apiKey,
    ttl: 60, // 1 minute
  });

  await client.start(jwt, {
    transcription_config: {
      language: "en",
      operating_point: "enhanced",
      max_delay: 1.0,
      transcript_filtering_config: {
        remove_disfluencies: true,
      },
    },
  });

  const stream = https.get(streamURL, (response) => {
    // Forward each chunk of the audio stream to the transcriber
    response.on("data", (chunk) => {
      client.sendAudio(chunk);
    });

    response.on("end", () => {
      console.log("Stream ended");
      client.stopRecognition({ noTimeout: true });
    });

    response.on("error", (error) => {
      console.error("Stream error:", error);
      client.stopRecognition();
    });
  });

  stream.on("error", (error) => {
    console.error("Request error:", error);
    client.stopRecognition();
  });
}

transcribe();

Transcript outputs

The API returns transcripts in JSON format. You can receive two types of output: Final and Partial transcripts. Choose the type based on your latency and accuracy needs.
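As an illustration, a final transcript arrives in an AddTranscript message shaped roughly like the object below. The field values here are invented for illustration; consult the API reference for the full schema.

```javascript
// Illustrative shape of an AddTranscript message (values invented).
const addTranscript = {
  message: "AddTranscript",
  results: [
    {
      type: "word",
      start_time: 0.12,
      end_time: 0.45,
      alternatives: [{ content: "Hello", confidence: 0.98 }],
    },
    {
      type: "punctuation",
      start_time: 0.45,
      end_time: 0.45,
      is_eos: true, // end of sentence
      alternatives: [{ content: ".", confidence: 0.95 }],
    },
  ],
};

// Reassemble the text from the results array
console.log(
  addTranscript.results.map((r) => r.alternatives[0].content).join(""),
); // → Hello.
```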

Final transcripts

Final transcripts are the definitive result.

  • They reflect the best transcription for the spoken audio.
  • Once displayed, they are not updated.
  • Words arrive incrementally, with some delay.

You control the latency and accuracy tradeoff using the max_delay setting in your transcription_config. Larger values of max_delay increase accuracy by giving the system more time to process audio context.
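For example, here are two illustrative configs, one favoring latency and one favoring accuracy. The specific max_delay values are just examples, not recommendations.

```javascript
// Illustrative transcription_config values (not prescriptive):
// a lower max_delay returns finals sooner; a higher one gives the
// engine more context before finalizing, which tends to improve accuracy.
const lowLatencyConfig = {
  transcription_config: {
    language: "en",
    max_delay: 1.0, // finals arrive quickly
  },
};

const higherAccuracyConfig = {
  transcription_config: {
    language: "en",
    max_delay: 2.0, // more audio context before finalizing
  },
};
```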

Use finals for: accurate, completed transcripts where some delay is acceptable.

Partial transcripts

Partial transcripts are low-latency and can update later as more conversation context arrives.

  • You must enable them using enable_partials in your transcription_config.
  • Partials are emitted quickly (typically less than 500ms).
  • The engine may revise them as more audio is processed.

You can combine partials with finals for a responsive user experience — show partials first, then replace them with finals as they arrive.
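A minimal sketch of that pattern is below. The helper is hypothetical (not part of the client library); it assumes partials arrive as AddPartialTranscript messages alongside the AddTranscript finals shown in the quickstart code.

```javascript
// Hypothetical helper: keep finalized text separate from the latest
// partial, and render finals followed by the current partial.
function createTranscriptView() {
  let finalText = "";
  let partialText = "";

  return {
    handleMessage(data) {
      if (data.message === "AddPartialTranscript") {
        // Each partial replaces the previous one entirely.
        partialText = joinResults(data.results);
      } else if (data.message === "AddTranscript") {
        // Finals are definitive: append them and clear the partial.
        finalText += joinResults(data.results);
        partialText = "";
      }
    },
    render() {
      return finalText + partialText;
    },
  };
}

// Join a results array into text, spacing words but not punctuation
function joinResults(results) {
  return results
    .map(
      (r) =>
        (r.type === "word" ? " " : "") + (r.alternatives?.[0]?.content ?? ""),
    )
    .join("");
}
```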


Use partials for: real-time captions, voice interfaces, or any case where speed matters.