
Quickstart

Learn how to convert streaming audio to text.

The easiest way to try real-time transcription is via the web portal.

Using the Real-time SaaS WebSocket API

1. Create an API key

Create an API key in the portal; you'll use it to securely access the API. Store the key as a managed secret.

Enterprise customers may need to speak to Support to get their API keys.
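One common way to keep the key out of source code is to read it from an environment variable. A minimal sketch, assuming a variable named SPEECHMATICS_API_KEY (the name is illustrative, not mandated by the API):

```javascript
// Sketch: load the API key from an environment variable rather than
// hard-coding it. SPEECHMATICS_API_KEY is an assumed variable name.
function getApiKey(env = process.env) {
  const key = env.SPEECHMATICS_API_KEY;
  if (!key) {
    throw new Error("Set SPEECHMATICS_API_KEY before running this example.");
  }
  return key;
}
```

Passing the environment in as a parameter keeps the helper easy to test; in the quickstart code below you would call `getApiKey()` instead of pasting the key inline.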

2. Pick and install a library

Check out our JavaScript client or Python client to get started.

npm install @speechmatics/real-time-client @speechmatics/auth

3. Insert your API key

Replace YOUR_API_KEY in the code below with your API key.

import https from "node:https";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY"; // replace with your key (or load it from a secret store)

const client = new RealtimeClient();

// Public internet radio stream used as a live audio source for this demo
const streamURL = "https://media-ice.musicradio.com/LBCUKMP3";

const stream = https.get(streamURL, (response) => {
  // Forward each audio chunk to the transcription session
  response.on("data", (chunk) => {
    client.sendAudio(chunk);
  });

  response.on("end", () => {
    console.log("Stream ended");
    client.stopRecognition({ noTimeout: true });
  });

  response.on("error", (error) => {
    console.error("Stream error:", error);
    client.stopRecognition();
  });
});

stream.on("error", (error) => {
  console.error("Request error:", error);
  client.stopRecognition();
});

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    for (const result of data.results) {
      if (result.type === "word") {
        process.stdout.write(" ");
      }
      process.stdout.write(`${result.alternatives?.[0]?.content ?? ""}`);
      if (result.is_eos) {
        process.stdout.write("\n");
      }
    }
  } else if (data.message === "EndOfTranscript") {
    process.stdout.write("\n");
    process.exit(0);
  } else if (data.message === "Error") {
    process.stdout.write(`\n${JSON.stringify(data)}\n`);
    process.exit(1);
  }
});

// Mint a short-lived JWT from the API key, then start the session
createSpeechmaticsJWT({
  type: "rt",
  apiKey,
  ttl: 60, // 1 minute
}).then((jwt) => {
  client.start(jwt, {
    transcription_config: {
      language: "en",
      operating_point: "enhanced",
      max_delay: 1.0,
      transcript_filtering_config: {
        remove_disfluencies: true,
      },
    },
  });
});

Transcript outputs

The API returns transcripts in JSON format. You can receive two types of output: Final and Partial transcripts. Choose the type based on your latency and accuracy needs.

Final transcripts

Final transcripts are the definitive result.

  • They reflect the best transcription for the spoken audio.
  • Once displayed, they are not updated.
  • Words arrive incrementally, with some delay.

You control the latency and accuracy tradeoff using the max_delay setting in your transcription_config. Larger values of max_delay increase accuracy by giving the system more time to process audio context.

Best for: accurate, completed transcripts where some delay is acceptable.
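To make the tradeoff concrete, here is a sketch of two configs that differ only in max_delay. The specific values are illustrative assumptions; check the API reference for the allowed range:

```javascript
// Illustrative sketch: trading latency for accuracy via max_delay.
// A larger max_delay gives the engine more audio context before it
// finalizes each word, at the cost of finals arriving later.
const lowLatencyConfig = {
  transcription_config: {
    language: "en",
    operating_point: "enhanced",
    max_delay: 0.7, // finals arrive sooner, with less context per word
  },
};

const highAccuracyConfig = {
  transcription_config: {
    language: "en",
    operating_point: "enhanced",
    max_delay: 4.0, // finals arrive later, with more context per word
  },
};
```

Either object would be passed as the second argument to `client.start(jwt, config)` as in the quickstart code above.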

Partial transcripts

Partial transcripts are low-latency and can update later as more conversation context arrives.

  • You must enable them using enable_partials in your transcription_config.
  • Partials are emitted quickly (typically less than 500ms).
  • The engine may revise them as more audio is processed.

You can combine partials with finals for a responsive user experience — show partials first, then replace them with finals as they arrive.


Use partials for: real-time captions, voice interfaces, or any case where speed matters
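The partials-then-finals pattern can be sketched as follows. This assumes partials have been enabled with enable_partials and that partial results arrive in AddPartialTranscript messages alongside the AddTranscript finals shown in the quickstart; the display logic is illustrative, not part of the API:

```javascript
// Sketch: show partials immediately, then replace them with finals.
let finalText = "";   // committed (final) transcript, never revised
let partialText = ""; // latest low-latency guess, may be revised

function handleMessage(data) {
  if (data.message === "AddPartialTranscript") {
    // Each partial replaces the previous partial rather than appending.
    partialText = data.results
      .map((r) => r.alternatives?.[0]?.content ?? "")
      .join(" ");
  } else if (data.message === "AddTranscript") {
    // Finals are definitive: commit them and clear the pending partial.
    finalText +=
      data.results.map((r) => r.alternatives?.[0]?.content ?? "").join(" ") +
      " ";
    partialText = "";
  }
  render();
}

function render() {
  // Display the committed text followed by the current best guess.
  console.log(finalText + partialText);
}
```

In a real UI, `render` would update the caption element in place so viewers see words appear quickly and settle into their final form.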