Quickstart
Learn how to convert streaming audio to text. The easiest way to try Realtime transcription is via the web portal.
Using the Realtime SaaS WebSocket API
1. Create an API key
Create an API key in the portal; you'll use it to securely access the API. Store the key as a managed secret.
Enterprise customers may need to speak to Support to get their API keys.
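One way to keep the key out of source code (an illustrative pattern, not a Speechmatics requirement; the variable name SPEECHMATICS_API_KEY is our own choice) is to load it from an environment variable:

import os

# Read the API key from an environment variable instead of hard-coding it.
# SPEECHMATICS_API_KEY is a hypothetical name; use whatever your secret store provides.
API_KEY = os.environ["SPEECHMATICS_API_KEY"]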
2. Pick and install a library
Check out our JavaScript client or Python client to get started.
JavaScript:
npm install @speechmatics/real-time-client @speechmatics/auth

Python:
pip3 install speechmatics-python
3. Insert your API key
Paste your API key into YOUR_API_KEY in the code.
JavaScript:

import https from "node:https";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";
const client = new RealtimeClient();
const streamURL = "https://media-ice.musicradio.com/LBCUKMP3";

async function transcribe() {
  // Print transcript as we receive it
  client.addEventListener("receiveMessage", ({ data }) => {
    if (data.message === "AddTranscript") {
      for (const result of data.results) {
        if (result.type === "word") {
          process.stdout.write(" ");
        }
        process.stdout.write(`${result.alternatives?.[0].content}`);
        if (result.is_eos) {
          process.stdout.write("\n");
        }
      }
    } else if (data.message === "EndOfTranscript") {
      process.stdout.write("\n");
      process.exit(0);
    } else if (data.message === "Error") {
      process.stdout.write(`\n${JSON.stringify(data)}\n`);
      process.exit(1);
    }
  });

  // Exchange the long-lived API key for a short-lived JWT
  const jwt = await createSpeechmaticsJWT({
    type: "rt",
    apiKey,
    ttl: 60, // 1 minute
  });

  await client.start(jwt, {
    transcription_config: {
      language: "en",
      operating_point: "enhanced",
      max_delay: 1.0,
      transcript_filtering_config: {
        remove_disfluencies: true,
      },
    },
  });

  // Stream audio from the radio URL into the recognizer
  const stream = https.get(streamURL, (response) => {
    // Handle the response stream
    response.on("data", (chunk) => {
      client.sendAudio(chunk);
    });
    response.on("end", () => {
      console.log("Stream ended");
      client.stopRecognition({ noTimeout: true });
    });
    response.on("error", (error) => {
      console.error("Stream error:", error);
      client.stopRecognition();
    });
  });

  stream.on("error", (error) => {
    console.error("Request error:", error);
    client.stopRecognition();
  });
}

transcribe();
Python:

import speechmatics
from httpx import HTTPStatusError
from urllib.request import urlopen

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# The raw audio stream will be a few seconds ahead of the radio
AUDIO_STREAM_URL = "https://media-ice.musicradio.com/LBCUKMP3"  # LBC Radio stream
audio_stream = urlopen(AUDIO_STREAM_URL)

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the full transcript
def print_transcript(msg):
    print(f"[ FULL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for full transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    operating_point="enhanced",
    language=LANGUAGE,
    enable_partials=True,
    max_delay=1,
)

print("Starting transcription (type Ctrl-C to stop):")
try:
    ws.run_synchronously(audio_stream, conf, settings)
except KeyboardInterrupt:
    print("\nTranscription stopped.")
except HTTPStatusError as e:
    if e.response.status_code == 401:
        print("Invalid API key - Check your API_KEY at the top of the code!")
    else:
        raise e
Transcript outputs
The API returns transcripts in JSON format. You can receive two types of output: Final and Partial transcripts. Choose the type based on your latency and accuracy needs.
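Concretely, after JSON decoding, the msg passed to the Python handlers above is shaped roughly like this (abbreviated; timings and confidences are illustrative, and the full schema is in the API reference):

# Shape of a decoded AddTranscript message (values illustrative)
msg = {
    "message": "AddTranscript",
    "metadata": {"start_time": 0.0, "end_time": 1.1, "transcript": "Welcome back. "},
    "results": [
        {"type": "word", "start_time": 0.1, "end_time": 0.5,
         "alternatives": [{"content": "Welcome", "confidence": 0.99}]},
        {"type": "word", "start_time": 0.5, "end_time": 0.9,
         "alternatives": [{"content": "back", "confidence": 0.97}]},
        # End-of-sentence punctuation carries is_eos, as the JavaScript handler above checks
        {"type": "punctuation", "start_time": 0.9, "end_time": 0.9, "is_eos": True,
         "alternatives": [{"content": ".", "confidence": 1.0}]},
    ],
}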
Final transcripts
Final transcripts are the definitive result.
- They reflect the best transcription for the spoken audio.
- Once displayed, they are not updated.
- Words arrive incrementally, with some delay.
You control the latency and accuracy tradeoff using the max_delay setting in your transcription_config. Larger values of max_delay increase accuracy by giving the system more time to process audio context.
Use finals for: accurate, completed transcripts where some delay is acceptable.
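As a sketch of tuning for accuracy rather than latency, you might raise max_delay in the Python example above (the value here is illustrative; check the configuration reference for the supported range):

# Favour accuracy: give the engine more audio context before finalising words.
conf = speechmatics.models.TranscriptionConfig(
    language="en",
    operating_point="enhanced",
    max_delay=2.0,  # illustrative: twice the quickstart's 1.0, so finals arrive later but more accurately
)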
Partial transcripts
Partial transcripts are low-latency and can update later as more conversation context arrives.
- You must enable them using enable_partials in your transcription_config.
- Partials are emitted quickly (typically less than 500ms).
- The engine may revise them as more audio is processed.
You can combine partials with finals for a responsive user experience — show partials first, then replace them with finals as they arrive.
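As a minimal sketch of that pattern, reusing the ws client from the Python example above (and assuming each partial covers only the speech since the last final, which the next final then replaces):

# Keep confirmed text separate from the still-changing partial segment.
finalised = ""

def show(partial):
    # Redraw the current line: confirmed text plus the latest partial.
    print(f"\r{finalised}{partial}", end="", flush=True)

def on_partial(msg):
    show(msg["metadata"]["transcript"])

def on_final(msg):
    global finalised
    finalised += msg["metadata"]["transcript"]
    show("")  # the final has replaced the partial segment

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=on_partial,
)
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=on_final,
)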
Use partials for: real-time captions, voice interfaces, or any case where speed matters.