Quickstart
Learn how to convert streaming audio to text. The easiest way to try real-time transcription is via the web portal.
Using the Real-time SaaS WebSocket API
1. Create an API key
Create an API key in the portal; you'll use it to securely access the API. Store the key as a managed secret.
Enterprise customers may need to speak to Support to get their API keys.
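For example, rather than hard-coding the key into your script, you might read it from an environment variable (a minimal sketch; the variable name below is illustrative, not one the portal mandates):

```js
// Read the API key from the environment instead of committing it to source.
// SPEECHMATICS_API_KEY is an example name chosen for this sketch.
const apiKey = process.env.SPEECHMATICS_API_KEY;
if (!apiKey) {
  throw new Error("Set SPEECHMATICS_API_KEY before running this script");
}
```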
2. Pick and install a library
Check out our JavaScript client or Python client to get started.
JavaScript:

```sh
npm install @speechmatics/real-time-client @speechmatics/auth
```

Python:

```sh
pip3 install speechmatics-python
```
3. Insert your API key
Paste your API key into the YOUR_API_KEY placeholder in the code.
JavaScript:

```js
import https from "node:https";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";
const client = new RealtimeClient();

// A live radio stream used as the audio source for this example
const streamURL = "https://media-ice.musicradio.com/LBCUKMP3";

const stream = https.get(streamURL, (response) => {
  // Forward audio chunks to the recognizer as they arrive
  response.on("data", (chunk) => {
    client.sendAudio(chunk);
  });
  response.on("end", () => {
    console.log("Stream ended");
    client.stopRecognition({ noTimeout: true });
  });
  response.on("error", (error) => {
    console.error("Stream error:", error);
    client.stopRecognition();
  });
});

stream.on("error", (error) => {
  console.error("Request error:", error);
  client.stopRecognition();
});

// Print final transcripts to stdout as they arrive
client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    for (const result of data.results) {
      if (result.type === "word") {
        process.stdout.write(" ");
      }
      process.stdout.write(`${result.alternatives?.[0].content}`);
      if (result.is_eos) {
        process.stdout.write("\n");
      }
    }
  } else if (data.message === "EndOfTranscript") {
    process.stdout.write("\n");
    process.exit(0);
  } else if (data.message === "Error") {
    process.stdout.write(`\n${JSON.stringify(data)}\n`);
    process.exit(1);
  }
});

// Mint a short-lived JWT from the API key, then start the session
createSpeechmaticsJWT({
  type: "rt",
  apiKey,
  ttl: 60, // 1 minute
}).then((jwt) => {
  client.start(jwt, {
    transcription_config: {
      language: "en",
      operating_point: "enhanced",
      max_delay: 1.0,
      transcript_filtering_config: {
        remove_disfluencies: true,
      },
    },
  });
});
```
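Save the script and run it with Node.js (the file name is just an example; the import syntax requires ES modules, so use an .mjs extension or set "type": "module" in package.json):

```sh
node transcribe.mjs
```

Final transcript text prints to stdout word by word as it arrives.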
Python:

```python
import speechmatics
from httpx import HTTPStatusError
from urllib.request import urlopen

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# The raw audio stream will be a few seconds ahead of the radio
AUDIO_STREAM_URL = "https://media-ice.musicradio.com/LBCUKMP3"  # LBC Radio stream

audio_stream = urlopen(AUDIO_STREAM_URL)

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the full transcript
def print_transcript(msg):
    print(f"[   FULL] {msg['metadata']['transcript']}")

# Register the event handler for partial transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for full transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conf = speechmatics.models.TranscriptionConfig(
    operating_point="enhanced",
    language=LANGUAGE,
    enable_partials=True,
    max_delay=1,
)

print("Starting transcription (type Ctrl-C to stop):")
try:
    ws.run_synchronously(audio_stream, conf, settings)
except KeyboardInterrupt:
    print("\nTranscription stopped.")
except HTTPStatusError as e:
    if e.response.status_code == 401:
        print("Invalid API key - Check your API_KEY at the top of the code!")
    else:
        raise e
```
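Run the script (file name illustrative):

```sh
python3 transcribe.py
```

Because the handlers above print both message types, you should see interleaved partial and final lines, along the lines of (text illustrative):

```
[partial] good afternoon you're
[   FULL] Good afternoon, you're listening to LBC.
```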
Transcript outputs
The API returns transcripts in JSON format. You can receive two types of output: Final and Partial transcripts. Choose the type based on your latency and accuracy needs.
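For orientation, here is a trimmed sketch of an AddTranscript payload, reconstructed from the fields the examples above read (message, metadata.transcript, results, type, alternatives, is_eos); real messages carry additional metadata such as timings:

```json
{
  "message": "AddTranscript",
  "metadata": { "transcript": "Hello. " },
  "results": [
    { "type": "word", "alternatives": [{ "content": "Hello" }] },
    { "type": "punctuation", "is_eos": true, "alternatives": [{ "content": "." }] }
  ]
}
```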
Final transcripts
Final transcripts are the definitive result.
- They reflect the best transcription for the spoken audio.
- Once displayed, they are not updated.
- Words arrive incrementally, with some delay.
You control the latency and accuracy tradeoff using the `max_delay` setting in your `transcription_config`. Larger values of `max_delay` increase accuracy by giving the system more time to process audio context.
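For instance, with the JavaScript client above you would tune it where the session starts (the value here is illustrative):

```js
client.start(jwt, {
  transcription_config: {
    language: "en",
    operating_point: "enhanced",
    // Larger than the 1.0 used in the quickstart: more context, higher latency.
    max_delay: 4.0,
  },
});
```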
Best for: accurate, completed transcripts where some delay is acceptable.
Partial transcripts
Partial transcripts are low-latency and can update later as more conversation context arrives.
- You must enable them using `enable_partials` in your `transcription_config`.
- Partials are emitted quickly (typically less than 500 ms).
- The engine may revise them as more audio is processed.
You can combine partials with finals for a responsive user experience: show partials first, then replace them with finals as they arrive, as sketched below.
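A minimal sketch of that pattern with the JavaScript client, assuming `enable_partials: true` is set in `transcription_config` (as in the Python example); render() is a hypothetical UI hook, not part of the client library:

```js
let committed = ""; // text confirmed by final transcripts

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddPartialTranscript") {
    // Fast but provisional: show it, but don't commit it.
    render(committed + data.metadata.transcript);
  } else if (data.message === "AddTranscript") {
    // Definitive: fold it into the committed text and redraw.
    committed += data.metadata.transcript;
    render(committed);
  }
});

// Placeholder for whatever updates your UI; here it redraws a terminal line.
function render(text) {
  process.stdout.write(`\r${text}`);
}
```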
Use partials for: real-time captions, voice interfaces, or any case where speed matters.