Input

Learn about the supported input audio formats for the Speechmatics Real-time API

This page covers the Real-time transcription API, which runs over a WebSocket connection.

Supported input audio formats

Sessions can be configured to use one of two types of audio input: file or raw. Unless you have a specific reason to use the file option, we recommend using the raw option.

For capturing raw audio in the browser, try our browser-audio-input package, available on NPM.

audio_format

The format must be supplied in the audio_format field of the StartRecognition message. See the API reference.

The raw option describes raw audio samples using the following mandatory fields:

type (required)
Constant value: raw

encoding (string, required)
Possible values: pcm_f32le, pcm_s16le, mulaw

sample_rate (integer, required)
The sample rate of the audio in Hz.

Example: {"type":"raw","encoding":"pcm_s16le","sample_rate":44100}
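As an illustration of how the audio_format object fits into a StartRecognition message, here is a minimal Python sketch. The helper names (build_audio_format, build_start_recognition) are hypothetical, and the transcription_config contents shown are an assumption for illustration only; consult the API reference for the fields your session actually needs.

```python
import json

# Encodings listed on this page for raw audio input.
ALLOWED_ENCODINGS = {"pcm_f32le", "pcm_s16le", "mulaw"}


def build_audio_format(encoding: str, sample_rate: int) -> dict:
    """Build the audio_format object for raw input (hypothetical helper)."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if not isinstance(sample_rate, int) or sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer (Hz)")
    return {"type": "raw", "encoding": encoding, "sample_rate": sample_rate}


def build_start_recognition(audio_format: dict, language: str = "en") -> str:
    """Serialise a StartRecognition message as JSON text.

    The transcription_config shown here is an assumed minimal example,
    not a complete or authoritative configuration.
    """
    message = {
        "message": "StartRecognition",
        "audio_format": audio_format,
        "transcription_config": {"language": language},
    }
    return json.dumps(message)


if __name__ == "__main__":
    fmt = build_audio_format("pcm_s16le", 44100)
    print(build_start_recognition(fmt))
```

Validating the encoding and sample rate before connecting surfaces a misconfigured format locally rather than as an error from the server.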

Sending audio

After receiving a RecognitionStarted message, you can start sending audio over the WebSocket connection. Audio is sent as binary data, encoded in the format specified in the StartRecognition message. See Protocol overview for complete details of the API protocol.
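To make "binary data, encoded in the format specified" concrete, the following Python sketch converts floating-point samples into pcm_s16le bytes and splits them into chunks suitable for sending as binary messages. The function names and the chunk size are assumptions for illustration; the actual send call depends on your WebSocket client library and is deliberately left out.

```python
import struct
from typing import Iterator, Sequence


def floats_to_pcm_s16le(samples: Sequence[float]) -> bytes:
    """Encode samples in [-1.0, 1.0] as signed 16-bit little-endian PCM."""
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overflow
        ints.append(int(round(clipped * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of data, each at most chunk_size bytes.

    chunk_size must be a positive multiple of 2 so that a 16-bit sample
    is never split across two chunks.
    """
    if chunk_size <= 0 or chunk_size % 2 != 0:
        raise ValueError("chunk_size must be a positive multiple of 2")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


if __name__ == "__main__":
    audio = floats_to_pcm_s16le([0.0, 0.5, -0.5, 1.0])
    for chunk in iter_chunks(audio, 4):
        # In a real session, each chunk would be sent as one binary
        # WebSocket message after RecognitionStarted has been received.
        print(len(chunk), chunk.hex())
```

The encoding used here must match the encoding declared in audio_format; a session started with pcm_f32le or mulaw would need a different conversion step.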