
Input

Learn about the supported input audio formats for the Speechmatics Real-time API

This page covers the Real-time transcription API (WebSocket).

For information on Batch SaaS, see the Batch SaaS input.
For information on Flow Voice AI, see the Flow Voice AI input.

Supported input audio formats

Sessions can be configured to use one of two types of audio input: file or raw. Unless you have a specific reason to use the file option, we recommend using the raw option.

For capturing raw audio in the browser, try our browser-audio-input package, available on NPM.

`audio_format`

The format must be supplied in the audio_format field of the StartRecognition message. See the API reference.

The audio_format object is one of two variants: Raw or File.

Raw

Raw audio samples, described by the following additional mandatory fields:

type (required)

Constant value: raw

encoding (string, required)

Possible values: [pcm_f32le, pcm_s16le, mulaw]

sample_rate (integer, required)

The sample rate of the audio in Hz.

Example: {"type":"raw","encoding":"pcm_s16le","sample_rate":44100}
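As a rough illustration, the raw format object can be built and sanity-checked before it is placed in the StartRecognition message. The sketch below is not part of any Speechmatics SDK: the helper names `raw_audio_format` and `bytes_per_second` are hypothetical, and the byte-rate calculation assumes single-channel audio. The bytes-per-sample values follow from the encoding names (32-bit float, 16-bit signed integer, 8-bit mu-law).

```python
import json

# Bytes per sample implied by each raw encoding name.
BYTES_PER_SAMPLE = {"pcm_f32le": 4, "pcm_s16le": 2, "mulaw": 1}


def raw_audio_format(encoding: str, sample_rate: int) -> dict:
    """Build a raw audio_format object (hypothetical helper, not an SDK call)."""
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer (Hz)")
    return {"type": "raw", "encoding": encoding, "sample_rate": sample_rate}


def bytes_per_second(audio_format: dict) -> int:
    """Bytes needed for one second of audio, assuming a single channel."""
    return BYTES_PER_SAMPLE[audio_format["encoding"]] * audio_format["sample_rate"]


fmt = raw_audio_format("pcm_s16le", 44100)
print(json.dumps(fmt, separators=(",", ":")))
# {"type":"raw","encoding":"pcm_s16le","sample_rate":44100}
print(bytes_per_second(fmt))
# 88200
```

Knowing the byte rate helps when deciding how much audio to place in each message: at 88200 bytes per second, a 4410-byte message holds 50 ms of audio.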

File

Choose this option to send audio encoded in a recognized file format. The AddAudio messages must together carry the entire file contents, including any headers. The file is usually not sent in a single message; instead, it is split into reasonably sized chunks, one per AddAudio message.

Note: Only the following formats are supported: wav, mp3, aac, ogg, mpeg, amr, m4a, mp4, flac

type (required)

Constant value: file
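A minimal sketch of the segmentation described for the file option: the whole file, headers included, is read in order and cut into fixed-size pieces, each of which would become the payload of one AddAudio message. The helper name `iter_chunks` and the 4096-byte chunk size are assumptions made for illustration; this page does not specify a chunk size.

```python
import io
from typing import BinaryIO, Iterator

# The file variant needs only the constant type field.
FILE_AUDIO_FORMAT = {"type": "file"}


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the stream's bytes in order, at most chunk_size bytes at a time.

    chunk_size is an illustrative value, not a documented limit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        yield chunk


# Stand-in for an opened audio file: a 4-byte header tag plus 10,000 bytes.
demo_file = io.BytesIO(b"RIFF" + bytes(10_000))
chunks = list(iter_chunks(demo_file, chunk_size=4096))
print([len(c) for c in chunks])
# [4096, 4096, 1812]
```

With a real file, `open(path, "rb")` would take the place of the in-memory stream. Concatenating the chunks reproduces the original bytes exactly, so the header arrives at the start of the first chunk.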

Sending audio

After receiving a RecognitionStarted message, you can start sending audio over the WebSocket connection. Audio is sent as binary data, encoded in the format specified in the StartRecognition message. See Protocol overview for complete details of the API protocol.
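The ordering described here (send StartRecognition, wait for RecognitionStarted, then send binary audio) can be sketched independently of any particular WebSocket library. In the sketch below, `recv` and `send` are hypothetical callables standing in for your client's receive and send functions, and the assumption that each JSON message names its type in a `"message"` field is not stated on this page; check the API reference. A real StartRecognition message may also require fields beyond `audio_format`, which are omitted here.

```python
import json
from typing import Callable, Iterable, List, Union

Frame = Union[str, bytes]  # text frames carry JSON, binary frames carry audio


def send_audio_when_ready(
    recv: Callable[[], Frame],
    send: Callable[[Frame], None],
    start_message: dict,
    chunks: Iterable[bytes],
) -> int:
    """Send start_message, wait for RecognitionStarted, then send audio chunks.

    Returns the number of binary chunks sent. Assumes JSON messages carry
    their type under a "message" key (an assumption, see lead-in).
    """
    send(json.dumps(start_message))
    while True:
        frame = recv()
        if isinstance(frame, str) and json.loads(frame).get("message") == "RecognitionStarted":
            break  # the server is ready to accept audio
    sent = 0
    for chunk in chunks:
        send(chunk)  # audio goes out as binary data
        sent += 1
    return sent


# Fake transport for demonstration: one unrelated message, then the go-ahead.
incoming = iter(['{"message": "SomethingElse"}', '{"message": "RecognitionStarted"}'])
outgoing: List[Frame] = []
start = {
    "message": "StartRecognition",  # assumed field name and value
    "audio_format": {"type": "raw", "encoding": "pcm_s16le", "sample_rate": 44100},
}
count = send_audio_when_ready(lambda: next(incoming), outgoing.append, start, [b"\x00\x01", b"\x02\x03"])
print(count)
# 2
```

Passing the transport in as callables keeps the ordering logic testable without a network connection; in practice the same function could wrap a synchronous WebSocket client's receive and send methods.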
