Input
Learn about the supported input audio formats for the Speechmatics Real-time API.
This page is about the Real-time transcription API (websocket).
- For information on Batch SaaS, see the Batch SaaS input.
- For information on Flow Voice AI, see the Flow Voice AI input.
Supported input audio formats
Sessions can be configured to use one of two types of audio input: file and raw. Unless you have a specific reason to use the file option, we recommend using the raw option.
For capturing raw audio in the browser, try our browser-audio-input package, available on NPM.
audio_format
The format must be supplied in the audio_format field of the StartRecognition message. See the API reference.
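As a rough illustration of where audio_format sits, the sketch below serializes a StartRecognition message around the raw example used later on this page. Two details are assumptions not stated here: that the message type is carried in a "message" key, and that a transcription_config with a language code accompanies audio_format. The helper name start_recognition_message is hypothetical; check the API reference for the authoritative field list.

```python
import json

def start_recognition_message(audio_format: dict, language: str = "en") -> str:
    """Serialize a StartRecognition message that carries the given audio_format.

    Hypothetical helper. The "message" key and the transcription_config block
    are assumptions; consult the API reference for required and optional fields.
    """
    message = {
        "message": "StartRecognition",
        "audio_format": audio_format,
        # Assumed field: a transcription config holding a language code.
        "transcription_config": {"language": language},
    }
    return json.dumps(message)

# The raw audio_format example from this page.
raw_format = {"type": "raw", "encoding": "pcm_s16le", "sample_rate": 44100}
print(start_recognition_message(raw_format))
```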
Raw
Raw audio samples. Set type to raw and supply the following additional mandatory fields:
- encoding: the sample encoding. Possible values: pcm_f32le, pcm_s16le, mulaw
- sample_rate: the sample rate of the audio in Hz.
Example: {"type":"raw","encoding":"pcm_s16le","sample_rate":44100}
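The sketch below shows one way to prepare raw input: it builds a raw audio_format, rejects encodings outside the three listed values, converts floating-point samples to pcm_s16le bytes, and estimates the resulting data rate. It assumes mono audio (a channel count is not part of the format described here), and all helper names are hypothetical. The per-sample sizes follow from the encodings themselves: 32-bit float, 16-bit signed integer, and 8-bit mu-law.

```python
import struct

# Bytes per sample for each raw encoding; the rate calculation assumes mono audio.
BYTES_PER_SAMPLE = {"pcm_f32le": 4, "pcm_s16le": 2, "mulaw": 1}

def raw_audio_format(encoding: str, sample_rate: int) -> dict:
    """Build a raw audio_format dict, rejecting unsupported encodings (hypothetical helper)."""
    if encoding not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported raw encoding: {encoding!r}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    return {"type": "raw", "encoding": encoding, "sample_rate": sample_rate}

def floats_to_pcm_s16le(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)

def bytes_per_second(audio_format: dict) -> int:
    """Data rate of mono raw audio described by audio_format."""
    return BYTES_PER_SAMPLE[audio_format["encoding"]] * audio_format["sample_rate"]

fmt = raw_audio_format("pcm_s16le", 44100)
print(bytes_per_second(fmt))                          # 88200
print(floats_to_pcm_s16le([0.0, 1.0, -1.0]).hex())    # 0000ff7f0180
```

Knowing the data rate makes it easy to size each binary message to a fixed duration of audio, for example 8820 bytes for 100 ms at this format.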
File
Choose this option to send audio encoded in a recognized file format. Set type to file. The AddAudio messages must provide the entire file contents, including any headers. The file is usually not sent all at once; instead, it is split into reasonably sized segments sent as separate messages.
Example: {"type":"file"}
Note: Only the following formats are supported: wav, mp3, aac, ogg, mpeg, amr, m4a, mp4, flac.
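Splitting a file into segments can be as simple as slicing its bytes, as in the minimal sketch below. The whole file, headers included, is sent in order, so the segments must reassemble into the original bytes. The 4096-byte segment size and the name file_chunks are hypothetical choices; the documentation only asks for reasonably sized messages.

```python
from typing import Iterator

# Hypothetical segment size in bytes; the docs only ask for "reasonably sized" messages.
CHUNK_SIZE = 4096

def file_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of the complete file contents, headers included."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

# Stand-in for the bytes of a real file, e.g. Path("audio.wav").read_bytes().
data = b"RIFF" + bytes(10_000)
chunks = list(file_chunks(data))
print([len(c) for c in chunks])  # [4096, 4096, 1812]
```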
Sending audio
After receiving a RecognitionStarted message, you can start sending audio over the WebSocket connection. Audio is sent as binary data, encoded in the format specified in the StartRecognition message. See Protocol overview for complete details of the API protocol.
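The ordering described above (send StartRecognition, wait for RecognitionStarted, then send binary audio) can be sketched independently of any particular WebSocket library. In this sketch, recv and send are hypothetical stand-ins for a client's receive and send calls. It also assumes that server messages are JSON text with a "message" key and that a message named "Error" signals a rejected session; verify both against the Protocol overview. Closing the session after the last chunk is not shown.

```python
import json
from typing import Callable, Iterable, Union

Frame = Union[str, bytes]

def stream_audio(
    recv: Callable[[], str],
    send: Callable[[Frame], None],
    start_message: str,
    chunks: Iterable[bytes],
) -> int:
    """Send StartRecognition, wait for RecognitionStarted, then stream binary audio.

    recv/send are stand-ins for a WebSocket client's receive/send calls.
    Returns the number of binary audio frames sent.
    """
    send(start_message)  # text frame: the StartRecognition JSON
    while True:
        reply = json.loads(recv())
        if reply.get("message") == "RecognitionStarted":
            break
        # Assumed: an "Error" message means the session was rejected.
        if reply.get("message") == "Error":
            raise RuntimeError(f"session rejected: {reply}")
        # Any other message (e.g. informational) is ignored in this sketch.
    sent = 0
    for chunk in chunks:
        send(chunk)  # binary frame: audio in the format given in StartRecognition
        sent += 1
    return sent
```

Injecting recv and send keeps the ordering logic testable without a network connection; with a real client, pass its blocking receive and send methods.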