Speech to text overview
Learn how to turn audio into text.
Use speech to text to transcribe audio in one of two modes:
- Real-time processing: Stream audio from an input device or file and receive instant updates of the transcription as it happens
- Batch processing: Submit an audio file and receive a complete text transcription once the processing is finished
Developer quickstart
Transcribe in real-time
Instantly convert streaming audio to text with Real-time processing.
Transcribe a file
Submit an audio file and receive a complete text transcription once the processing is finished.
The quickest way to transcribe voice from audio is in our web portal.
Deployments
Speechmatics provides flexible deployment options tailored to your requirements. You can host the platform in your own environment, use our managed service, or choose a hybrid approach.
To deploy our API in your own environment, contact sales or see our on-prem documentation.
Real-time processing
Turn live audio into accurate transcripts — instantly.
The Speechmatics real-time speech to text API converts spoken audio into text with low latency and high accuracy.
Use it when speed matters:
- Transcribe live broadcasts or events
- Caption webinars, meetings, or podcasts in real time
- Power voice assistants or AI agents with live input
- Monitor contact center calls as they happen
- Build accessibility features like live captions
Operating points
Choose between two accuracy models when configuring your real-time session:
- Standard: fast, efficient, and suitable for most use cases
- Enhanced: improved accuracy, especially for complex audio (e.g. noisy environments, varied accents), with slightly higher resource usage
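As a rough illustration, the operating point would normally be chosen in the message that opens a real-time session. The sketch below only builds such a message as JSON and does not connect to anything. The message name `StartRecognition`, the `operating_point` and `enable_partials` fields, and the audio format values are assumptions that do not appear on this page; check them against the real-time API reference before relying on them.

```python
import json

# The two operating points named in this section.
VALID_OPERATING_POINTS = {"standard", "enhanced"}


def build_start_message(language: str, operating_point: str = "standard") -> str:
    """Return a JSON session-start message.

    Field and message names are assumed for illustration (see the lead-in).
    """
    if operating_point not in VALID_OPERATING_POINTS:
        raise ValueError(f"unknown operating point: {operating_point!r}")
    message = {
        "message": "StartRecognition",  # assumed message name
        "audio_format": {  # assumed raw PCM settings
            "type": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 16000,
        },
        "transcription_config": {
            "language": language,
            "operating_point": operating_point,
            "enable_partials": True,  # assumed flag for interim results
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_start_message("en", "enhanced"))
```

Validating the operating point locally means a typo fails before any audio is streamed.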
Batch processing
Create transcripts from pre-recorded audio or video.
The Speechmatics batch speech to text API processes pre-recorded files asynchronously, returning highly accurate transcripts in a range of formats.
- Transcribe recorded meetings or interviews
- Caption on-demand videos and podcasts
- Generate searchable transcripts for media archives
- Process customer service recordings for compliance or insights
- Automate subtitles across large video libraries
What is a job?
Behind the scenes, each transcription request is handled as a job — a self-contained unit representing a single transcription task.
A job includes:
- The audio or video file to be transcribed
- Configuration settings (e.g. language, formatting, diarization)
- Metadata and status tracking
- The resulting transcript(s)
You submit a job to the API, monitor its progress, and retrieve results once it's complete. Jobs can be created via direct upload or by referencing a URL.
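The submit, monitor, retrieve cycle described above can be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not the official client: `submit`, `get_status`, and `get_transcript` are hypothetical callables standing in for the real API requests, and the status strings `"running"` and `"done"` are assumed values.

```python
import time
from typing import Callable


def run_job(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    get_transcript: Callable[[str], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Submit a job, poll until it finishes, then return the transcript.

    All three callables are hypothetical stand-ins for real API calls.
    """
    job_id = submit()
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status == "done":  # assumed terminal success status
            return get_transcript(job_id)
        if status != "running":  # anything else is treated as a failure here
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Fake client that reports "running" twice before "done".
    statuses = iter(["running", "running", "done"])
    text = run_job(
        submit=lambda: "job-123",
        get_status=lambda job_id: next(statuses),
        get_transcript=lambda job_id: "hello world",
        poll_interval=0.01,
    )
    print(text)
```

Passing the API calls in as functions keeps the loop testable without network access. A real integration would replace them with authenticated HTTP requests.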
Operating points
Choose between two accuracy models to fit the use case for your Batch job:
- Standard: optimised for faster turnaround with strong accuracy; ideal for compliance workflows and routine file-based transcription
- Enhanced: our highest-accuracy model for when precision is critical; recommended for captioning or content destined for public use
Both modes support the same input/output formats and features — only the underlying model differs.
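To make that concrete, here is a hedged sketch of a job configuration in which the operating point is the only setting that changes. The key names (`type`, `transcription_config`, `operating_point`, `fetch_data`) are assumptions not given on this page and should be verified against the batch API reference. `build_job_config` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional

VALID_OPERATING_POINTS = ("standard", "enhanced")


def build_job_config(
    language: str,
    operating_point: str = "standard",
    audio_url: Optional[str] = None,
) -> dict:
    """Build a batch job config dict (key names are assumed, see the lead-in)."""
    if operating_point not in VALID_OPERATING_POINTS:
        raise ValueError(f"unknown operating point: {operating_point!r}")
    config = {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "operating_point": operating_point,
        },
    }
    if audio_url is not None:
        # Reference a URL instead of uploading the file directly.
        config["fetch_data"] = {"url": audio_url}
    return config


if __name__ == "__main__":
    standard = build_job_config("en")
    enhanced = build_job_config("en", "enhanced")
    # Collect the settings whose values differ between the two configs.
    differing = [
        key
        for key in standard["transcription_config"]
        if standard["transcription_config"][key] != enhanced["transcription_config"][key]
    ]
    print(differing)
```

The optional `audio_url` argument mirrors the two ways of creating a job: leave it unset for a direct upload, or set it to have the file fetched from a URL.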