Performance and cost
Get an overview of the performance and cost of Speechmatics container deployments.
This page compares the performance and estimated running costs of transcription running on standard Azure VMs. It highlights the maximum number of concurrent real-time sessions (session density) and the maximum batch-job throughput achievable on a single instance.
Batch transcription
The benchmark uses the following configuration:
For GPU Operating Points, transcribers and inference servers were all run on a single VM node.
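The throughput and RTF figures referenced in the footnotes below are two views of the same quantity. As a rough illustration, here is a minimal sketch; the assumption that concurrent jobs scale linearly until the node is fully loaded is ours, not a benchmark result:

```python
def batch_throughput(rtf: float, concurrent_jobs: int = 1) -> float:
    """Hours of audio transcribed per hour of wall-clock runtime.

    Illustrative only: assumes concurrent jobs scale linearly,
    which holds only until the node is fully loaded.
    """
    return concurrent_jobs / rtf

# One job at RTF 0.1 transcribes 10 audio-hours per hour;
# five such jobs would give a node throughput of 50.
print(batch_throughput(0.1, 5))  # 50.0
```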
Real-time transcription
This benchmark uses the following configuration [4]:
For GPU Operating Points, the transcribers and inference servers were run on a single VM node.
For the first session, each transcriber requires 0.25 cores for both OPs, with 1.2 GB of memory (Standard OP) or 3 GB of memory (Enhanced OP). Every additional session consumes a further 0.1 cores and 100 MB of memory.
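To make the sizing arithmetic concrete, here is a minimal sketch of how the per-session figures above add up. The function name and return shape are hypothetical, for illustration only:

```python
def realtime_resources(sessions: int, op: str = "standard") -> tuple[float, float]:
    """Estimated (cores, memory_gb) for `sessions` concurrent real-time
    sessions on one transcriber, using the per-session figures above.

    Hypothetical helper for illustration; not part of any Speechmatics API.
    """
    base_mem_gb = {"standard": 1.2, "enhanced": 3.0}[op]
    if sessions < 1:
        return (0.0, 0.0)
    cores = 0.25 + 0.1 * (sessions - 1)          # first session 0.25 cores, then 0.1 each
    mem_gb = base_mem_gb + 0.1 * (sessions - 1)  # each extra session adds 100 MB
    return (cores, mem_gb)

print(realtime_resources(10, "enhanced"))  # ~1.15 cores, ~3.9 GB
```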
Translation (GPU)
Translation running on a 4-core T4 has an RTF of roughly 0.008. It can handle up to 125 hours of batch audio per hour, or 125 Real-Time Transcription streams. However, each translation target language counts as a separate stream: a single Real-Time Transcription stream that requests 5 target languages places the same load on the Translation Inference Server as 5 transcription streams each requesting a single target language.
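A small worked example of this stream accounting, under the 125-stream T4 capacity quoted above (the helper name is illustrative, not a Speechmatics API):

```python
T4_TRANSLATION_CAPACITY = 125  # concurrent translation streams on a 4-core T4 (RTF ~0.008)

def max_transcription_streams(target_languages: int) -> int:
    """How many real-time transcription streams one Translation Inference
    Server can serve when every stream requests `target_languages` targets.

    Each target language counts as one translation stream.
    """
    return T4_TRANSLATION_CAPACITY // target_languages

print(max_transcription_streams(1))  # 125
print(max_transcription_streams(5))  # 25 -- five targets per stream use five slots each
```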
Footnotes
1. Throughput is measured as hours of audio per hour of runtime. A throughput of 50 would mean that in one hour, the system as a whole can transcribe 50 hours of audio.
2. An RTF of 1 would mean that a one-hour file takes one hour to transcribe. An RTF of 0.1 would mean that a one-hour file takes six minutes to transcribe. Benchmark RTFs are representative of processing audio files over 20 minutes in duration using `parallel=4`.
3. Multiple sessions are handled by a single worker configured with the required concurrency.
4. Benchmark results reflect performance on a fully loaded inference server operating at the session density recommended for the respective GPU platform.