Performance and cost
Get an overview of the performance and cost of Speechmatics Speech-to-Text container deployments.
This page compares the performance and estimated running cost of transcription on standard Azure VMs. The comparison highlights the maximum number of concurrent real-time sessions (session density) and the maximum batch throughput on a single instance.
Batch transcription
The benchmark uses the following configuration:
For GPU Operating Points, transcribers and inference servers were all run on a single VM node.
Realtime transcription
This benchmark uses the following configuration:
For GPU Operating Points, the transcribers and inference servers were run on a single VM node.
Each transcriber requires 0.25 cores for its first session, with 1.2 GB of memory (Standard OP) or 3 GB (Enhanced OP). Each additional session consumes 0.1 cores and 100 MB of memory.
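The per-session figures above translate directly into a sizing calculation. A minimal sketch, assuming the function name and VM budgets are hypothetical, of how many concurrent sessions fit a given core and memory budget:

```python
# Hypothetical sizing helper based on the per-session figures above:
# first session costs 0.25 cores and 1.2 GB (Standard OP) or 3 GB (Enhanced OP);
# each additional session costs 0.1 cores and 100 MB.
def max_sessions(cores: float, memory_gb: float, enhanced: bool = False) -> int:
    """Estimate the max concurrent real-time sessions for one transcriber."""
    first_cores, extra_cores = 0.25, 0.1
    first_mem = 3.0 if enhanced else 1.2
    extra_mem = 0.1  # 100 MB per additional session
    if cores < first_cores or memory_gb < first_mem:
        return 0
    by_cores = 1 + int((cores - first_cores) / extra_cores)
    by_mem = 1 + int((memory_gb - first_mem) / extra_mem)
    return min(by_cores, by_mem)  # whichever resource runs out first

print(max_sessions(4, 8))  # Standard OP on a 4-core, 8 GB VM -> 38
```

On this hypothetical 4-core VM the core budget is the binding constraint, which matches the pattern of session density being CPU-bound rather than memory-bound.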
Translation (GPU)
Translation running on a 4-core T4 has an RTF of roughly 0.008. It can handle up to 125 hours of batch audio per hour, or 125 Real-Time Transcription streams. However, each translation target language is counted as a stream, meaning that a single Real-Time Transcription stream which requests 5 target languages adds the same load on the Translation Inference Server as 5 transcription streams each requesting a single target language.
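The stream-counting rule above can be sketched numerically. In this illustration (the function names and session lists are hypothetical, not part of the Speechmatics API), each session contributes one unit of load per requested target language against the 125-stream budget of a single T4:

```python
# RTF of roughly 0.008 implies ~1 / 0.008 = 125 concurrent translation streams.
MAX_STREAMS = 125

def translation_load(target_language_counts: list[int]) -> int:
    """Total load on the Translation Inference Server: one unit per
    target language requested by each real-time session."""
    return sum(target_language_counts)

def capacity_left(target_language_counts: list[int]) -> int:
    return MAX_STREAMS - translation_load(target_language_counts)

# One session requesting 5 target languages loads the server the same
# as 5 sessions each requesting a single target language:
assert translation_load([5]) == translation_load([1, 1, 1, 1, 1])

print(capacity_left([5, 2, 1]))  # 3 sessions, 8 target languages -> 117
```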