Real-Time Kubernetes Quick Start
Installation
Provided the prerequisites for the Speechmatics Helm chart have been met, install the chart with the command below:
# Install the sm-realtime chart
helm upgrade --install speechmatics-realtime \
  oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
  --version 0.5.7 \
  --set proxy.ingress.url="speechmatics.example.com"
Validate the Capacity
You can confirm whether the transcribers and inference servers are available using:
kubectl get sessiongroups
If the transcribers and inference servers are available, the CAPACITY column shows a value for each of them, meaning that they have registered successfully.
NAME                                REPLICAS   CAPACITY   USAGE   VERSION   SPEC HASH
inference-server-enhanced-recipe1   1          360        0       1         b5784af49332f9948481195451eab6ca
speechmatics-realtime-en            1          2          0       1         83929f2b9b2448cdc818d0e46e37600b
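The capacity check above can also be scripted, for example to wait for registration before starting a session. The following is a minimal sketch, not part of the chart: the function names `sessiongroups_ready` and `wait_for_capacity` and the 5-second polling interval are illustrative, and the parsing assumes the column layout of the sample output (CAPACITY in the third column) and treats a capacity greater than zero as "registered".

```shell
# Reads `kubectl get sessiongroups` table output on stdin.
# Succeeds only if there is at least one data row and every row
# reports a CAPACITY (third column, assumed) greater than zero.
sessiongroups_ready() {
  awk 'NR > 1 { rows++; if ($3 + 0 <= 0) bad++ }
       END { exit !(rows > 0 && bad == 0) }'
}

# Polls until all session groups report capacity, or gives up.
# $1: maximum number of attempts (default 60, hypothetical value)
wait_for_capacity() {
  attempts="${1:-60}"
  while [ "$attempts" -gt 0 ]; do
    if kubectl get sessiongroups 2>/dev/null | sessiongroups_ready; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}
```

Calling `wait_for_capacity 24` would poll for roughly two minutes and return a non-zero status if capacity never appears. Because this is a plain-text parse, a row with a blank CAPACITY column may be misread; treat it as a rough check only.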
Run a Session
speechmatics rt transcribe \
  --url wss://speechmatics.example.com/v1 \
  --lang en \
  --operating-point enhanced \
  --ssl-mode insecure \
  <audio-file>
Configuration
The examples below show how to configure the Helm chart for different deployment scenarios.
- All Languages
- English Standard + Enhanced
- Auto-Scaling
# Scenario: All Languages
global:
  transcriber:
    languages: ["ar", "ba", "be", "bg", "bn", "ca", "cmn", "cmn_en", "cs", "cy", "da", "de", "el", "en", "en_ms", "en_ta", "eo", "es", "es-bilingual-en", "et", "eu", "fa", "fi", "fr", "ga", "gl", "he", "hi", "hr", "hu", "ia", "id", "it", "ja", "ko", "lt", "lv", "mn", "mr", "ms", "mt", "nl", "no", "pl", "pt", "ro", "ru", "sk", "sl", "sv", "sw", "ta", "th", "tr", "ug", "uk", "ur", "vi", "yue"]

# Enable all enhanced and standard inference server recipes
inferenceServerEnhancedRecipe1:
  enabled: true
inferenceServerEnhancedRecipe2:
  enabled: true
inferenceServerEnhancedRecipe3:
  enabled: true
inferenceServerEnhancedRecipe4:
  enabled: true
inferenceServerStandardAll:
  enabled: true

# Scenario: English Standard + Enhanced
# Disable default enhanced inference server recipes
inferenceServerEnhancedRecipe1:
  enabled: false

# Enable custom inference server deployment with just en models
inferenceServerCustom:
  enabled: true
  fullnameOverride: inference-server-en
  tritonServer:
    image:
      # Repository for the en-only inference server triton container
      repository: sm-gpu-inference-server-en
  inferenceSidecar:
    enabled: true
    # Configuration for custom model deployments
    registerFeatures:
      capacity: 600
      customModelCosts:
        "*:diar_standard": 0
        "*:body_standard": 0
        "*:diar_enhanced": 0
        "*:body_enhanced": 0
        en:am_en_standard: 0
        en:ensemble_en_standard: 20
        en:lm_en_enhanced: 10
        en:am_en_enhanced: 0
        en:ensemble_en_enhanced: 20

# Scenario: Auto-Scaling
global:
  # Enable scaling for all sessiongroups resources
  sessionGroups:
    scaling:
      enabled: true

inferenceServerEnhancedRecipe1:
  sessionGroups:
    scaling:
      # Scale up inference server pods when there are 300 inference tokens remaining
      scaleOnCapacityLeft: 300

transcribers:
  sessionGroups:
    scaling:
      # Scale up transcriber pods when there is only capacity for 1 more session
      scaleOnCapacityLeft: 1
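
One way to apply any of these configurations is to save it to a file and pass that file to the install command with Helm's standard `--values` (`-f`) flag. The sketch below wraps the install command from earlier in a small function; the function name `install_with_values` and the default file name `values.yaml` are assumptions for illustration, not names defined by the chart.

```shell
# Installs or upgrades the release with overrides from a values file.
# $1: path to the values file (defaults to the hypothetical values.yaml)
install_with_values() {
  values_file="${1:-values.yaml}"
  helm upgrade --install speechmatics-realtime \
    oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
    --version 0.5.7 \
    --set proxy.ingress.url="speechmatics.example.com" \
    --values "$values_file"
}
```

For example, after saving the auto-scaling example as `autoscaling.yaml`, running `install_with_values autoscaling.yaml` would upgrade the existing release in place with scaling enabled.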
Hardware Recommendations
Below are the recommended Azure node sizes for running Speechmatics on Kubernetes:
Service | Node Size |
---|---|
STT (Inference Server) | Standard_NC4as_T4_v3 |
STT (Transcriber) | Standard_E16s_v5 |
All Other Services | Standard_D*s_v5 |
Uninstall
Run the following command to uninstall Speechmatics from the cluster:
helm uninstall speechmatics-realtime
FAQ
Why should I use the sm-realtime Helm chart over a Docker container deployment?
The sm-realtime chart is the recommended way of running Speechmatics containers for production environments. It provides a set of containers which will help protect and auto-scale sensitive websocket connections, and ensure performance with session capacity management. It also provides cost benefits with custom scheduling to help bin-pack active workers onto busy Kubernetes nodes.
What is SessionGroups?
SessionGroups is Speechmatics' custom Kubernetes auto-scaling and session management solution for websocket containers. SessionGroups ensures that cluster nodes can safely scale up and down without impacting the live STT and Flow sessions running on them. Additionally, it bin-packs new sessions onto busy nodes for cost efficiency. It comes as a CustomResourceDefinition (CRD) and a controller, both deployed as part of the Speechmatics Realtime Helm chart.
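
To make the capacity accounting more concrete, the sketch below lists session groups whose remaining capacity is at or below a threshold, using the `kubectl get sessiongroups` output shown earlier. It is an illustration only: it assumes that remaining capacity is CAPACITY minus USAGE (third and fourth columns) and that `scaleOnCapacityLeft` is compared against that difference, neither of which is confirmed here. The function name `groups_at_or_below` is hypothetical.

```shell
# Prints the names of session groups whose remaining capacity
# (CAPACITY - USAGE, an assumed definition) is at or below a threshold.
# $1: threshold; reads `kubectl get sessiongroups` table output on stdin
groups_at_or_below() {
  awk -v threshold="$1" 'NR > 1 && ($3 - $4) <= threshold { print $1 }'
}

# Example usage against a live cluster:
#   kubectl get sessiongroups | groups_at_or_below 300
```

Under these assumptions, an inference server group with a capacity of 360 and a threshold of 300 would not be listed while its usage is 0, and would be listed once its usage reaches 60.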