Prerequisites

Access to the Docker and Helm Registry

The Speechmatics Docker images and Helm chart are obtained from the speechmaticspublic.azurecr.io registry. If you do not have credentials for the speechmaticspublic.azurecr.io registry account or have lost your details, please reach out to Support.

Add a docker-registry secret to the cluster so that it can pull the Speechmatics Docker images.

# Add the speechmatics registry credentials for image pulling
kubectl create secret docker-registry speechmatics-registry \
  --docker-server=speechmaticspublic.azurecr.io \
  --docker-username=<username> \
  --docker-password=<password>

Using the same credentials, authenticate Helm with the speechmaticspublic registry to install the Helm chart:

# Authenticate to the Speechmatics Helm repository
helm registry login speechmaticspublic.azurecr.io \
  --username <username> \
  --password <password>

Speechmatics License

Please speak to support@speechmatics.com if you do not already have a valid Speechmatics license.

The Helm chart requires a valid Speechmatics license stored in a secret called speechmatics-license on the Kubernetes cluster.

You can add the secret to the cluster with this command:

# Quote the file path so that paths containing spaces are handled correctly
kubectl create secret generic speechmatics-license \
  --from-literal=license.json="$(cat "$LICENSE_FILE")"

Alternatively, you can configure the chart to create the license secret for you using the following values:

global:
  licensing:
    createSecret: true
    license: $B64_ENCODED_LICENSE
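
Helm reads values files literally and does not expand shell variables, so the $B64_ENCODED_LICENSE placeholder has to be replaced with the encoded string itself. A minimal sketch of producing it, assuming the chart expects the base64 encoding of the whole license file on a single line (the helper name `encode_license` is hypothetical and not part of the chart):

```shell
# Hypothetical helper: print the base64 encoding of a license file on one line.
# Assumption: the chart wants the entire file encoded; confirm this against the
# chart's values reference before relying on it.
encode_license() {
  # tr removes the line wraps that some base64 implementations insert
  base64 < "$1" | tr -d '\n'
}
```

For example, `encode_license "$LICENSE_FILE"` prints the string to paste in place of $B64_ENCODED_LICENSE.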

GPU Drivers

The Speechmatics inference server runs Nvidia Triton Server, which requires an Nvidia GPU. When running GPU nodes in Kubernetes, you need the Nvidia device plugin, which allows containers on the cluster to access the GPUs.

Most managed Kubernetes providers document their own recommended way of deploying the Nvidia device plugin on a cluster; follow your provider's guidance where it exists.

Alternatively, see the Nvidia Device Plugin docs.

You can check that a node has allocatable GPU resources with:

# The yq expression is quoted so the shell does not treat the brackets as a glob
kubectl get nodes -o yaml | yq '.items[].status.allocatable' | grep nvidia
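
The grep above only shows that at least one node advertises a GPU. As a sketch of a per-node view, the hypothetical helper below reads two-column `NODE GPU` rows with a header line, which is what `kubectl get nodes -o custom-columns=...` prints (with `<none>` where the `nvidia.com/gpu` resource is absent), and lists the nodes that have no allocatable GPU:

```shell
# Hypothetical helper: read "NODE GPU" rows on stdin (first row is the header)
# and print the names of nodes whose GPU column is empty, "<none>" or 0.
nodes_without_gpu() {
  awk 'NR > 1 && ($2 == "" || $2 == "<none>" || $2 == "0") { print $1 }'
}
```

It can be fed with `kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' | nodes_without_gpu`; empty output means every node reports at least one allocatable GPU.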

Nginx Ingress Controller

When exposing Speechmatics through an ingress controller, we recommend the ingress-nginx ingress controller with snippet annotations enabled. You can check whether an existing ingress-nginx installation has snippet annotations enabled using the following command:

# The default is false
kubectl get cm -o yaml -l app.kubernetes.io/instance=nginx | grep allow-snippet-annotations
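
Because an absent key means the default of false, an empty grep result and an explicit "false" both indicate that snippets are disabled. A minimal sketch of turning the check into a pass/fail test, using a hypothetical `snippets_enabled` helper that reads the ConfigMap YAML on stdin:

```shell
# Hypothetical helper: succeed only if the YAML on stdin sets
# allow-snippet-annotations to true (quoted or unquoted).
# A missing key or any other value counts as disabled.
snippets_enabled() {
  grep -Eq '^[[:space:]]*allow-snippet-annotations:[[:space:]]*"?true"?[[:space:]]*$'
}
```

For example: `kubectl get cm -o yaml -l app.kubernetes.io/instance=nginx | snippets_enabled && echo "snippet annotations enabled"`.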

If you are not already running ingress-nginx, follow the steps below:

  1. Create an nginx.values.yaml file:
controller:
  service:
    # This is needed to preserve the source IP of the client
    # See: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
    externalTrafficPolicy: Local

    # This should be set to the IP address used to access services on your cluster
    loadBalancerIP: $CLUSTER_INGRESS_IP

  config:
    # This prevents nginx worker process from shutting down for 24h in case of active sessions
    worker-shutdown-timeout: 86400s

  extraArgs:
    # Needed to support annotations added by the chart ingresses
    annotations-prefix: nginx.ingress.kubernetes.io

    # This prevents nginx main process from shutting down for 24h in case of active sessions
    shutdown-grace-period: 86400
 
  # Used to prevent nginx pods being terminated for 24h while there are active sessions
  terminationGracePeriodSeconds: 86400

  # Needed to allow ingresses to add snippet annotations to add necessary headers
  allowSnippetAnnotations: true
  2. Install the Nginx chart with:
helm repo add nginx https://kubernetes.github.io/ingress-nginx
helm install nginx nginx/ingress-nginx --version 4.11.4 -f nginx.values.yaml
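
Helm does not expand shell variables inside a values file, so the $CLUSTER_INGRESS_IP placeholder in nginx.values.yaml must be replaced with a real address before the install. One way to do that is sketched below with a hypothetical `render_values` helper built on sed; the helper name and the output file name in the usage note are illustrative assumptions:

```shell
# Hypothetical helper: print a values file with every literal
# $CLUSTER_INGRESS_IP placeholder replaced by the address given as $2.
render_values() {
  # \$ matches a literal dollar sign; "$2" is spliced in outside the single quotes
  sed 's|\$CLUSTER_INGRESS_IP|'"$2"'|g' "$1"
}
```

For example, `render_values nginx.values.yaml 203.0.113.10 > nginx.rendered.yaml` writes a copy that can be passed to `helm install` with `-f nginx.rendered.yaml` (203.0.113.10 is a documentation-only address; substitute your own).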

Using Another Ingress Controller

If you run a different ingress controller and enable ingress on the chart, you need to ensure that a Request-Id header is passed through. This header is used to manage session usage.

In Nginx, it looks like this:

proxy:
  ingress:
    annotations:
      # Add headers to all requests coming through this ingress
      nginx.ingress.kubernetes.io/configuration-snippet: |+
        more_set_headers "Request-Id: $req_id";