
Virtual Appliance Scaling


Real-Time Virtual Appliance Scaling

This section explains how to scale the Real-Time Virtual Appliance, and gives advice on how to make sure you've allocated enough resources for your workload.

Worker Limits

You can restrict the number of concurrent workers using the Management API. This ensures that system resources are not exhausted by clients starting more sessions than expected. The limit applies to the entire system, irrespective of which language packs are being used. The default maximum number of concurrent workers is 1.

View Maximum Workers

Use a GET request to the maxworkers endpoint to view the maximum number of workers:

curl -L -X GET "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
   -H 'Accept: application/json' \
   | jq

This shows the maximum number of workers that can run concurrently on the appliance. If clients use the Speech API to open more sessions than this, each additional session fails with the job error: No worker can be scheduled because the service is at capacity.

Setting Maximum Workers

Before changing the maximum number of concurrent workers for Real-Time transcription, make sure the Virtual Appliance has enough system resources (CPU and RAM) to support the new limit (see the Virtual Appliance system requirements). This example sets the maximum number of concurrent workers to 5:

curl -L -X POST "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{ "count": "5" }'

As a rule of thumb, each concurrent worker requires 1 vCPU and up to 2 GB of RAM.
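The rule of thumb above can be turned into a quick sizing check before you raise the limit. This is a minimal POSIX shell sketch: the function name `rt_worker_resources` is hypothetical, and it covers only the per-worker figures given here, not any base overhead the appliance itself needs (the system requirements cover that).

```shell
#!/bin/sh
# Sizing sketch for the Real-Time Virtual Appliance, based only on the
# rule of thumb of 1 vCPU and up to 2 GB RAM per concurrent worker.
# Hypothetical helper; it excludes any base overhead for the appliance itself.
rt_worker_resources() {
  # Accept only positive integers without leading zeros
  # (a leading zero would be parsed as octal by shell arithmetic).
  case "$1" in
    ''|*[!0-9]*|0*)
      echo "usage: rt_worker_resources <positive integer>" >&2
      return 1
      ;;
  esac
  # 1 vCPU per worker; RAM is an upper bound of 2 GB per worker.
  echo "vcpu=$(( $1 * 1 )) ram_gb_max=$(( $1 * 2 ))"
}

# Resources for a maxworkers value of 5.
rt_worker_resources 5
```

For a limit of 5 workers this prints `vcpu=5 ram_gb_max=10`, so the appliance should have at least that much headroom beyond its own baseline.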

Batch Virtual Appliance Scaling

This section explains how to scale the Batch Virtual Appliance, and gives advice on how to make sure you've allocated enough resources for your workload.

Worker Limits

You can restrict the number of concurrent workers (jobs) using the Management API. This ensures that system resources are not exhausted by clients starting more transcriptions than expected. The limit applies to the entire system, irrespective of which language packs are being used. The default maximum number of concurrent workers is 1.

View Maximum Workers

Use a GET request to the maxworkers endpoint to view the maximum number of workers:

curl -L -X GET "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
   -H 'Accept: application/json' \
   | jq

The response indicates the maximum number of workers that can run concurrently on the appliance. If clients use the Speech API to submit more jobs than this, the additional jobs are queued and processed once there is spare capacity on the appliance.

Setting Maximum Workers

Before changing the maximum number of concurrent workers, it is important that the Virtual Appliance has enough system resources (CPU and RAM) to support the new requirements (see the Batch Virtual Appliance system requirements).

This example shows how to set the maximum number of concurrent workers to 5:

curl -L -X POST "http://${APPLIANCE_HOST}:8080/v1/management/maxworkers" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{ "count": "5" }'
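The view and set requests can be wrapped in a pair of shell functions so that every change is read back immediately after it is made. This is a minimal sketch: the function names are hypothetical, it assumes `curl` is installed and `APPLIANCE_HOST` is set, and it reuses the `maxworkers` endpoint and request body exactly as shown in the examples in this section. It prints the raw JSON response rather than parsing it, because the response fields are not described here.

```shell
#!/bin/sh
# Sketch: small wrappers around the Management API maxworkers endpoint.
# Function names are hypothetical; APPLIANCE_HOST must be set by the caller.

# Build the endpoint URL. Double quotes let the shell expand ${APPLIANCE_HOST};
# the :? form stops with an error message if it is unset or empty.
maxworkers_url() {
  echo "http://${APPLIANCE_HOST:?APPLIANCE_HOST is not set}:8080/v1/management/maxworkers"
}

# Build the JSON request body, sending the count as a string as in the examples.
maxworkers_body() {
  printf '{ "count": "%s" }' "$1"
}

# Print the current limit (raw JSON response).
get_max_workers() {
  curl -sS -L -X GET "$(maxworkers_url)" -H 'Accept: application/json'
}

# Set a new limit, then read it back to confirm the change was applied.
set_max_workers() {
  case "$1" in
    ''|*[!0-9]*|0*)
      echo "count must be a positive integer" >&2
      return 1
      ;;
  esac
  curl -sS -L -X POST "$(maxworkers_url)" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(maxworkers_body "$1")" &&
    get_max_workers
}
```

For example, `set_max_workers 5` sends the same request as the curl command above and then prints the limit the appliance reports.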

Increasing the number of concurrent workers increases the CPU and RAM required. See the system requirements for details.

If the number of submitted jobs exceeds the maximum number of concurrent workers, jobs start to be queued and the real-time factor (RTF) increases, meaning you will wait longer for your transcripts to become available.
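To get a feel for how queueing grows with load, the sketch below counts how many jobs in a burst start immediately, how many wait, and how many back-to-back "rounds" of processing are needed to clear them. It assumes all jobs arrive together and every job takes the same time, which real workloads rarely do, so treat the result as a rough illustration rather than a prediction. The function name `batch_queue_estimate` is hypothetical.

```shell
#!/bin/sh
# Rough queueing illustration for the Batch Virtual Appliance.
# Assumptions (not reported by the appliance): all jobs are submitted at
# once and every job takes the same amount of time to process.
batch_queue_estimate() {
  # Both arguments must be positive integers without leading zeros.
  for arg in "$1" "$2"; do
    case "$arg" in
      ''|*[!0-9]*|0*)
        echo "usage: batch_queue_estimate <jobs> <maxworkers>" >&2
        return 1
        ;;
    esac
  done
  njobs=$1
  nworkers=$2
  # Jobs that start straight away are capped by the maxworkers limit.
  running=$(( $njobs < $nworkers ? $njobs : $nworkers ))
  # Everything else waits in the queue.
  queued=$(( $njobs - $running ))
  # Ceiling division: number of equal-length rounds needed to clear the burst.
  rounds=$(( ($njobs + $nworkers - 1) / $nworkers ))
  echo "running=${running} queued=${queued} rounds=${rounds}"
}

# Example: 12 jobs submitted to an appliance with maxworkers set to 5.
batch_queue_estimate 12 5
```

With 12 jobs and `maxworkers` set to 5, this prints `running=5 queued=7 rounds=3`: the last two jobs wait for two full rounds before they start, which is the longer turnaround described above.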