The Speechmatics Real-time Virtual Appliance operates on a hypervisor host system. For this version of the appliance, the following hypervisors are supported:
For the virtual appliance to operate as required, the host must meet the requirements and have the resources available as defined below.
The virtual appliance can operate in any VMware-supported environment that claims support for VMware virtual hardware specification version 9 and above (see https://kb.vmware.com/s/article/1003746).
The host machine requires a processor with the following minimum specification: Intel® Xeon® CPU E5-2630 v4 (Broadwell) 2.20GHz (or equivalent). This is important because these processors (and later ones) support Advanced Vector Extensions (AVX). The machine learning algorithms used by Speechmatics ASR require the performance optimizations that AVX provides. You should also ensure that your hypervisor has AVX enabled, so that the instruction set is exposed to the guest VM.
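One way to confirm that AVX is visible is to inspect the CPU feature flags on a Linux machine, either the host itself or a test guest running on the same hypervisor. The sketch below is illustrative only and is not part of the appliance: it assumes a Linux system that exposes /proc/cpuinfo, and the function names `cpu_flags` and `has_avx` are hypothetical helpers introduced for this example.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "flags" lines list the instruction-set extensions.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """Return True if the 'avx' flag is present (exact match, not 'avx2')."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux system
    if cpuinfo.exists():
        print("AVX available:", has_avx(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux")
```

If the flag is present on the host but missing inside a guest, the hypervisor is probably masking it, and the hypervisor's CPU feature settings should be reviewed.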
See below for minimum Real-time Virtual Appliance VM (guest) specifications; the host machine must have enough resources (processor, memory, and storage) to run the hypervisor, the guest VMs you intend to host on it, plus any other processes you expect to run on it. Vendor guidelines should be followed for other host requirements and for the installation process.
For VMware, the document Performance Best Practices for VMware vSphere® 6.0 contains a comprehensive overview of hardware considerations and recommendations on how to optimize your host platform. See https://www.vmware.com/support.html for up-to-date technical information on VMware.
For VirtualBox, please consult the online documentation: https://www.virtualbox.org/wiki/Documentation
For Amazon EC2, the following link explains how to set up a VM using an Amazon S3 bucket to store the OVA file: https://docs.aws.amazon.com/vm-import/latest/userguide/vmimport-image-import.html.
The Speechmatics Real-time Virtual Appliance must be allocated the following minimum specification:
For each concurrent input stream, the appliance requires an additional 1 vCPU and up to 1.5GB RAM. If you are using the custom dictionary (additional words) feature, each concurrent input stream that is configured to use it will require up to 3GB RAM.
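The per-stream figures above can be turned into a rough sizing estimate. The following is a minimal sketch, not an official sizing tool: the function name `additional_resources` is hypothetical, and it assumes that the 3GB figure replaces (rather than adds to) the 1.5GB figure for streams that use the custom dictionary. The result covers only the per-stream additions and must be added to the minimum specification for the appliance.

```python
RAM_PER_STREAM_GB = 1.5       # upper bound for a standard stream
RAM_PER_CUSTOM_DICT_GB = 3.0  # upper bound for a stream using custom dictionary


def additional_resources(streams: int, custom_dictionary_streams: int = 0):
    """Return (extra vCPUs, extra RAM in GB) for the given concurrent streams.

    Assumption: a custom dictionary stream needs up to 3GB in total,
    not 3GB on top of the standard 1.5GB.
    """
    if streams < 0 or not 0 <= custom_dictionary_streams <= streams:
        raise ValueError("stream counts must satisfy 0 <= custom <= total")
    standard_streams = streams - custom_dictionary_streams
    extra_ram_gb = (standard_streams * RAM_PER_STREAM_GB
                    + custom_dictionary_streams * RAM_PER_CUSTOM_DICT_GB)
    extra_vcpus = streams  # 1 additional vCPU per concurrent stream
    return extra_vcpus, extra_ram_gb


if __name__ == "__main__":
    vcpus, ram_gb = additional_resources(streams=4, custom_dictionary_streams=1)
    print(f"Additional vCPUs: {vcpus}, additional RAM: up to {ram_gb}GB")
```

For example, four concurrent streams with one of them using the custom dictionary would need 4 additional vCPUs and up to 7.5GB of additional RAM under these assumptions.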
For operation in batch mode, the following minimum specifications are required: