Offline usage reporting
Learn about offline usage reporting for on-prem deployments
To support on-prem deployments where internet access must be blocked, usage data can be collected locally using a Usage Container. This data is then exported and emailed to Speechmatics for review. The setup process differs slightly depending on the on-prem solution you use; see the Container and Virtual Appliance sections below for details.
The Usage Container only collates data that is required for Speechmatics to calculate accurate financial billing and measure product usage and system performance. This data is made up of a series of events that correspond to the various stages of a Speechmatics Batch or Real-Time Container as it processes a media file.
No personal customer data, transcripts or media data is captured or stored at any point. See What Data Do We Record for a full description of what information will be recorded.
The customer is responsible for assigning storage to the Usage Container and/or Batch Appliance so that all usage information is captured, and for sending this data to Speechmatics at regular intervals.
Reporting cadence
Speechmatics requires customers to send all usage data by the last working day of each calendar month. Send data for every Usage Container and/or batch appliance running in your environments. Customers with very large transcription volumes may be asked to report more frequently. Large transcription volumes can mean:
- A large number of jobs: any Usage Container that will store data from more than 10,000 Batch jobs, or more than 1,250 Real-Time jobs longer than an hour, in a calendar month
- Many jobs of long duration (>60 minutes), especially when using the Real-Time Container in 'streaming' mode, where it persists between sessions
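The numeric thresholds above can be expressed as a simple check. This is a minimal sketch, assuming you already know how many jobs a given Usage Container stored this month; the function name `needs_frequent_reporting` is hypothetical, and it does not cover the long-duration streaming case, which has no fixed number.

```shell
#!/bin/sh
# Hypothetical helper: returns success (0) when a Usage Container's monthly
# volume is above the "very large" thresholds described above.
# $1 = Batch jobs stored this calendar month
# $2 = Real-Time jobs longer than 60 minutes stored this calendar month
needs_frequent_reporting() {
  batch_jobs=$1
  long_rt_jobs=$2
  if [ "$batch_jobs" -gt 10000 ] || [ "$long_rt_jobs" -gt 1250 ]; then
    return 0   # large volume: consider reporting more often than monthly
  fi
  return 1     # below both thresholds
}

# Example: 12,000 Batch jobs this month
if needs_frequent_reporting 12000 0; then
  echo "Consider reporting more often than monthly"
fi
```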
Sending data to Speechmatics
The exported data must not be modified in any way before sending to Speechmatics. Speechmatics will request a new unmodified data export if it is found that data has been altered.
Data is retained in the Usage Container for 90 days, after which point it is purged.
After exporting, Speechmatics requires data to be sent via email to billing-reporting@speechmatics.com.
Speechmatics recommends that files do not exceed 25 MB. This is the default attachment limit for many popular email providers, such as Microsoft Office 365, and larger files may cause your email provider to return an error.
You will receive a confirmation email within 15 minutes if the report(s) are accepted by our billing system. If the "Reply-To" header of your email contains multiple addresses, the reply is sent only to the first address in the list.
Any attachment sent to Speechmatics must have the correct file name extension: .json.gz.
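Before emailing an export, a quick local check can catch the most common problems described above. This is a minimal sketch: `check_attachment` is a hypothetical helper, and treating 25 MB as 25,000,000 bytes is a conservative assumption. It only reads the file (`gzip -t` tests the archive's integrity without writing anything), so the export itself is not modified.

```shell
#!/bin/sh
# Hypothetical pre-send check for one usage export attachment:
# .json.gz extension, valid gzip stream, size within the assumed limit.
MAX_BYTES=25000000   # assumption: 25 MB taken as 25,000,000 bytes

check_attachment() {
  file=$1
  case "$file" in
    *.json.gz) ;;
    *) echo "$file: extension must be .json.gz" >&2; return 1 ;;
  esac
  # gzip -t only tests the archive; it does not change the file
  if ! gzip -t "$file" 2>/dev/null; then
    echo "$file: not a valid gzip archive" >&2
    return 1
  fi
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "$file: $size bytes is over the limit; export a shorter time range" >&2
    return 1
  fi
  return 0
}
```

For example, `check_attachment usage.json.gz && echo "ready to send"` prints the message only when all three checks pass.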
For details on how to export usage data for your on-prem solution, see the Container and Appliance sections.
Deployment type
Virtual Appliance
The usage mode can be set via the Management API:
curl -L -u admin:admin -X 'POST' \
"http://${APPLIANCE_HOST}/v2/management/usagereporting" \
-d '{"mode": "offline"}'
where mode is either offline or online.
When the usage mode is set to offline, usage is collected by a Container inside the appliance, and the data it collects must be sent to Speechmatics by email at billing-reporting@speechmatics.com.
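The Management API call shown above can be wrapped so that a typo in the mode is caught before any request is sent. This is a minimal sketch: `set_usage_mode` is a hypothetical name, it assumes `APPLIANCE_HOST` is set, and it reuses the `admin:admin` credentials from the example, which you should replace with your own.

```shell
#!/bin/sh
# Hypothetical wrapper around the usage-mode call shown above.
# Only "offline" and "online" are accepted; anything else is rejected locally.
set_usage_mode() {
  mode=$1
  case "$mode" in
    offline|online) ;;
    *) echo "usage mode must be 'offline' or 'online', got '$mode'" >&2
       return 2 ;;
  esac
  if [ -z "${APPLIANCE_HOST:-}" ]; then
    echo "APPLIANCE_HOST must be set" >&2
    return 2
  fi
  curl -L -u admin:admin -X 'POST' \
    "http://${APPLIANCE_HOST}/v2/management/usagereporting" \
    -d "{\"mode\": \"${mode}\"}"
}

# Usage (before running any jobs):
#   export APPLIANCE_HOST=appliance.example.internal
#   set_usage_mode offline
```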
Workflow
The following workflow is recommended:
- The user downloads and runs one or more Virtual Appliances.
- Before running any jobs, the user sets the usage mode to offline (see above).
- At intervals of no more than a calendar month, the user extracts the usage data processed in that interval from each running Appliance via the Management API (see below).
- The user sends this data to Speechmatics at billing-reporting@speechmatics.com.
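For the monthly step of this workflow, the export window for one calendar month can be computed without relying on platform-specific `date` options. This is a minimal sketch with hypothetical helper names; it assumes UTC and second-level precision, so the inclusive upper bound is the month's last day at 23:59:59Z.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any Speechmatics tooling): compute the
# inclusive since/until bounds, in UTC, for one calendar month.

# Number of days in month $2 (1-12) of year $1, using Gregorian leap years.
days_in_month() {
  dy=$1
  dm=${2#0}   # accept "02" as well as "2"
  case "$dm" in
    1|3|5|7|8|10|12) echo 31 ;;
    4|6|9|11) echo 30 ;;
    2)
      if [ $((dy % 4)) -eq 0 ] && { [ $((dy % 100)) -ne 0 ] || [ $((dy % 400)) -eq 0 ]; }; then
        echo 29
      else
        echo 28
      fi ;;
    *) echo "invalid month: $2" >&2; return 1 ;;
  esac
}

# Print "<since> <until>" covering all of month $2 in year $1.
month_range() {
  my=$1
  mm=${2#0}
  last=$(days_in_month "$my" "$mm") || return 1
  printf '%04d-%02d-01T00:00:00Z %04d-%02d-%02dT23:59:59Z\n' \
    "$my" "$mm" "$my" "$mm" "$last"
}

# Example: bounds for February 2024 (a leap year)
month_range 2024 2   # prints 2024-02-01T00:00:00Z 2024-02-29T23:59:59Z
```

The two values can then be used as the since and until parameters of the export request.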
Exporting usage data
The exported data must not be modified in any way before sending to Speechmatics. Speechmatics will request a new unmodified data export if it is found that data has been altered.
Data is retained in the appliance for 90 days, after which it is purged. Exported data must be emailed to billing-reporting@speechmatics.com.
In real-time mode you may see usage for contract ID -77777777777777. This is a prewarming job that runs when the transcriber first starts up, and it is not included in your billed usage.
A compressed archive of the usage data can be retrieved via the Management API:
# Save the archive to a .json.gz file (the file name itself does not matter)
curl -X 'GET' \
"http://${APPLIANCE_HOST}/v1/export?since={start_time}&until={end_time}" \
-H 'accept: application/gzip' \
-o usage.json.gz
where start_time and end_time are inclusive bounds given as ISO-8601 timestamps (YYYY-MM-DDTHH:MM:SSZ).
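Because a malformed timestamp is easy to pass by mistake, the export request can be guarded with a format check. This is a minimal sketch: `is_iso8601_utc` and `export_usage` are hypothetical names, the check only matches the YYYY-MM-DDTHH:MM:SSZ pattern documented above (it does not validate calendar dates), and `APPLIANCE_HOST` is assumed to be set.

```shell
#!/bin/sh
# Hypothetical guard: succeeds only for the documented YYYY-MM-DDTHH:MM:SSZ form.
is_iso8601_utc() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: export one inclusive time window to a .json.gz file.
# $1 = start_time, $2 = end_time, $3 = output file
export_usage() {
  start_time=$1; end_time=$2; out=$3
  for ts in "$start_time" "$end_time"; do
    if ! is_iso8601_utc "$ts"; then
      echo "bad timestamp '$ts': expected YYYY-MM-DDTHH:MM:SSZ" >&2
      return 2
    fi
  done
  case "$out" in
    *.json.gz) ;;
    *) echo "output file must end in .json.gz" >&2; return 2 ;;
  esac
  if [ -z "${APPLIANCE_HOST:-}" ]; then
    echo "APPLIANCE_HOST must be set" >&2
    return 2
  fi
  # --fail makes curl exit non-zero on HTTP errors instead of saving an error page
  curl --fail -X 'GET' \
    "http://${APPLIANCE_HOST}/v1/export?since=${start_time}&until=${end_time}" \
    -H 'accept: application/gzip' \
    -o "$out"
}
```

For example: `export_usage 2024-01-01T00:00:00Z 2024-01-31T23:59:59Z usage-2024-01.json.gz`.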
To stay under the 25 MB email attachment limit, we recommend choosing start_time and end_time so that each export is no larger than 25 MB (usually around 10,000 batch jobs or 1,250 one-hour real-time sessions).
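One way to chunk exports as recommended above is to split a month into shorter windows. This is a minimal sketch that uses one-day windows; the helper names are hypothetical, and daily granularity is an assumption that very high-volume deployments may still need to narrow further.

```shell
#!/bin/sh
# Hypothetical helper: split one calendar month into one-day export windows.

# Number of days in month $2 (1-12) of year $1, using Gregorian leap years.
days_in_month() {
  dy=$1; dm=${2#0}
  case "$dm" in
    1|3|5|7|8|10|12) echo 31 ;;
    4|6|9|11) echo 30 ;;
    2) if [ $((dy % 4)) -eq 0 ] && { [ $((dy % 100)) -ne 0 ] || [ $((dy % 400)) -eq 0 ]; }
       then echo 29; else echo 28; fi ;;
    *) echo "invalid month: $2" >&2; return 1 ;;
  esac
}

# Print one "<since> <until>" line (inclusive, UTC) per day of month $2 in year $1.
daily_windows() {
  wy=$1; wm=${2#0}
  last=$(days_in_month "$wy" "$wm") || return 1
  d=1
  while [ "$d" -le "$last" ]; do
    printf '%04d-%02d-%02dT00:00:00Z %04d-%02d-%02dT23:59:59Z\n' \
      "$wy" "$wm" "$d" "$wy" "$wm" "$d"
    d=$((d + 1))
  done
}

# Example: feed each window to the export call shown above
# daily_windows 2024 3 | while read -r start_time end_time; do
#   curl -X 'GET' \
#     "http://${APPLIANCE_HOST}/v1/export?since=${start_time}&until=${end_time}" \
#     -H 'accept: application/gzip' -o "usage-${start_time%%T*}.json.gz"
# done
```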
Data is exported in compressed .json.gz format, and all files must be sent to Speechmatics in this format. The file name itself does not matter. You can attach multiple files to one email or send each file in a separate email, as long as you stay within your email provider's attachment limits.
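When there are several export files, they can be grouped so that each email stays within the attachment limit. This is a minimal sketch using a greedy, in-order grouping: `group_for_email` is a hypothetical name, the 25,000,000-byte limit is an assumption, and file paths are assumed to contain no spaces. A single file that is already over the limit still gets its own group and should be re-exported with a shorter time range.

```shell
#!/bin/sh
# Hypothetical helper: greedily group export files into emails whose combined
# attachment size stays at or under LIMIT bytes. Prints one line per email.
LIMIT=25000000   # assumption: 25 MB taken as 25,000,000 bytes

group_for_email() {
  batch=""
  total=0
  for f in "$@"; do
    size=$(wc -c < "$f" | tr -d ' ')
    # Start a new email if adding this file would exceed the limit
    if [ -n "$batch" ] && [ $((total + size)) -gt "$LIMIT" ]; then
      echo "$batch"
      batch=""
      total=0
    fi
    batch="${batch:+$batch }$f"
    total=$((total + size))
  done
  if [ -n "$batch" ]; then
    echo "$batch"
  fi
}
```

For example, `group_for_email usage-*.json.gz` prints the files to attach to each email, one email per line.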
Container
Terminology
Throughout this section there are references to different types of containers:
- ASR Containers - Speechmatics containers that transcribe media or audio files into a transcript. Two types are available: those that process media in batch, and those that process media in real time. When referred to specifically, they are called the Batch or Real-Time Containers
- Usage Containers - a new container that stores event-specific data from ASR Containers
Getting started
The ASR Usage Container can be retrieved as a Docker image from the Speechmatics Docker Registry. To access it, use the same credentials you use to access Speechmatics' ASR Containers from the registry. Support should already have provided these when you were onboarded.
You will also need to know the following information:
- Docker Registry URL, e.g. https://speechmaticspublic.azurecr.io
- Image name, e.g. asr-usage
- Image tag, e.g. 0.3.0
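The three values above combine into the image reference that Docker commands expect, which drops the URL scheme. This is a minimal sketch: `image_ref` is a hypothetical helper, and it assumes the image sits at the registry root (registry-host/image:tag); confirm the exact path with Speechmatics Support.

```shell
#!/bin/sh
# Hypothetical helper: build "<registry-host>/<image>:<tag>" from a registry URL.
# $1 = registry URL, $2 = image name, $3 = image tag
image_ref() {
  registry=${1#https://}
  registry=${registry#http://}
  registry=${registry%/}      # drop a trailing slash, if any
  printf '%s/%s:%s\n' "$registry" "$2" "$3"
}

# Example with the values listed above
# (prints speechmaticspublic.azurecr.io/asr-usage:0.3.0)
image_ref https://speechmaticspublic.azurecr.io asr-usage 0.3.0
```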
The image can be downloaded by using the standard Docker workflow:
# Login
docker login https://speechmaticspublic.azurecr.io