Automatic Usage Reporting

Transcription: Batch, Real-Time
Deployments: Container
Status: Beta

Compatibility

To enable automatic usage reporting, you must be running one of the following ASR Container versions:

  • Batch Container 10.1.0 onwards
  • Real-Time Container 10.1.0 onwards

Introduction

The most convenient way of reporting usage to Speechmatics is by enabling Automatic Usage Reporting. Once this is enabled, the transcriber will automatically connect to Speechmatics servers to send required usage analytics.

This feature works by sending periodic HTTPS requests to Speechmatics over the course of a transcription session. The information recorded includes the job configuration, the duration of transcription, and the amount of audio transcribed. We aim to be completely transparent about what is recorded; see What Data Do We Record.

This feature is turned off by default and is currently opt-in. To turn it on, set the environment variable SM_ENABLE_USAGE_REPORTING=true when running the transcriber (the values true, yes and 1 are equally valid). For example:

docker run -i -v ~/$AUDIO_FILE:/input.audio \
	-e LICENSE_TOKEN=eyJhbGciOiJ... \
	-e SM_ENABLE_USAGE_REPORTING=true \
	batch-asr-transcriber-en:10.5.1

We will never send customer audio data over the network. See What Data Do We Record for a full description of what information will be recorded.

Technical Details

The Batch transcriber reports one TRANSCRIBER_DONE event at the end of transcription.

The Real-Time transcriber reports one SESSION_ENDED event at the end of each session. During a session, it also sends a SESSION_STATUS event every few minutes.

The payload is only a few KB in size, so it won’t have a meaningful impact on the duration of transcription or your bandwidth costs.

If usage reporting is successful, the following message will be visible in the transcriber logs at the end of the session:

2022-12-01 13:55:24.332 INFO sentryserver Usage reported to Speechmatics
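
As a rough sketch, a wrapper script could check captured transcriber logs for this message. The function name usage_reported and the file name transcriber.log are hypothetical, and the sketch assumes the log output has already been saved to a file:

```shell
# Hypothetical helper (not part of the Speechmatics tooling): succeeds with
# exit status 0 if the log text on stdin contains the success message.
usage_reported() {
    grep -q -F 'Usage reported to Speechmatics'
}

# Example use, assuming the transcriber logs were saved to transcriber.log:
#   usage_reported < transcriber.log && echo "usage was reported"
```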

Network Failure

In the event of a network failure (for example, if your Internet connection is down or our usage server has a temporary outage), the transcriber will attempt to reconnect to our usage server several times, logging an error for each failed attempt:

2022-12-01 13:53:55.918 ERROR sentryserver Error 'Post "https://usage.speechmatics.com/v1/log": dial tcp: lookup usage.speechmatics.com on 192.168.4.129:53: no such host' occurred when logging EATS data: retrying
2022-12-01 13:53:56.475 ERROR sentryserver Error 'Post "https://usage.speechmatics.com/v1/log": dial tcp: lookup usage.speechmatics.com on 192.168.4.129:53: no such host' occurred when logging EATS data: retrying
2022-12-01 13:53:57.561 ERROR sentryserver Error 'Post "https://usage.speechmatics.com/v1/log": dial tcp: lookup usage.speechmatics.com on 192.168.4.129:53: no such host' occurred when logging EATS data: retrying

If the transcriber is still unable to contact our usage server after this retry period (which takes up to 10 seconds), it will output WARNING log messages like those below and stop attempting to send usage information. The transcriber will still exit normally with an exit code of 0.

2022-12-02 13:26:29.962 WARNING sentryserver SM Usage Reporting: Error handling item, current retry count 1
2022-12-02 13:26:29.962 WARNING sentryserver SM Usage Reporting: deactivated because max retry limit reached
2022-12-02 13:26:30.963 WARNING sentryserver SM Usage Reporting: deactivated so item will be skipped

Batch transcribers exit immediately after logging these warnings.
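
Because the exit code is still 0, a failed report cannot be detected from the exit status alone. The following is a minimal sketch of a log check. It assumes the logs were captured to a file and that the warning text matches the sample above; the function name and the file name transcriber.log are hypothetical:

```shell
# Hypothetical helper: succeeds with exit status 0 if the log text on stdin
# shows that usage reporting was deactivated after the retry limit.
usage_reporting_deactivated() {
    grep -q -F 'SM Usage Reporting: deactivated'
}

# Example use, assuming the transcriber logs were saved to transcriber.log:
#   if usage_reporting_deactivated < transcriber.log; then
#       echo "usage was NOT reported for this job" >&2
#   fi
```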

For Real-Time transcribers, usage reporting will be disabled for a fixed time period (currently 60 seconds) to minimize the impact on the duration of transcription jobs. The retry mechanism still causes a small reduction in transcription speed, so during a network outage you may wish to temporarily disable usage reporting by not setting the SM_ENABLE_USAGE_REPORTING variable when running the Container.

We ask that you inform our Finance Team about the duration and timing of any such outage.
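
To help work out the timing of an outage, the timestamps of the first and last usage reporting failures can be pulled from captured logs. This is a sketch only: it assumes each log line starts with a date and a time as in the samples above, it matches on the message fragments shown in those samples, and the function name and the file name transcriber.log are hypothetical:

```shell
# Hypothetical helper: reads transcriber logs on stdin and prints the
# timestamps of the first and last usage reporting failure lines, one per
# line. Exits with status 1 if no such lines are found.
usage_outage_window() {
    awk '
        /logging EATS data/ || /SM Usage Reporting:/ {
            ts = $1 " " $2              # date and time fields of the line
            if (first == "") first = ts
            last = ts
        }
        END {
            if (first == "") exit 1
            print first
            print last
        }
    '
}

# Example use, assuming the transcriber logs were saved to transcriber.log:
#   usage_outage_window < transcriber.log
```

The output covers only the failures visible in the supplied logs, so logs from every affected transcriber run would need to be checked.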