
Custom dictionary

Learn how to use the Speechmatics custom dictionary

The Custom dictionary feature allows a list of custom words to be added for each transcription job. This helps when a specific word is not recognised during transcription, for example because it is not in the vocabulary for that language, as is often the case with a company or person's name. Adding custom words can improve the likelihood that they will be output.

The sounds_like feature extends this by letting you specify alternative pronunciations, which aids recognition when the pronunciation of a word is not obvious.

The Custom dictionary feature can be accessed through the additional_vocab property.

Prior to using this feature, consider the following:

- sounds_like is optional. It is recommended when the pronunciation of a word is not obvious or when the word can be pronounced in multiple ways; otherwise, providing only the content value is valid.
- sounds_like only works with the main script for that language.
  - Japanese (ja): sounds_like only supports full-width Hiragana or Katakana.
- You can specify up to 1000 words or phrases per job in your custom dictionary.
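The constraints above can be checked on the client before a job is submitted. The sketch below is illustrative only: the function names are hypothetical, it assumes the 1000-entry limit counts elements of additional_vocab, and it approximates "full-width Hiragana or Katakana" with the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks, so half-width katakana such as ﾄ are rejected.

```python
MAX_VOCAB_ENTRIES = 1000  # per-job limit stated above


def is_fullwidth_kana(text: str) -> bool:
    """True if every character is in the Hiragana or Katakana Unicode block.

    Assumption: these two blocks approximate "full-width Hiragana or Katakana".
    """
    return bool(text) and all(
        "\u3040" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff" for ch in text
    )


def check_vocab(additional_vocab: list, language: str) -> list:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if len(additional_vocab) > MAX_VOCAB_ENTRIES:
        problems.append(
            f"{len(additional_vocab)} entries exceeds the limit of {MAX_VOCAB_ENTRIES}"
        )
    if language == "ja":
        for entry in additional_vocab:
            if not isinstance(entry, dict):
                continue  # plain-string entries carry no sounds_like
            for alt in entry.get("sounds_like", []):
                if not is_fullwidth_kana(alt):
                    problems.append(
                        f"sounds_like {alt!r} for {entry['content']!r} is not full-width kana"
                    )
    return problems


# Usage: the half-width katakana alternative is reported, the Hiragana one is not.
print(check_vocab([{"content": "東京", "sounds_like": ["とうきょう", "ﾄｳｷｮｳ"]}], "ja"))
```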

"transcription_config": {
  "language": "en",
  "additional_vocab": [
    {
      "content": "financial crisis"
    },
    {
      "content": "gnocchi",
      "sounds_like": [
        "nyohki",
        "nokey",
        "nochi"
      ]
    },
    {
      "content": "CEO",
      "sounds_like": [
        "C.E.O."
      ]
    }
  ]
}

In the example above, the words gnocchi and CEO have pronunciations specified; the phrase financial crisis does not need one. The content property represents how you want the word to appear in the transcript.
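When the vocabulary comes from your own data, the same structure can be generated programmatically. This is a minimal sketch with a hypothetical helper, build_additional_vocab, that takes a mapping of output word to pronunciation alternatives and only adds sounds_like when alternatives are given, reproducing the configuration above.

```python
import json


def build_additional_vocab(words: dict) -> list:
    """Hypothetical helper: map {content: [sounds_like alternatives]} to entries.

    An empty list means only the content value is supplied, since
    sounds_like is optional.
    """
    vocab = []
    for content, alternatives in words.items():
        entry = {"content": content}
        if alternatives:  # add sounds_like only when pronunciations are given
            entry["sounds_like"] = list(alternatives)
        vocab.append(entry)
    return vocab


config = {
    "transcription_config": {
        "language": "en",
        "additional_vocab": build_additional_vocab(
            {
                "financial crisis": [],
                "gnocchi": ["nyohki", "nokey", "nochi"],
                "CEO": ["C.E.O."],
            }
        ),
    }
}

print(json.dumps(config, indent=2))
```

Python dictionaries preserve insertion order, so the entries appear in the same order as in the JSON example.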

Custom dictionary caching

The Speechmatics Realtime SaaS caches custom dictionaries to reduce session initialisation times.

Session initialisation is faster from the second time onwards when you reuse an identical custom dictionary. Cache entries expire when they have not been used for 24 hours.

On-prem Realtime Containers can also make use of a Shared custom dictionary cache.
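The expiry rule describes a sliding time-to-live: each use refreshes an entry, and an entry is only evicted after 24 hours without use. The toy model below illustrates those semantics only and is not the Speechmatics implementation. The class name is hypothetical, the stored value is a placeholder, and keying entries on a sorted JSON serialisation is an assumption about what "identical" means.

```python
import json
import time

TTL_SECONDS = 24 * 60 * 60  # entries expire after 24 hours without use


class SlidingTTLCache:
    """Hypothetical model: every successful lookup refreshes the entry's timer."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (value, last_used)

    @staticmethod
    def key_for(additional_vocab):
        # Assumption: dictionaries with equal content count as identical.
        return json.dumps(additional_vocab, sort_keys=True)

    def get(self, key):
        now = self.clock()
        item = self._entries.get(key)
        if item is None or now - item[1] >= self.ttl:
            self._entries.pop(key, None)  # missing or expired
            return None
        self._entries[key] = (item[0], now)  # a hit refreshes last_used
        return item[0]

    def put(self, key, value):
        self._entries[key] = (value, self.clock())


# Usage with a fake clock (hours converted to seconds).
now = [0.0]
cache = SlidingTTLCache(clock=lambda: now[0])
key = SlidingTTLCache.key_for([{"content": "gnocchi", "sounds_like": ["nyohki"]}])
cache.put(key, "prepared-dictionary")  # first session: miss, then store
now[0] = 23 * 3600
print(cache.get(key))  # reused within 24 hours; the timer restarts
now[0] = 48 * 3600
print(cache.get(key))  # 25 hours since last use: expired
```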

Custom dictionary content limits

Any individual custom dictionary (additional_vocab) element that is longer than 6 words, or that contains a word exceeding 4000 characters, is automatically dropped from the config before transcription starts. This applies to both the content and sounds_like fields.

Realtime API
The server will send an in-band warning message of type validation_warning to the client prior to the RecognitionStarted message. See the Realtime API Reference for more details.
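A client can collect these warnings by inspecting the messages that arrive before RecognitionStarted. This sketch works on already-received JSON strings rather than a live WebSocket. The function name is hypothetical, and the warning shape assumed here ({"message": "Warning", "type": "validation_warning", "reason": ...}) should be checked against the Realtime API Reference; the reason text in the usage example is made up for illustration.

```python
import json


def collect_validation_warnings(raw_messages):
    """Hypothetical helper: return the reasons of validation warnings.

    Assumes each server message is a JSON object with a "message" field and
    that the warning uses "type": "validation_warning" with a "reason" field.
    """
    warnings = []
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("message") == "RecognitionStarted":
            break  # validation warnings are sent before this message
        if msg.get("message") == "Warning" and msg.get("type") == "validation_warning":
            warnings.append(msg.get("reason", ""))
    return warnings


# Usage with illustrative messages (the reason text is invented).
received = [
    json.dumps({"message": "Warning", "type": "validation_warning",
                "reason": "additional_vocab element dropped"}),
    json.dumps({"message": "RecognitionStarted"}),
]
print(collect_validation_warnings(received))
```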

Example

Consider the following custom dictionary:

[{"content":"this is an element that surpasses the limit"}, {"content":"this is ok"}]

After the limit is applied, this becomes:

[{"content":"this is ok"}]

The same limit applies for sounds_like:

[{"content":"this is ok", "sounds_like":["this is ok", "this is an element that surpasses the limit"]}]

After the limit is applied, this becomes:

[{"content":"this is ok", "sounds_like":["this is ok"]}]

The following custom dictionary would not be affected: it contains 7 elements, but each individual element is within the 6-word limit.

["apple", "orange", "banana", "lemon", "lime", "melon", "pear"]
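To see in advance what the server will drop, the rule can be mirrored on the client. This is a sketch under stated assumptions, with hypothetical function names: words are taken to be whitespace-separated tokens, plain-string elements (as in the list above) are treated like a content value, and an element whose sounds_like alternatives are all dropped is assumed to keep its content without a sounds_like field. It reproduces the three examples above.

```python
MAX_WORDS = 6          # elements longer than 6 words are dropped
MAX_WORD_CHARS = 4000  # elements containing a longer word are dropped


def within_limits(phrase: str) -> bool:
    # Assumption: "words" are whitespace-separated tokens.
    words = phrase.split()
    return len(words) <= MAX_WORDS and all(len(w) <= MAX_WORD_CHARS for w in words)


def apply_limits(additional_vocab: list) -> list:
    """Hypothetical helper mirroring the documented limits client-side."""
    kept = []
    for entry in additional_vocab:
        if isinstance(entry, str):  # plain-string element
            if within_limits(entry):
                kept.append(entry)
            continue
        if not within_limits(entry["content"]):
            continue  # an over-limit content drops the whole element
        new_entry = dict(entry)
        if "sounds_like" in entry:
            alternatives = [s for s in entry["sounds_like"] if within_limits(s)]
            if alternatives:
                new_entry["sounds_like"] = alternatives
            else:
                # Assumption: keep the content when every alternative is dropped.
                del new_entry["sounds_like"]
        kept.append(new_entry)
    return kept


print(apply_limits([{"content": "this is an element that surpasses the limit"},
                    {"content": "this is ok"}]))
```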
