Output Formatting

Transcription: Batch, Real-Time | Deployments: All

Speechmatics allows you to configure the transcription output to suit the needs of your users. This could include handling of profanities, or ensuring consistent spelling of key words.

Output Locale

To make spelling rules more consistent in the transcription output, specify output_locale:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "output_locale": "en-GB"
  }
}

When transcribing English, we recommend specifying the locale. If no locale is specified, spelling may be inconsistent within a transcript. Three English locales are available:

  • British English - en-GB
  • US English - en-US
  • Australian English - en-AU

The following locales are supported for Chinese Mandarin:

  • Simplified Mandarin (Default) - cmn-Hans
  • Traditional Mandarin - cmn-Hant

Profanities

Words recognised as profanities carry a profanity tag in the transcript output. You can use this tag to identify, redact, or obfuscate profanities and to integrate this data into your own workflows.

Profanity tagging is available for the following languages:

  • English (en)
  • Italian (it)
  • Spanish (es)

To handle words that are not on the built-in profanity list, consider using Word Replacement. An example of a word tagged as a profanity is below:

"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "$PROFANITY",
        "language": "en",
        "tags": [
          "profanity"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
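One way to act on the profanity tag on the client side is sketched below in Python. It assumes the results list has already been parsed from the JSON transcript and that the first entry in alternatives is the one to display. The function name mask_profanities and the mask string are hypothetical.

```python
def mask_profanities(results, mask="****"):
    """Join word contents into a string, masking words tagged as profanity."""
    words = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and any other non-word items
        best = item["alternatives"][0]  # assumption: first alternative is displayed
        if "profanity" in best.get("tags", []):
            words.append(mask)
        else:
            words.append(best["content"])
    return " ".join(words)

if __name__ == "__main__":
    sample = [
        {"alternatives": [{"content": "well", "tags": []}], "type": "word"},
        {"alternatives": [{"content": "$PROFANITY", "tags": ["profanity"]}], "type": "word"},
    ]
    print(mask_profanities(sample))
```

The same loop could collect timestamps instead of masking if the goal is to flag profanities rather than hide them.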

Disfluencies

The disfluency tag is applied to a fixed list of words that imply hesitation or indecision, such as 'hmm' or 'um'. It is available for English only. You can use it to identify and remove these words from the transcript on the client side.

If you have no need to toggle displaying disfluencies on the client side, we recommend using Disfluency Removal for a simpler integration.

Full list of words tagged as disfluencies:
huh
aha
ah
aw
eh
err
hmm
mm
um
uh
uh-oh
uh-huh
uh-uh
mhm
a-ha
aah
aahh
aaw
ah-ha
ahaa
ahh
ahha
aww
eeh
erm
hhm
hhmm
hm
huh-uh
m-hm
uggh
ugh
ughh
uhh
uhhm
uhm
uhmm
umm
uuh
uuhh
uum

An example of a word tagged as a disfluency is below:

"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "hmm",
        "language": "en",
        "tags": [
          "disfluency"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
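A client that lets users toggle disfluencies could filter on the tag along these lines. This is a minimal Python sketch: the function name visible_words and the show_disfluencies flag are hypothetical, and it deliberately does not repair capitalisation or punctuation around removed words.

```python
def visible_words(results, show_disfluencies):
    """Return word contents, optionally dropping words tagged as disfluency."""
    words = []
    for item in results:
        best = item["alternatives"][0]  # assumption: first alternative is displayed
        is_disfluency = "disfluency" in best.get("tags", [])
        if is_disfluency and not show_disfluencies:
            continue  # hide the hesitation word
        words.append(best["content"])
    return words

if __name__ == "__main__":
    sample = [
        {"alternatives": [{"content": "hmm", "tags": ["disfluency"]}], "type": "word"},
        {"alternatives": [{"content": "yes", "tags": []}], "type": "word"},
    ]
    print(visible_words(sample, show_disfluencies=False))
    print(visible_words(sample, show_disfluencies=True))
```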

Disfluency Removal

Transcription: Batch, Real-Time | Deployments: Container, SaaS

Disfluencies can be removed from your transcript automatically on the server side. This simplifies your client application, which no longer needs to handle disfluency tags or fix capitalisation around removed disfluencies. For example:

Without disfluency removal:

Um, what would you like, hmm?

With disfluency removal:

What would you like?

You can enable Disfluency Removal as follows:

"transcription_config": {
    "language": "en",
    "transcript_filtering_config": {
        "remove_disfluencies": true
    }
}

Disfluency Removal is available for English only. The default value of remove_disfluencies is false.

Word Replacement

Transcription: Batch, Real-Time | Deployments: SaaS

Sometimes a word is transcribed correctly but is output in an unsuitable form. To keep the client integration simple, you can use Word Replacement to replace words in the transcript. This could be useful for:

  • Profanities in languages without built-in profanity support
  • Secure information such as card numbers
  • Numbers (sometimes "2" is preferred to "two")
  • Proper names that are also common words (these may be incorrectly cased, or a sound-alike may be substituted)
  • A mistake in the language pack or localization
  • Inserting metadata around words as a form of keyword spotting
  • Changes in style guidelines (e.g. Kyiv vs. Kiev)
  • Consistency (e.g. Dr vs. Dr. vs. doctor)

For example:

"transcription_config" : {
  "language" : "en",
  "transcript_filtering_config": {
    "replacements": [
      {"from": "foo", "to": "bar"},
      {"from": "heavy", "to": "light"}
    ]
  }
}

Note that word replacement is not a substitute for a custom dictionary. If you want to add new words to the vocabulary, use the Custom Dictionary.

Word replacement is applied to a given word in the transcript after the transcription has been completed. The replacement is case-sensitive, so in the example above, "Foo" would not be replaced with "bar".

Regex support

The from field may be given as an ECMAScript-compliant regular expression enclosed in forward-slash delimiters, e.g.

# Maps Hello and hello to goodbye 
{"from": "/^[hH]ello$/", "to": "goodbye"}

If no regex delimiters are given, the field is taken as a plain string.

Replacements support capture groups using the $N notation, e.g.

# Maps 'cheese' and 'cheesemonger' to '[cheese]' and '[cheese]monger'
{"from": "/(cheese)/", "to": "[$1]"}

  • Plain word replacements take place first.
  • If no match is found there, the regex replacements are applied in the order given in the list.
  • Once a word has matched, no more replacements are applied to that word.
  • Regex replacements are global: for example, /A/ --> B replaces every A with B.
  • If a regex is badly formed (for example, an unmatched bracket), transcription will fail with an error.
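
The ordering rules above can be tried out locally with a small Python emulation. This is a sketch under stated assumptions, not the service's implementation. It uses Python's re module, whose syntax differs from ECMAScript in some details. It treats a plain from value as an exact, case-sensitive match on the whole word. The helper names is_regex, to_python_template and apply_replacements are hypothetical.

```python
import re

def is_regex(pattern):
    """True when the `from` field uses forward-slash regex delimiters."""
    return len(pattern) > 2 and pattern.startswith("/") and pattern.endswith("/")

def to_python_template(replacement):
    """Translate $N group references into Python's \\g<N> form."""
    escaped = replacement.replace("\\", "\\\\")  # keep literal backslashes literal
    return re.sub(r"\$(\d+)", r"\\g<\1>", escaped)

def apply_replacements(word, replacements):
    """Apply the first matching rule to a single word and return the result."""
    # 1. Plain rules first (assumption: exact, case-sensitive whole-word match).
    for rule in replacements:
        if not is_regex(rule["from"]) and rule["from"] == word:
            return rule["to"]
    # 2. Then regex rules, in list order; stop at the first one that matches.
    for rule in replacements:
        if is_regex(rule["from"]):
            # re.compile raises re.error on a badly formed pattern.
            pattern = re.compile(rule["from"][1:-1])
            if pattern.search(word):
                # sub() replaces every occurrence, mirroring "global" behaviour.
                return pattern.sub(to_python_template(rule["to"]), word)
    return word

if __name__ == "__main__":
    rules = [
        {"from": "foo", "to": "bar"},
        {"from": "/^[hH]ello$/", "to": "goodbye"},
        {"from": "/(cheese)/", "to": "[$1]"},
    ]
    for w in ["foo", "Foo", "Hello", "cheesemonger"]:
        print(w, "->", apply_replacements(w, rules))
```

Checking a replacement list this way before submitting a job can catch malformed patterns early, though a pattern that compiles in Python is not guaranteed to behave identically under ECMAScript rules.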