
Formatting

Control how numbers, punctuation, and special text appear in your transcripts.

Output locale

Some languages have multiple spelling conventions that vary by region. To ensure consistent spelling throughout your transcript, specify an output locale:

```json
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "output_locale": "en-GB"
  }
}
```

Available English locales:

  • British English (en-GB)
  • US English (en-US)
  • Australian English (en-AU)

Available Chinese Mandarin locales:

  • Simplified Mandarin (cmn-Hans, default)
  • Traditional Mandarin (cmn-Hant)

Specifying an output locale is recommended for English transcription. Without one, spelling may be inconsistent within the same transcript.
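As a minimal sketch, the Python below builds the configuration shown above and rejects any locale not listed on this page. The helper name build_locale_config is hypothetical, and the "cmn" language key for Mandarin is an assumption made for illustration. Check the API reference for the authoritative language codes.

```python
import json

# Output locales listed on this page. The "cmn" language key for Mandarin is an
# assumption made for this sketch.
OUTPUT_LOCALES = {
    "en": {"en-GB", "en-US", "en-AU"},
    "cmn": {"cmn-Hans", "cmn-Hant"},
}


def build_locale_config(language: str, output_locale: str) -> dict:
    """Hypothetical helper: return a transcription config with an output locale."""
    allowed = OUTPUT_LOCALES.get(language, set())
    if output_locale not in allowed:
        raise ValueError(
            f"output_locale {output_locale!r} is not listed for language {language!r}"
        )
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "output_locale": output_locale,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_locale_config("en", "en-GB"), indent=2))
```

Validating locally like this catches a typo such as "en-UK" before a job is submitted.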

Profanities

You can tag profanities to identify or censor offensive language in your workflow. Profanity tagging is available for:

  • English (en)
  • Italian (it)
  • Spanish (es)

Tagged profanities appear in the transcript with the profanity tag:

```json
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "$PROFANITY",
          "language": "en",
          "tags": [
            "profanity"
          ]
        }
      ],
      "end_time": 18.03,
      "start_time": 17.61,
      "type": "word"
    }
  ]
}
```

For other languages, consider using word replacement to identify profanities.
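For languages that do support tagging, censoring can also be done client-side. The following is a minimal Python sketch that walks a results array shaped like the example above and masks every alternative carrying the profanity tag. The function name and the "***" mask are illustrative choices, not part of the API.

```python
import copy


def censor_profanities(results: list, mask: str = "***") -> list:
    """Return a copy of `results` with every profanity-tagged word replaced by `mask`."""
    censored = copy.deepcopy(results)  # leave the original transcript untouched
    for result in censored:
        for alternative in result.get("alternatives", []):
            if "profanity" in alternative.get("tags", []):
                alternative["content"] = mask
    return censored


# The result from the example above.
results = [
    {
        "alternatives": [
            {"confidence": 1.0, "content": "$PROFANITY", "language": "en", "tags": ["profanity"]}
        ],
        "end_time": 18.03,
        "start_time": 17.61,
        "type": "word",
    }
]

print(censor_profanities(results)[0]["alternatives"][0]["content"])  # prints ***
```

Because timings are left untouched, the masked words still line up with the audio.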

Disfluencies

Disfluencies are hesitation sounds such as "um", "uh", and "hmm". In English, these are automatically marked with the disfluency tag:

```json
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "content": "hmm",
          "language": "en",
          "tags": [
            "disfluency"
          ]
        }
      ],
      "end_time": 18.03,
      "start_time": 17.61,
      "type": "word"
    }
  ]
}
```

Full list of tagged disfluencies:

huh, aha, ah, aw, eh, err, hmm, mm, um, uh, uh-oh, uh-huh, uh-uh, mhm, a-ha, aah, aahh, aaw, ah-ha, ahaa, ahh, ahha, aww, eeh, erm, hhm, hhmm, hm, huh-uh, m-hm, uggh, ugh, ughh, uhh, uhhm, uhm, uhmm, umm, uuh, uuhh, uum
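Because the tag is attached to each word, client code can skip disfluencies itself. The Python below is a minimal sketch that assumes results shaped like the example above. The helper names are hypothetical, and unlike the built-in removal option described next, it does not adjust capitalization or spacing.

```python
def words_without_disfluencies(results: list) -> list:
    """Return the content of each word result, skipping disfluency-tagged ones."""
    words = []
    for result in results:
        alternatives = result.get("alternatives") or [{}]
        top = alternatives[0]
        if result.get("type") == "word" and "disfluency" not in top.get("tags", []):
            words.append(top.get("content", ""))
    return words


def word(content: str, tags=None) -> dict:
    """Build a minimal word result shaped like the example above (timings omitted)."""
    alt = {"confidence": 1.0, "content": content, "language": "en"}
    if tags:
        alt["tags"] = tags
    return {"alternatives": [alt], "type": "word"}


results = [word("hmm", ["disfluency"]), word("that"), word("works")]
print(" ".join(words_without_disfluencies(results)))  # prints: that works
```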

Removing disfluencies

You can automatically remove disfluencies from your transcript:

```json
{
  "transcription_config": {
    "language": "en",
    "transcript_filtering_config": {
      "remove_disfluencies": true
    }
  }
}
```

This simplifies client-side processing by removing hesitation sounds and properly adjusting capitalization and spacing. For example:

Without disfluency removal:

Um, what would you like, hmm?

With disfluency removal:

What would you like?

This feature is available for English only. The default setting is "remove_disfluencies": false.
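To make the before and after example concrete, here is a rough client-side approximation in Python. It operates on hypothetical (text, is_punctuation) tokens and applies three assumed rules: drop disfluency words, drop a comma directly next to a dropped disfluency, and re-capitalize the first character. This is not the service's actual algorithm; it only reproduces the example shown above.

```python
# A small subset of the tagged disfluencies listed above, for illustration.
DISFLUENCIES = {"um", "uh", "hmm", "erm"}


def remove_disfluencies(tokens: list) -> str:
    """Approximate disfluency removal on (text, is_punctuation) tokens.

    Assumed rules for this sketch (not the service's actual algorithm):
      * drop disfluency words;
      * drop a comma directly before or after a dropped disfluency;
      * re-capitalize the first character of the result.
    """
    drop = [False] * len(tokens)
    for i, (text, is_punct) in enumerate(tokens):
        if not is_punct and text.lower() in DISFLUENCIES:
            drop[i] = True
            for j in (i - 1, i + 1):
                if 0 <= j < len(tokens) and tokens[j] == (",", True):
                    drop[j] = True

    out = ""
    for (text, is_punct), dropped in zip(tokens, drop):
        if dropped:
            continue
        # Punctuation attaches to the previous word; words are space-separated.
        out += text if (is_punct or not out) else " " + text
    return out[:1].upper() + out[1:]


tokens = [
    ("Um", False), (",", True), ("what", False), ("would", False),
    ("you", False), ("like", False), (",", True), ("hmm", False), ("?", True),
]
print(remove_disfluencies(tokens))  # prints: What would you like?
```

In practice, enabling remove_disfluencies avoids maintaining rules like these yourself.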

Word replacement

Word replacement lets you substitute specific words or patterns in the transcript after processing:

```json
{
  "transcription_config": {
    "language": "en",
    "transcript_filtering_config": {
      "replacements": [
        {"from": "foo", "to": "bar"},
        {"from": "heavy", "to": "light"}
      ]
    }
  }
}
```

Common uses for word replacement:

  • Censoring profanities in languages without built-in support
  • Masking sensitive information (card numbers, personal data)
  • Standardizing terminology or brand names
  • Fixing known issues with particular words

Word replacement is case-sensitive and applied after transcription is complete. For example, "Foo" would not be replaced by "bar" in the example above.

For adding new vocabulary, use the custom dictionary feature instead.
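The case sensitivity of plain replacements can be illustrated with a short Python sketch. It assumes that each transcript word is compared as a whole against the "from" values. The function name is hypothetical, and letting the first listed rule win for duplicate "from" values is an assumption of this sketch.

```python
def apply_plain_replacements(words: list, replacements: list) -> list:
    """Replace whole words that exactly match a "from" value (case-sensitive)."""
    table = {}
    for rule in replacements:
        table.setdefault(rule["from"], rule["to"])  # first listed rule wins
    return [table.get(w, w) for w in words]


replacements = [{"from": "foo", "to": "bar"}, {"from": "heavy", "to": "light"}]
print(apply_plain_replacements(["foo", "Foo", "heavy"], replacements))
# prints: ['bar', 'Foo', 'light']
```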

Regex

You can use regular expressions (ECMAScript syntax) in the "from" field by wrapping the pattern in forward slashes:

```json
// Replace both "Hello" and "hello" with "goodbye"
{"from": "/^[hH]ello$/", "to": "goodbye"}

// Add brackets around "cheese" while preserving the original word
{"from": "/(cheese)/", "to": "[$1]"}
```

Word replacement rules:

  1. Plain word replacements are processed first
  2. If no match is found, regex replacements are tried in the order listed
  3. Once a word matches a replacement, no further replacements are applied to it
  4. Regex replacements are global (all matches are replaced)
  5. Malformed regex patterns will cause the transcription to fail with an error
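The rules above can be sketched in Python as follows, assuming replacements are applied one word at a time. Python's re module is not an ECMAScript engine. Simple patterns like the ones above behave the same, but syntax differs in places, so this sketch only translates $1-style group references. All function names are hypothetical.

```python
import re


def _as_regex(pattern: str):
    """Compile a /.../-delimited pattern; return None for plain words."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.compile(pattern[1:-1])  # raises re.error if malformed
    return None


def _to_python_template(template: str) -> str:
    """Translate ECMAScript-style $1 group references to Python's \\g<1>."""
    return re.sub(r"\$(\d+)", lambda m: "\\g<" + m.group(1) + ">", template)


def replace_word(word: str, replacements: list) -> str:
    # Rule 5: compile everything first so a malformed pattern fails up front.
    rules = [(_as_regex(r["from"]), r) for r in replacements]
    # Rule 1: plain (case-sensitive, whole-word) replacements come first.
    for regex, rule in rules:
        if regex is None and rule["from"] == word:
            return rule["to"]  # Rule 3: no further replacements
    # Rule 2: otherwise try regex replacements in the order listed.
    for regex, rule in rules:
        if regex is not None and regex.search(word):
            # Rule 4: re.sub replaces every match; Rule 3: stop here.
            return regex.sub(_to_python_template(rule["to"]), word)
    return word


rules = [
    {"from": "/^[hH]ello$/", "to": "goodbye"},
    {"from": "/(cheese)/", "to": "[$1]"},
]
print([replace_word(w, rules) for w in ["Hello", "hello", "cheese", "Hi"]])
# prints: ['goodbye', 'goodbye', '[cheese]', 'Hi']
```

Compiling every pattern before touching any word mirrors rule 5: a single bad pattern fails the whole run rather than only the words it would have matched.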

Smart formatting

Speechmatics automatically converts spoken numbers, dates, currencies, and other entities into properly formatted text. This makes transcripts more readable without losing timing information.

For example, spoken words like "nineteen ninety nine" become "1999" in the output.

Configuration

To include detailed information about entities in your JSON output, add this to your configuration:

```json
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "enable_entities": true
  }
}
```

By default, enable_entities is false. When enabled, entity metadata appears only in JSON output (SRT and TXT formats remain unchanged).

Output

The JSON output will include:

  • A type field with the value entity (instead of word) for each formatted entity
  • The full written form in the content field, including any spaces or symbols
  • An entity_class field describing how the entity was formatted
  • Start and end times spanning all words in the entity
  • Two additional representations:
    • spoken_form: Original words as spoken, with individual timing and confidence
    • written_form: Formatted words separated individually

Here's an example of a transcript with enable_entities set to true:

```json
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "17th of January 2022",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.14,
      "entity_class": "date",
      "spoken_form": [
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "seventeenth",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.41,
          "start_time": 0.72,
          "type": "word"
        },
        // Additional spoken words omitted for brevity
      ],
      "start_time": 0.72,
      "type": "entity",
      "written_form": [
        {
          "alternatives": [
            {
              "confidence": 0.99,
              "content": "17th",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.33,
          "start_time": 0.72,
          "type": "word"
        },
        // Additional written words omitted for brevity
      ]
    }
  ]
}
```

When enable_entities is false, the words appear individually in the output.

Entity classes

The system applies different formatting rules based on the type of entity detected. The following classes are available:

| Entity class | Description | Spoken example | Written example |
|---|---|---|---|
| alphanum | Alphanumeric sequences (3+ characters) | "a z triple seven five four" | AZ77754 |
| cardinal | Whole numbers (in English, numbers ≤10 remain as words) | "nineteen" | 19 |
| decimal | Numbers with a decimal point | "eighteen point one two" | 18.12 |
| fraction | Fractions (complex ones use n/d format) | "three sixteenths" | 3/16 |
| ordinal | Position numbers with suffix | "forty second" | 42nd |
| money | Currency values with symbol | "twenty dollars" | $20 |
| percentage | Percentages with % symbol | "two hundred percent" | 200% |
| date | Calendar dates and years | "fifteenth of January twenty twenty two" | 15th of January 2022 |
| time | Clock times with separators | "eleven forty a m" | 11:40 a.m. |
| span | Ranges (x to y format) | "one hundred to two hundred million pounds" | 100 to £200 million |
| credit card | Payment card number sequences | "one one one one..." | 1111 2222 3333 4444 |
| telephone | Phone number formatting | "five five five..." | (555) 429-2228 |
| electronic | Email and web addresses | "bob at speechmatics dot com" | bob@speechmatics.com |
| measurement | Units with abbreviations | "ten kilometers per second" | 10 km/s |

The system chooses entity classes based on context, so occasionally a value might be classified differently than expected. For example, "2001" could be a "cardinal" number or a "date".
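Because classification depends on context, code that looks for a particular kind of value may need to accept more than one class. As a sketch, the Python below filters entity results by a set of classes. The function name is hypothetical, and the sample results are made up to show "2001" classified two different ways.

```python
def contents_by_class(results: list, classes: set) -> list:
    """Return the written content of entities whose entity_class is in `classes`."""
    return [
        r["alternatives"][0]["content"]
        for r in results
        if r.get("type") == "entity" and r.get("entity_class") in classes
    ]


# Hypothetical results: the same value classified two different ways.
results = [
    {"type": "entity", "entity_class": "date", "alternatives": [{"content": "2001"}]},
    {"type": "entity", "entity_class": "cardinal", "alternatives": [{"content": "2001"}]},
    {"type": "entity", "entity_class": "money", "alternatives": [{"content": "$20"}]},
]
print(contents_by_class(results, {"date", "cardinal"}))  # prints: ['2001', '2001']
```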

Languages

Each language follows its own conventions for:

  • Thousand separators
  • Decimal separators
  • Currency symbol position

Examples:

  • English: Uses commas for thousands (20,000), decimal points (10.5), and places currency symbols before values ($10)
  • German: Uses periods for thousands (20.000), commas for decimals (10,5), and places currency symbols after values with a non-breaking space (10 $)
  • French: Uses non-breaking spaces for thousands (20 000), commas for decimals (10,5), and places currency symbols after values with a non-breaking space (10 $)
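Speechmatics applies these conventions itself. The Python below is only a sketch that makes the three example rules concrete, for instance when writing tests that compare formatted output. The "en", "de", and "fr" keys and both helper names are assumptions for illustration, and only the cases listed above are covered.

```python
NBSP = "\u00a0"  # non-breaking space

# Conventions from the examples above; the keys are assumed language codes.
CONVENTIONS = {
    "en": {"thousands": ",", "decimal": ".", "symbol_after": False},
    "de": {"thousands": ".", "decimal": ",", "symbol_after": True},
    "fr": {"thousands": NBSP, "decimal": ",", "symbol_after": True},
}


def format_number(value: float, lang: str, decimals: int = 0) -> str:
    conv = CONVENTIONS[lang]
    text = f"{value:,.{decimals}f}"  # English-style, e.g. "20,000.50"
    # Swap separators via a placeholder so "," and "." don't clobber each other.
    return (
        text.replace(",", "\0")
        .replace(".", conv["decimal"])
        .replace("\0", conv["thousands"])
    )


def format_money(value: float, symbol: str, lang: str, decimals: int = 0) -> str:
    amount = format_number(value, lang, decimals)
    if CONVENTIONS[lang]["symbol_after"]:
        return f"{amount}{NBSP}{symbol}"
    return f"{symbol}{amount}"


for lang in ("en", "de", "fr"):
    print(lang, repr(format_number(20000, lang)), repr(format_number(10.5, lang, 1)),
          repr(format_money(10, "$", lang)))
```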

Smart formatting is available in these languages:

  • Cantonese
  • Chinese Mandarin (Simplified and Traditional)
  • Dutch
  • English
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Norwegian
  • Portuguese
  • Russian
  • Spanish
  • Swedish

Punctuation

All Speechmatics language packs support punctuation to improve transcript readability. Each language supports specific punctuation marks:

| Language | Supported marks | End-of-sentence marks | Notes |
|---|---|---|---|
| Cantonese, Mandarin | 。 ? ! 、 | 。 ? ! | Full-width punctuation |
| Japanese | 。 、 | 。 | Full-width punctuation |
| Hindi | । ? , ! | । ? ! | |
| All other languages | . , ! ? | . ! ? | |

Configuration

You can control which punctuation marks appear in your transcripts using the punctuation_overrides setting:

```json
{
  "transcription_config": {
    "language": "en",
    "punctuation_overrides": {
      "permitted_marks": [".", ","],
      "sensitivity": 0.4
    }
  }
}
```

This configuration:

  • Allows only periods and commas (no question or exclamation marks)
  • Sets punctuation sensitivity to 0.4 (lower than the default 0.5)

The sensitivity parameter accepts values from 0 to 1. Higher values produce more punctuation in the output.
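As a hedged sketch, the Python below builds a punctuation_overrides block and checks it against the supported marks listed in the table above and the 0 to 1 sensitivity range. The language codes "yue", "cmn", "ja", and "hi" are assumptions for illustration, and punctuation_config is a hypothetical helper, not part of any SDK.

```python
SUPPORTED_MARKS = {
    "yue": {"。", "?", "!", "、"},
    "cmn": {"。", "?", "!", "、"},
    "ja": {"。", "、"},
    "hi": {"।", "?", ",", "!"},
}
DEFAULT_MARKS = {".", ",", "!", "?"}  # all other languages


def punctuation_config(language: str, permitted_marks: list, sensitivity: float = 0.5) -> dict:
    """Hypothetical helper: build and sanity-check a punctuation_overrides config."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")
    unsupported = set(permitted_marks) - SUPPORTED_MARKS.get(language, DEFAULT_MARKS)
    if unsupported:
        raise ValueError(f"unsupported marks for {language!r}: {sorted(unsupported)}")
    return {
        "transcription_config": {
            "language": language,
            "punctuation_overrides": {
                "permitted_marks": list(permitted_marks),
                "sensitivity": sensitivity,
            },
        }
    }


print(punctuation_config("en", [".", ","], sensitivity=0.4))
```

The default of 0.5 matches the default sensitivity stated above.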

Disabling punctuation may slightly reduce speaker diarization accuracy. See the speaker diarization and punctuation section for details.

Next steps

  • Custom Dictionary: Improve recognition of specific words and phrases by adding them to a custom dictionary.
  • Diarization: Enhance your transcripts with speaker and channel information.