Formatting
Control how numbers, punctuation, and special text appear in your transcripts.
Output locale
Some languages have multiple spelling conventions that vary by region. To ensure consistent spelling throughout your transcript, specify an output locale:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "output_locale": "en-GB"
  }
}
Available English locales:
- British English (en-GB)
- US English (en-US)
- Australian English (en-AU)
Available Chinese Mandarin locales:
- Simplified Mandarin (cmn-Hans, default)
- Traditional Mandarin (cmn-Hant)
Setting an output locale is recommended for English transcription. Without a specified locale, spelling may be inconsistent within the same transcript.
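As a minimal sketch of how a client might assemble this configuration, the Python helper below builds the job config and rejects locale codes that are not listed in this section. The names `LISTED_LOCALES` and `build_config` are hypothetical and not part of any Speechmatics SDK; the check covers only the five locales named above.

```python
import json

# Locale codes listed in this section (an illustrative allow-list,
# not an authoritative list of everything the service accepts).
LISTED_LOCALES = {"en-GB", "en-US", "en-AU", "cmn-Hans", "cmn-Hant"}


def build_config(language, output_locale=None):
    """Return a job config dict, optionally pinning the output locale.

    Hypothetical helper: it only assembles the JSON shown above.
    """
    transcription_config = {"language": language}
    if output_locale is not None:
        if output_locale not in LISTED_LOCALES:
            raise ValueError(f"{output_locale!r} is not a locale listed in this section")
        transcription_config["output_locale"] = output_locale
    return {"type": "transcription", "transcription_config": transcription_config}


config = build_config("en", "en-GB")
print(json.dumps(config, indent=2))
```

Validating on the client is a design choice rather than a requirement; it simply surfaces typos in the locale code before a job is submitted.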
Profanities
You can tag profanities to identify or censor offensive language in your workflow. Profanity tagging is available for:
- English (en)
- Italian (it)
- Spanish (es)
Tagged profanities appear in the transcript with the "profanity" tag:
"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "$PROFANITY",
        "language": "en",
        "tags": [
          "profanity"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
For other languages, consider using word replacement to identify profanities.
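The tag makes client-side censoring straightforward. The sketch below is one possible approach, assuming the `results` structure shown above: it walks the word items and substitutes a mask for any word whose first alternative carries the "profanity" tag. The function name `mask_profanities`, the mask string, and the sample data are illustrative only, and punctuation items are ignored for brevity.

```python
def mask_profanities(results, mask="****"):
    """Join transcript words, replacing any word tagged 'profanity' with a mask."""
    words = []
    for item in results:
        if item.get("type") != "word":
            continue  # this sketch skips punctuation and other item types
        best = item["alternatives"][0]
        if "profanity" in best.get("tags", []):
            words.append(mask)
        else:
            words.append(best["content"])
    return " ".join(words)


# Illustrative sample in the same shape as the JSON above.
sample = [
    {"type": "word", "start_time": 17.20, "end_time": 17.60,
     "alternatives": [{"content": "well", "confidence": 1.0, "language": "en"}]},
    {"type": "word", "start_time": 17.61, "end_time": 18.03,
     "alternatives": [{"content": "$PROFANITY", "confidence": 1.0,
                       "language": "en", "tags": ["profanity"]}]},
]

print(mask_profanities(sample))  # well ****
```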
Disfluencies
Disfluencies are hesitation sounds like "um", "uh", and "hmm". In English, these are automatically tagged with "disfluency":
"results": [
  {
    "alternatives": [
      {
        "confidence": 1.0,
        "content": "hmm",
        "language": "en",
        "tags": [
          "disfluency"
        ]
      }
    ],
    "end_time": 18.03,
    "start_time": 17.61,
    "type": "word"
  }
]
Full list of tagged disfluencies
huh
aha
ah
aw
eh
err
hmm
mm
um
uh
uh-oh
uh-huh
uh-uh
mhm
a-ha
aah
aahh
aaw
ah-ha
ahaa
ahh
ahha
aww
eeh
erm
hhm
hhmm
hm
huh-uh
m-hm
uggh
ugh
ughh
uhh
uhhm
uhm
uhmm
umm
uuh
uuhh
uum
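If you keep disfluencies in the transcript, the tag can also be used for simple analysis. The following sketch counts tagged disfluencies by their lowercased content, assuming the `results` structure shown earlier in this section. The function name `disfluency_counts` and the sample data are hypothetical.

```python
from collections import Counter


def disfluency_counts(results):
    """Count words tagged 'disfluency', keyed by lowercased content."""
    counts = Counter()
    for item in results:
        if item.get("type") != "word":
            continue
        best = item["alternatives"][0]
        if "disfluency" in best.get("tags", []):
            counts[best["content"].lower()] += 1
    return counts


def _word(content, tags=None):
    """Build a minimal word item for the illustrative sample below."""
    alternative = {"content": content, "confidence": 1.0, "language": "en"}
    if tags:
        alternative["tags"] = tags
    return {"type": "word", "alternatives": [alternative]}


sample = [
    _word("Um", ["disfluency"]),
    _word("what"),
    _word("hmm", ["disfluency"]),
    _word("um", ["disfluency"]),
]

print(disfluency_counts(sample))
```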
Removing disfluencies
You can automatically remove disfluencies from your transcript:
"transcription_config": {
  "language": "en",
  "transcript_filtering_config": {
    "remove_disfluencies": true
  }
}
This simplifies client-side processing by removing hesitation sounds and properly adjusting capitalization and spacing. For example:
Without disfluency removal:
Um, what would you like, hmm?
With disfluency removal:
What would you like?
This feature is available for English only. The default setting is "remove_disfluencies": false.
Word replacement
Word replacement lets you substitute specific words or patterns in the transcript after processing:
"transcription_config": {
  "language": "en",
  "transcript_filtering_config": {
    "replacements": [
      {"from": "foo", "to": "bar"},
      {"from": "heavy", "to": "light"}
    ]
  }
}
Common uses for word replacement:
- Censoring profanities in languages without built-in support
- Masking sensitive information (card numbers, personal data)
- Standardizing terminology or brand names
- Fixing known issues with particular words
Word replacement is case-sensitive and applied after transcription is complete. For example, "Foo" would not be replaced by "bar" in the example above.
For adding new vocabulary, use the custom dictionary feature instead.
Regex
You can use regular expressions (ECMAScript format) in the "from" field by adding forward-slash delimiters:
// Replace both "Hello" and "hello" with "goodbye"
{"from": "/^[hH]ello$/", "to": "goodbye"}
// Add brackets around "cheese" while preserving the original word
{"from": "/(cheese)/", "to": "[$1]"}
Word replacement rules:
- Plain word replacements are processed first
- If no match is found, regex replacements are tried in the order listed
- Once a word matches a replacement, no further replacements are applied to it
- Regex replacements are global (all matches are replaced)
- Malformed regex patterns will cause the transcription to fail with an error
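To make the ordering of these rules concrete, here is a rough Python simulation of how a single word might be processed. It is a sketch under stated assumptions, not the service's implementation: plain rules are assumed to match the whole word exactly, Python's `re` module stands in for an ECMAScript regex engine (the two differ in some syntax), and `$1`-style references are translated to Python's `\g<1>` form. The names `apply_replacements` and `_compile` are hypothetical.

```python
import re


def _compile(rule):
    """Classify a rule as ('plain', text, to) or ('regex', pattern, to)."""
    src, dst = rule["from"], rule["to"]
    if len(src) >= 2 and src.startswith("/") and src.endswith("/"):
        # Translate ECMAScript-style $1 group references to Python's \g<1>.
        py_dst = re.sub(r"\$(\d+)", r"\\g<\1>", dst)
        # re.compile raises re.error on a malformed pattern.
        return ("regex", re.compile(src[1:-1]), py_dst)
    return ("plain", src, dst)


def apply_replacements(word, rules):
    """Apply the first matching rule to a word and return the result."""
    compiled = [_compile(rule) for rule in rules]
    # 1. Plain replacements are tried first (case-sensitive, whole word assumed).
    for kind, matcher, dst in compiled:
        if kind == "plain" and word == matcher:
            return dst
    # 2. Otherwise regex rules are tried in listed order; the first match wins.
    for kind, matcher, dst in compiled:
        if kind == "regex" and matcher.search(word):
            return matcher.sub(dst, word)  # global: every match is replaced
    # 3. No rule matched, so the word is left unchanged.
    return word


rules = [
    {"from": "foo", "to": "bar"},
    {"from": "/^[hH]ello$/", "to": "goodbye"},
    {"from": "/(cheese)/", "to": "[$1]"},
]

print(apply_replacements("foo", rules))     # bar
print(apply_replacements("Foo", rules))     # Foo
print(apply_replacements("Hello", rules))   # goodbye
print(apply_replacements("cheese", rules))  # [cheese]
```

Because the function returns as soon as one rule matches, a word rewritten by a plain rule is never passed through the regex rules, which mirrors the "no further replacements" rule above.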
Smart formatting
Speechmatics automatically converts spoken numbers, dates, currencies, and other entities into properly formatted text. This makes transcripts more readable without losing timing information.
For example, spoken words like "nineteen ninety nine" become "1999" in the output.
Configuration
To include detailed information about entities in your JSON output, add this to your configuration:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "enable_entities": true
  }
}
By default, enable_entities is false. When enabled, entity metadata appears only in JSON output (SRT and TXT formats remain unchanged).
Output
The JSON output will include:
- A new type field with value entity for formatted numeric entities
- Full written form in the content section, including any spaces or symbols
- An entity_class field describing how the entity was formatted
- Start and end times spanning all words in the entity
- Two additional representations:
  - spoken_form: Original words as spoken, with individual timing and confidence
  - written_form: Formatted words separated individually
Here's an example of a transcript with enable_entities set to true:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.99,
          "content": "17th of January 2022",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.14,
      "entity_class": "date",
      "spoken_form": [
        {
          "alternatives": [
            {
              "confidence": 1.0,
              "content": "seventeenth",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.41,
          "start_time": 0.72,
          "type": "word"
        },
        // Additional spoken words omitted for brevity
      ],
      "start_time": 0.72,
      "type": "entity",
      "written_form": [
        {
          "alternatives": [
            {
              "confidence": 0.99,
              "content": "17th",
              "language": "en",
              "speaker": "UU"
            }
          ],
          "end_time": 1.33,
          "start_time": 0.72,
          "type": "word"
        },
        // Additional written words omitted for brevity
      ]
    }
  ]
}
When enable_entities is false, the words appear individually in the output.
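A client reading this output might summarise each entity as in the sketch below, which assumes the field names shown in the example above. The function name `iter_entities` is hypothetical, and the sample is trimmed to a single spoken and written word in the same way as the example.

```python
def iter_entities(results):
    """Yield a small summary dict for every item of type 'entity'."""
    for item in results:
        if item.get("type") != "entity":
            continue
        yield {
            "class": item.get("entity_class"),
            "written": item["alternatives"][0]["content"],
            # Join the individually timed spoken words back into a phrase.
            "spoken": " ".join(
                word["alternatives"][0]["content"]
                for word in item.get("spoken_form", [])
            ),
            "start_time": item["start_time"],
            "end_time": item["end_time"],
        }


# Trimmed sample based on the example above (remaining words omitted).
sample = [
    {
        "type": "entity",
        "entity_class": "date",
        "start_time": 0.72,
        "end_time": 3.14,
        "alternatives": [{"content": "17th of January 2022", "confidence": 0.99}],
        "spoken_form": [
            {"type": "word", "start_time": 0.72, "end_time": 1.41,
             "alternatives": [{"content": "seventeenth", "confidence": 1.0}]},
        ],
        "written_form": [
            {"type": "word", "start_time": 0.72, "end_time": 1.33,
             "alternatives": [{"content": "17th", "confidence": 0.99}]},
        ],
    }
]

for entity in iter_entities(sample):
    print(entity["class"], "|", entity["written"], "|", entity["spoken"])
```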
Entity classes
The system applies different formatting rules based on the type of entity detected. The following classes are available:
Entity class | Description | Spoken example | Written example |
---|---|---|---|
alphanum | Alphanumeric sequences (3+ characters) | "a z triple seven five four" | AZ77754 |
cardinal | Whole numbers (in English, numbers ≤10 remain as words) | "nineteen" | 19 |
decimal | Numbers with decimal point | "eighteen point one two" | 18.12 |
fraction | Fractions (complex ones use n/d format) | "three sixteenths" | 3/16 |
ordinal | Position numbers with suffix | "forty second" | 42nd |
money | Currency values with symbol | "twenty dollars" | $20 |
percentage | Percentages with % symbol | "two hundred percent" | 200% |
date | Calendar dates and years | "fifteenth of January twenty twenty two" | 15th of January 2022 |
time | Clock times with separators | "eleven forty a m" | 11:40 a.m. |
span | Ranges (x to y format) | "one hundred to two hundred million pounds" | 100 to £200 million |
credit card | Payment card number sequences | "one one one one..." | 1111 2222 3333 4444 |
telephone | Phone number formatting | "five five five..." | (555) 429-2228 |
electronic | Email and web addresses | "bob at speechmatics dot com" | bob@speechmatics.com |
measurement | Units with abbreviations | "ten kilometers per second" | 10 km/s |
The system chooses entity classes based on context, so occasionally a value might be classified differently than expected. For example, "2001" could be a "cardinal" number or a "date".
Languages
Each language follows its own conventions for:
- Thousand separators
- Decimal separators
- Currency symbol position
Examples:
- English: Uses commas for thousands (20,000), decimal points (10.5), and places currency symbols before values ($10)
- German: Uses periods for thousands (20.000), commas for decimals (10,5), and places currency symbols after values with a non-breaking space (10 $)
- French: Uses non-breaking spaces for thousands (20 000), commas for decimals (10,5), and places currency symbols after values with a non-breaking space (10 $)
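The three conventions above can be illustrated with a small client-side formatter. This is only a sketch of the conventions as described here, not how Speechmatics produces its output; the `CONVENTIONS` table, its language keys, and the `format_amount` function are all hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space

# Separator and currency conventions as described in the examples above.
CONVENTIONS = {
    "en": {"thousands": ",", "decimal": ".", "symbol_first": True},
    "de": {"thousands": ".", "decimal": ",", "symbol_first": False},
    "fr": {"thousands": NBSP, "decimal": ",", "symbol_first": False},
}


def format_amount(value, language, symbol=None, decimals=0):
    """Format a number using the separators of the given language."""
    conv = CONVENTIONS[language]
    text = f"{value:,.{decimals}f}"  # English-style grouping, e.g. 20,000
    # Swap separators through a placeholder so they do not overwrite each other.
    text = (
        text.replace(",", "\0")
        .replace(".", conv["decimal"])
        .replace("\0", conv["thousands"])
    )
    if symbol is None:
        return text
    if conv["symbol_first"]:
        return f"{symbol}{text}"
    return f"{text}{NBSP}{symbol}"


print(format_amount(20000, "en"))             # 20,000
print(format_amount(20000, "de"))             # 20.000
print(format_amount(10.5, "de", decimals=1))  # 10,5
print(format_amount(10, "en", symbol="$"))    # $10
```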
Smart formatting is available in these languages:
- Cantonese
- Chinese Mandarin (Simplified and Traditional)
- Dutch
- English
- French
- German
- Hindi
- Italian
- Japanese
- Norwegian
- Portuguese
- Russian
- Spanish
- Swedish
Punctuation
All Speechmatics language packs support punctuation to improve transcript readability. Each language supports specific punctuation marks:
Language | Supported marks | End-of-sentence marks | Notes |
---|---|---|---|
Cantonese, Mandarin | , 。 ? ! 、 | 。 ? ! | Full-width punctuation |
Japanese | 。 、 | 。 | Full-width punctuation |
Hindi | । ? , ! | । ? ! | |
All other languages | . , ! ? | . ! ? | |
Configuration
You can control which punctuation marks appear in your transcripts using the punctuation_overrides setting:
"transcription_config": {
  "language": "en",
  "punctuation_overrides": {
    "permitted_marks": [".", ","],
    "sensitivity": 0.4
  }
}
This configuration:
- Allows only periods and commas (no question or exclamation marks)
- Sets punctuation sensitivity to 0.4 (lower than the default 0.5)
The sensitivity parameter accepts values from 0 to 1. Higher values produce more punctuation in the output.
Disabling punctuation may slightly reduce speaker diarization accuracy. See the speaker diarization and punctuation section for details.
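The punctuation_overrides settings described above can be assembled and sanity-checked on the client before a job is submitted. The sketch below assumes the 0 to 1 range for sensitivity is inclusive at both ends; the function name `punctuation_overrides` is a hypothetical helper, not an SDK call.

```python
def punctuation_overrides(permitted_marks=None, sensitivity=None):
    """Build a punctuation_overrides dict, checking the sensitivity range."""
    overrides = {}
    if permitted_marks is not None:
        overrides["permitted_marks"] = list(permitted_marks)
    if sensitivity is not None:
        # Assumption: the documented range of 0 to 1 includes both endpoints.
        if not 0 <= sensitivity <= 1:
            raise ValueError("sensitivity must be between 0 and 1")
        overrides["sensitivity"] = sensitivity
    return overrides


transcription_config = {
    "language": "en",
    "punctuation_overrides": punctuation_overrides([".", ","], 0.4),
}
print(transcription_config)
```

Leaving either argument as None omits that key, so the service default (0.5 for sensitivity, as noted above) applies.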
Next steps
- Custom Dictionary: Improve recognition of specific words and phrases by adding them to a custom dictionary.
- Diarization: Enhance your transcripts with speaker and channel information.