
Languages

This page lists the languages that Speechmatics supports for transcription and translation.

To automatically identify the language in an audio file, use our Language Identification feature.

To dynamically update your system with the latest languages and features offered by Speechmatics, use our Feature Discovery endpoint.

Speechmatics supports the following languages. Which of them you can use depends on the languages included in your contract.

Speechmatics takes a global-first approach to languages. Each language pack aims to support many different accents and dialects, so you do not need to know in advance which accent is spoken in your audio when selecting a language. This approach still achieves very high accuracy compared to accent-specific language packs.

Language	Language Code	Description
Automatic	auto	Automatically detect the language using our Language Identification feature. Please note, this is currently only supported with Batch Transcriptions.
Arabic	ar	Our global Arabic gives high-accuracy transcription across many different accents and dialects including (but not limited to) Modern Standard Arabic (MSA) and Arabic spoken in the Gulf, Egypt and the Levant.
Bashkir	ba
Basque	eu
Belarusian	be
Bengali	bn
Bulgarian	bg
Cantonese	yue
Catalan	ca
Croatian	hr
Czech	cs
Danish	da
Dutch	nl
English	en	Our global English gives high-accuracy transcription across many different accents including (but not limited to) English spoken in the United Kingdom, United States, Australia and New Zealand, and by non-native speakers. To standardise spelling, we recommend specifying the Output Locale.
Esperanto	eo
Estonian	et
Finnish	fi
French	fr	Our global French gives high-accuracy transcription across many different accents including (but not limited to) French spoken in France, Canada and Belgium.
Galician	gl
German	de	Our global German gives high-accuracy transcription across many different accents including (but not limited to) German spoken in Germany, Austria and Switzerland.
Greek	el
Hebrew	he
Hindi	hi
Hungarian	hu
Indonesian	id
Interlingua	ia
Irish	ga
Italian	it
Japanese	ja
Korean	ko
Latvian	lv
Lithuanian	lt
Malay	ms
Malay & English bilingual	en_ms	Ideal when transcribing Malay and English in the same media file or stream. Supports all accents and dialects listed under Malay and English.
Maltese	mt
Mandarin	cmn	Our global Mandarin can output Traditional or Simplified characters and gives high-accuracy transcription across many different accents including (but not limited to) Mandarin spoken in China, Taiwan, Singapore and Malaysia.
Mandarin & English bilingual	cmn_en	Ideal when transcribing Mandarin and English in the same media file or stream. Supports all accents and dialects listed under Mandarin and English.
Marathi	mr
Mongolian	mn
Norwegian	no
Persian	fa
Polish	pl
Portuguese	pt	Our global Portuguese gives high-accuracy transcription across many different accents including (but not limited to) Portuguese spoken in Portugal and Brazil.
Romanian	ro
Russian	ru
Slovakian	sk
Slovenian	sl
Spanish	es	Our global Spanish gives high-accuracy transcription across many different accents including (but not limited to) Spanish spoken in Spain, US, Mexico, Colombia, Argentina, Venezuela, Chile and Peru.
Spanish & English bilingual	es (with domain='bilingual-en')	Ideal when transcribing Spanish and English in the same media file or stream. Supports all accents and dialects listed under English and Spanish. Requires the domain config to be set.
Swahili	sw
Swedish	sv
Tamil	ta
Tamil & English bilingual	en_ta	Ideal when transcribing Tamil and English in the same media file or stream. Supports all accents and dialects listed under Tamil and English.
Thai	th
Turkish	tr
Ukrainian	uk
Urdu	ur
Uyghur	ug
Vietnamese	vi
Welsh	cy	Welsh must be explicitly added to the expected languages list when using our Language Identification feature; otherwise, an error stating that the language is not supported for transcription will be returned.

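Each language above is uniquely identified in API requests and responses by a two-letter code (ISO 639-1) or a three-letter code (ISO 639-3). Bilingual packs use a combined code such as cmn_en, except Spanish & English, which uses es together with the bilingual-en domain.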
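
As a rough illustration of the code formats used in the table above, the following Python sketch sorts a code into one of four groups. The function name `code_kind` and the group labels are hypothetical and not part of any Speechmatics API. The check looks only at the shape of the code, so it does not confirm that a language is actually supported or contracted.

```python
def code_kind(code: str) -> str:
    """Classify a language code from the table by its shape (illustrative only)."""
    if code == "auto":
        # Special value that requests automatic language detection.
        return "automatic"
    if "_" in code:
        # Bilingual packs in the table join two codes, e.g. "cmn_en".
        return "bilingual"
    if code.isalpha() and code.islower():
        if len(code) == 2:
            return "ISO 639-1"  # e.g. "en", "fr"
        if len(code) == 3:
            return "ISO 639-3"  # e.g. "yue", "cmn"
    raise ValueError(f"unrecognised language code format: {code!r}")
```

For example, `code_kind("yue")` falls into the three-letter group, while `code_kind("en_ms")` is treated as a bilingual pack.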

Translation Languages

Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below. For more details, see Translation.

Audio Language	Translation Target Language
English (en)	Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)	English (en)
Norwegian Bokmål (no)	Norwegian Nynorsk (nn)
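
The pairs in the table above can be checked in client code before a translation is requested. The sketch below is a minimal, hypothetical helper (the names `ENGLISH_PAIRED` and `is_supported_pair` are assumptions, not part of a Speechmatics SDK) that hard-codes the table as it appears on this page, so it would need updating whenever the supported pairs change.

```python
# Languages that pair with English in both directions, copied from the table above.
ENGLISH_PAIRED = frozenset({
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
})


def is_supported_pair(audio_language: str, target_language: str) -> bool:
    """Return True if the (audio, target) pair appears in the table above."""
    if audio_language == "en":
        return target_language in ENGLISH_PAIRED
    if target_language == "en":
        return audio_language in ENGLISH_PAIRED
    # The only listed pair that does not involve English.
    return (audio_language, target_language) == ("no", "nn")
```

Under this table, German to English is accepted, while German to French is rejected because neither side is English.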

Multilingual speech-to-text

Multilingual language packs are ideal when transcribing multiple languages in the same media file or stream with high accuracy. For more information on the supported languages, please refer to Supported Language Packs.

Supported multilingual packs are:

Language Pack	Transcription config
Mandarin and English	`{"language": "cmn_en"}`
Malay and English	`{"language": "en_ms"}`
Tamil and English	`{"language": "en_ta"}`
Spanish and English	`{"language": "es", "domain": "bilingual-en"}`

Example for bilingual packs other than Spanish and English:

{
  "type": "transcription",
  "transcription_config": {
    "language": "cmn_en"
  }
}

Bilingual Spanish and English example:

{
  "type": "transcription",
  "transcription_config": {
    "language": "es",
    "domain": "bilingual-en"
  }
}
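
The two examples above differ only in their `transcription_config` fields, so the configuration can be generated from a small lookup table. The following Python sketch assumes the pack names and config values from the table on this page; `PACK_CONFIGS` and `build_transcription_request` are hypothetical names, and the sketch only produces the JSON text without sending it anywhere.

```python
import json

# Transcription config per multilingual pack, copied from the table above.
PACK_CONFIGS = {
    "Mandarin and English": {"language": "cmn_en"},
    "Malay and English": {"language": "en_ms"},
    "Tamil and English": {"language": "en_ta"},
    # Spanish and English needs the domain field in addition to the language.
    "Spanish and English": {"language": "es", "domain": "bilingual-en"},
}


def build_transcription_request(pack_name: str) -> str:
    """Return the JSON config for a multilingual pack; raises KeyError if unknown."""
    config = PACK_CONFIGS[pack_name]
    request = {"type": "transcription", "transcription_config": dict(config)}
    # json.dumps always emits valid JSON, which avoids trailing-comma mistakes.
    return json.dumps(request, indent=2)


print(build_transcription_request("Spanish and English"))
```

Generating the JSON this way keeps the Spanish and English special case in one place rather than spreading it across calling code.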
