Translation
Transcription: Batch, Real-Time. Deployments: All.
Speechmatics enables you to translate your audio into multiple languages. Quickly add translation to your application through a single API call, with over 30 languages supported.
Test out our translation feature for free in the Speechmatics Portal, no coding required.
Translation can be enabled when transcribing a file or in real-time, using either the Speechmatics SaaS or an on-prem deployment.
If you're new to Speechmatics, please see our guide on Transcribing a File or Transcribing in Real-Time. Once you are set up, include the following config to enable translation, listing the languages you want in target_languages:
{
  "type": "transcription",
  "transcription_config": {
    "operating_point": "enhanced",
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "de"]
  }
}
Quick Start
- Batch Translation
- Real-Time Translation
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "translation_config": {
        "target_languages": TRANSLATION_LANGUAGES
    }
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        for language in TRANSLATION_LANGUAGES:
            # Print the translation for each language from the JSON
            print(f"Translation for {language}")
            translation = ""
            for translated_segment in transcript["translations"][language]:
                translation += translated_segment["content"] + " "
            print(translation)
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e
Python client example to translate a file in real-time. See here for more examples of Real-Time Transcription.
import speechmatics
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
        generate_temp_token=True,  # Enterprise customers don't need to provide this parameter
    )
)

# Define an event handler to print the translations
def print_translation(msg):
    msg_type = "Final"
    if msg['message'] == "AddPartialTranslation":
        msg_type = "Partial"

    language = msg['language']  # language for translation message
    translations = []
    for translation_segment in msg['results']:
        translations.append(translation_segment['content'])

    translation = " ".join(translations).strip()
    print(f"{msg_type} translation for {language}: {translation}")

# Register the event handler for partial translation
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranslation,
    event_handler=print_translation,
)

# Register the event handler for full translation
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranslation,
    event_handler=print_translation,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters with translation
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models

translation_config = speechmatics.models.RTTranslationConfig(
    target_languages=TRANSLATION_LANGUAGES,
    # enable_partials=True  # Optional argument to provide translation of partial sentences
)

transcription_config = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    translation_config=translation_config
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, 'rb') as fd:
    try:
        ws.run_synchronously(fd, transcription_config, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        else:
            raise e
Maximum number of translations: Each transcription can have up to five translations configured.
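The five-language limit can also be checked on the client before a job or session is started. The following is a minimal sketch, not part of the Speechmatics client: build_translation_config and MAX_TARGET_LANGUAGES are hypothetical names introduced here for illustration, and the config shape simply mirrors the example at the top of this page.

```python
MAX_TARGET_LANGUAGES = 5  # limit stated in this guide


def build_translation_config(language, target_languages, operating_point="enhanced"):
    """Build a job config dict with translation enabled (hypothetical helper)."""
    targets = list(target_languages)
    if len(targets) > MAX_TARGET_LANGUAGES:
        # Fail early instead of waiting for the service to reject the request
        raise ValueError(
            f"maximum number of target languages is {MAX_TARGET_LANGUAGES} "
            f"and requested count is {len(targets)}"
        )
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "operating_point": operating_point,
        },
        "translation_config": {"target_languages": targets},
    }


conf = build_translation_config("en", ["es", "de"])
print(conf["translation_config"])
```

The returned dict has the same shape as the conf passed to submit_job in the Batch example, so it could be used in its place.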
Translation Response
- Batch Translation
- Real-Time Translation
{
  "format": "2.9",
  "job": {
    "created_at": "2023-01-23T19:31:19.354Z",
    "data_name": "example.wav",
    "duration": 15,
    "id": "ggqjaazkqf"
  },
  "metadata": {
    "created_at": "2023-01-23T19:31:44.766Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "diarization": "speaker"
    },
    "translation_config": {
      "target_languages": [
        "es"
      ]
    }
  },
  "results": [
    {
      "start_time": 0.78,
      "end_time": 1.32,
      "type": "word",
      "alternatives": [
        {
          "content": "Welcome",
          "confidence": 1.0,
          "language": "en",
          "speaker": "S1"
        }
      ]
    },
    ...
  ],
  "translations": {
    "es": [
      {
        "start_time": 0.78,
        "end_time": 2.58,
        "content": "Bienvenidos a Speechmatics.",
        "speaker": "S1"
      },
      {
        "start_time": 3.0,
        "end_time": 7.94,
        "content": "Esperamos que tengas un gran día.",
        "speaker": "S1"
      },
      ...
    ]
  }
}
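When diarization is enabled, the speaker field on each translated segment can be used to rebuild the translation as a sequence of speaker turns. This is a minimal sketch under that assumption: translation_by_speaker is a hypothetical helper, and the sample data is a trimmed copy of the Batch response shown above.

```python
from itertools import groupby


def translation_by_speaker(transcript, language):
    """Merge consecutive translated segments that share a speaker label."""
    segments = transcript.get("translations", {}).get(language, [])
    turns = []
    for speaker, group in groupby(segments, key=lambda s: s.get("speaker")):
        text = " ".join(segment["content"] for segment in group)
        turns.append((speaker, text))
    return turns


# Trimmed copy of the example Batch response
transcript = {
    "translations": {
        "es": [
            {"start_time": 0.78, "end_time": 2.58,
             "content": "Bienvenidos a Speechmatics.", "speaker": "S1"},
            {"start_time": 3.0, "end_time": 7.94,
             "content": "Esperamos que tengas un gran día.", "speaker": "S1"},
        ]
    }
}

for speaker, text in translation_by_speaker(transcript, "es"):
    print(f"{speaker}: {text}")
```

A language that was not requested simply yields an empty list rather than raising a KeyError.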
Real-time provides a stream of translation messages for each requested language. Translation messages arrive after transcription messages, but do not delay transcription.
As with transcription, there are two types of messages: Partial translations (optional) and Final translations.
Final Translation
A Final translation is the final best prediction for the translation (usually at the end of a sentence). Once output, Final translations will not be updated.
Final translation example message
{
  "format": "2.9",
  "message": "AddTranslation",
  "language": "es",
  "results": [
    {
      "start_time": 5.45999987795949,
      "end_time": 6.189999870583415,
      "content": "Bienvenidos a Speechmatics.",
      "speaker": "S1"
    }
  ]
}
Partial Translation (optional)
A Partial translation is a translation that can be updated later as more context becomes available. Partial translations often correspond to unfinished sentences, and they have a lower latency than Final translations.
By default, only Final translations are produced. Partials must be explicitly enabled using the enable_partials property in translation_config for the session.
Partial translation example message
{
  "format": "2.9",
  "message": "AddPartialTranslation",
  "language": "es",
  "results": [
    {
      "start_time": 5.45999987795949,
      "end_time": 5.889999870583415,
      "content": "Bienvenidos a",
      "speaker": "S1"
    }
  ]
}
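Because Partials can be revised and Finals cannot, a client usually needs to keep the two apart. The sketch below shows one way to do that. TranslationBuffer is a hypothetical class, not part of the Speechmatics client, and it assumes that each new Partial supersedes the previous one and that a Final replaces any pending Partial for the same language.

```python
class TranslationBuffer:
    """Keeps finalized text and the latest partial per language (hypothetical helper)."""

    def __init__(self):
        self.finals = {}    # language -> list of finalized texts
        self.partial = {}   # language -> latest partial text

    def handle(self, msg):
        language = msg["language"]
        text = " ".join(r["content"] for r in msg["results"]).strip()
        if msg["message"] == "AddPartialTranslation":
            # Assumption: a newer partial supersedes the previous one
            self.partial[language] = text
        elif msg["message"] == "AddTranslation":
            # Finals are not updated afterwards, so append and drop the pending partial
            self.finals.setdefault(language, []).append(text)
            self.partial.pop(language, None)

    def text(self, language):
        """Finalized text followed by the pending partial, if any."""
        parts = list(self.finals.get(language, []))
        if language in self.partial:
            parts.append(self.partial[language])
        return " ".join(parts)


buffer = TranslationBuffer()
buffer.handle({"message": "AddPartialTranslation", "language": "es",
               "results": [{"content": "Bienvenidos a"}]})
print(buffer.text("es"))
buffer.handle({"message": "AddTranslation", "language": "es",
               "results": [{"content": "Bienvenidos a Speechmatics."}]})
print(buffer.text("es"))
```

In the Quick Start example, buffer.handle could be registered with ws.add_event_handler for both AddPartialTranslation and AddTranslation in place of print_translation.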
Each translated section of text matches directly to one or more words in the native language transcription based on the start and end time. Each one has the following properties:
- language: Real-time only. The ISO language code of the translated language
- content: The translated content
- start_time: The start time of the translated content, which matches the start time of the first word in the transcript
- end_time: The end time of the translated content, which matches the end time of the last word in the transcript
- speaker: The speaker label when diarization: speaker is set - see here about Speaker Diarization
  - Speaker labels are only available on Final translations, not Partial translations
- channel: The channel label when diarization: channel is set - see here about Channel Diarization
  - Only available for Batch transcription
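Since start_time and end_time line up with word timings in the transcript, the source words behind a translated segment can be recovered by comparing times. This is a minimal sketch: words_for_segment is a hypothetical helper, the small tolerance is an assumption to absorb floating-point noise, and every word after "Welcome" in the sample data is made up for illustration.

```python
def words_for_segment(results, segment, tolerance=1e-6):
    """Return transcript words whose timings fall inside a translated segment."""
    words = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word results
        starts_inside = item["start_time"] >= segment["start_time"] - tolerance
        ends_inside = item["end_time"] <= segment["end_time"] + tolerance
        if starts_inside and ends_inside:
            words.append(item["alternatives"][0]["content"])
    return words


# "Welcome" comes from the example response; the other words are illustrative
results = [
    {"start_time": 0.78, "end_time": 1.32, "type": "word",
     "alternatives": [{"content": "Welcome"}]},
    {"start_time": 1.40, "end_time": 1.60, "type": "word",
     "alternatives": [{"content": "to"}]},
    {"start_time": 1.70, "end_time": 2.58, "type": "word",
     "alternatives": [{"content": "Speechmatics"}]},
    {"start_time": 3.00, "end_time": 3.20, "type": "word",
     "alternatives": [{"content": "We"}]},
]
segment = {"start_time": 0.78, "end_time": 2.58,
           "content": "Bienvenidos a Speechmatics."}

print(words_for_segment(results, segment))
```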
Language Pairs Supported
Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below.
Audio Language | Translation Target Language |
---|---|
English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) |
Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) |
Norwegian Bokmål (no)* | Norwegian Nynorsk (nn) |
*Norwegian Bokmål to Nynorsk is only supported on Batch SaaS.
Currently unsupported Speechmatics languages: Arabic, Bashkir, Belarusian, Welsh, Esperanto, Basque, Interlingua, Mongolian, Marathi, Tamil, Thai, Uyghur, Cantonese.
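The table above can be encoded as a small lookup to reject unsupported pairs before sending a request. This is a sketch only: is_supported_pair and ENGLISH_PAIRED are hypothetical names, the codes are copied from the table as it appears on this page, and the list may go out of date as languages are added.

```python
# Codes from the table above: each one pairs with English in both directions
ENGLISH_PAIRED = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def is_supported_pair(source, target, batch_saas=False):
    """Check a source/target pair against the table (hypothetical helper)."""
    if source == "en":
        return target in ENGLISH_PAIRED
    if target == "en":
        return source in ENGLISH_PAIRED
    if (source, target) == ("no", "nn"):
        # Norwegian Bokmål to Nynorsk is only supported on Batch SaaS
        return batch_saas
    return False


print(is_supported_pair("en", "es"))
print(is_supported_pair("en", "ta"))
```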
Considerations
When using translation, there are a few things to keep in mind:
- Accuracy of Transcription: We recommend using the Enhanced operating point for the best translation results. Transcription accuracy directly impacts the accuracy of translation
- Punctuation: Punctuation plays a significant role in the accuracy of translation. It is therefore recommended to avoid disabling any Punctuation Marks or reducing the Punctuation Sensitivity to ensure the best possible results
- Formatting: The translation is applied to the written form transcript
- Transcription Time: Enabling translation for a file will increase the turnaround time of jobs. The increase for a single translation is small, but it grows with the number of translation target languages. Real-time is not affected, except when closing a connection, where there can be a delay of up to 5 seconds while the final translation is received
Limitations
- Maximum Number of Translations: Each transcription can have up to five translations
- Output Formats: At this time, only the JSON transcript format is supported for translation. Text and SRT transcript formats are only available in the native language for Batch transcription
- Other Transcription Features: The following transcription features are only available in the native language transcript, not the translation:
  - Single word timings
  - Confidence scores
  - Word tagging
  - Output locale spelling
Batch Error Responses
Unsupported Target Language
If one or more of the target languages are not supported for the source language, an HTTP 400 error response is returned.
Example Bad Config:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "zz"]
  }
}
Response:
{
  "code": 400,
  "detail": "Job config JSON is invalid. Error: language zz is not a supported translation target for source language en",
  "error": "Job rejected"
}
Too Many Target Languages
Each transcription can have up to five translations. If you request more than five, an HTTP 400 error response is returned.
{
  "code": 400,
  "detail": "maximum number of target languages is 5 and requested count is 6",
  "error": "Job rejected"
}
Translation Failure
In the event that Translation fails for one or more languages, the transcription process will complete but the Translation will not be returned. An error message will be included in the final JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {...},
    "translation_config": {...},
    "translation_errors": [
      {"type": "translation_failed", "message": "Translation failed."}
    ],
    ...
  },
  "results": [...]
}
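Since a failed translation does not fail the job, it is worth checking the transcript for translation_errors before reading translations. The sketch below assumes the layout shown above. get_translation_errors and missing_translations are hypothetical helpers, and treating an absent or empty language entry as "missing" is an assumption about how a failed language appears in the output.

```python
def get_translation_errors(transcript):
    """Return the translation_errors list from the metadata, or [] if absent."""
    return transcript.get("metadata", {}).get("translation_errors", [])


def missing_translations(transcript, requested_languages):
    """List requested languages with no translated segments in the transcript."""
    translations = transcript.get("translations", {})
    return [lang for lang in requested_languages if not translations.get(lang)]


# Trimmed copy of the failure example, with no translations returned
transcript = {
    "metadata": {
        "type": "transcription",
        "translation_errors": [
            {"type": "translation_failed", "message": "Translation failed."}
        ],
    },
    "results": [],
}

for error in get_translation_errors(transcript):
    print(f"{error['type']}: {error['message']}")
print("Missing:", missing_translations(transcript, ["es", "de"]))
```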
Real-Time Error Responses
Unsupported Target Language
If one or more of the target languages are not supported for the source language, a warning is raised but transcription will continue. No translations will be returned for that language.
WARNING:speechmatics.client:Translation from en to ta currently not supported
Too Many Target Languages
Each transcription can have up to five translations. If you request more than five, an error is returned and the websocket connection will fail.
ERROR:speechmatics.exceptions.TranscriptionError: translation_config.target_languages should have at most 5 items.
Feedback
Do you have requests or feedback on translation? If so, please send us your thoughts via our Translation Feedback Form.
If you want to report an issue or get support more urgently, Raise an Issue instead.