Translation
Transcription: Batch, Real-Time
Deployments: All
Speechmatics enables you to translate your audio into multiple languages. Quickly add translation to your application through a single API call, with over 30 languages supported.
Test out our Translation Feature for free in the Speechmatics On-Demand Portal, no coding required.
Translation can be enabled when transcribing a file or in real-time, using either the Speechmatics SaaS or an On-Prem deployment.
If you're new to Speechmatics, please see our guide on Transcribing a File or Transcribing in Real-Time. Once you are set up, include the following config to enable Translation, listing the languages to translate into under target_languages:
{
  "type": "transcription",
  "transcription_config": {
    "operating_point": "enhanced",
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "de"]
  }
}
Quick Start
Batch Translation
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "translation_config": {
        "target_languages": TRANSLATION_LANGUAGES
    }
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        for language in TRANSLATION_LANGUAGES:
            # Print the translation for each language from the JSON
            print(f"Translation for {language}")
            translation = ""
            for translated_segment in transcript["translations"][language]:
                translation += translated_segment["content"] + " "
            print(translation)
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e
Real-Time Translation
Python client example to translate a file in real-time. See Real-Time Transcription for more examples.
import speechmatics
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the translations
def print_translation(msg):
    msg_type = "Final"
    if msg['message'] == "AddPartialTranslation":
        msg_type = "Partial"

    language = msg['language']  # language for translation message
    translations = []
    for translation_segment in msg['results']:
        translations.append(translation_segment['content'])

    translation = " ".join(translations).strip()
    print(f"{msg_type} translation for {language}: {translation}")

# Register the event handler for partial translation
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranslation,
    event_handler=print_translation,
)

# Register the event handler for full translation
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranslation,
    event_handler=print_translation,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters with translation
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models

translation_config = speechmatics.models.RTTranslationConfig(
    target_languages=TRANSLATION_LANGUAGES,
    # enable_partials=True  # Optional argument to provide translation of partial sentences
)

transcription_config = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    translation_config=translation_config
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, 'rb') as fd:
    try:
        ws.run_synchronously(fd, transcription_config, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        else:
            raise e
Maximum number of translations: Each transcription can have up to five translations configured.
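The five-language limit can be checked on the client before anything is submitted. The following is a minimal sketch of one way to do that; `build_config` and `MAX_TARGET_LANGUAGES` are hypothetical names introduced here for illustration and are not part of the Speechmatics Python client.

```python
MAX_TARGET_LANGUAGES = 5  # limit stated in this guide; the constant name is our own


def build_config(language, target_languages):
    """Return a job config dict with translation enabled.

    Hypothetical helper: it raises ValueError locally rather than
    waiting for the service to reject the request.
    """
    targets = list(target_languages)
    if len(targets) > MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"maximum number of target languages is {MAX_TARGET_LANGUAGES} "
            f"and requested count is {len(targets)}"
        )
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "translation_config": {"target_languages": targets},
    }


conf = build_config("en", ["es", "de"])
print(conf["translation_config"]["target_languages"])
```

The returned dict has the same shape as the `conf` dict in the Batch example, so it could be passed to `submit_job` in its place.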
Translation Response
Batch Translation
{
  "format": "2.9",
  "job": {
    "created_at": "2023-01-23T19:31:19.354Z",
    "data_name": "example.wav",
    "duration": 15,
    "id": "ggqjaazkqf"
  },
  "metadata": {
    "created_at": "2023-01-23T19:31:44.766Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "diarization": "speaker"
    },
    "translation_config": {
      "target_languages": [
        "es"
      ]
    }
  },
  "results": [
    {
      "start_time": 0.78,
      "end_time": 1.32,
      "type": "word",
      "alternatives": [
        {
          "content": "Welcome",
          "confidence": 1.0,
          "language": "en",
          "speaker": "S1"
        }
      ]
    },
    ...
  ],
  "translations": {
    "es": [
      {
        "start_time": 0.78,
        "end_time": 2.58,
        "content": "Bienvenidos a Speechmatics.",
        "speaker": "S1"
      },
      {
        "start_time": 3.0,
        "end_time": 7.94,
        "content": "Esperamos que tengas un gran día.",
        "speaker": "S1"
      },
      ...
    ]
  }
}
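Because each translated segment in the batch response carries its own speaker label, the segments can be merged into speaker turns. The sketch below assumes the JSON has already been loaded into a dict with the shape shown above; `translation_by_speaker` is a hypothetical helper name, not a client function.

```python
def translation_by_speaker(transcript, language):
    """Join translated segments for one language into (speaker, text) turns."""
    turns = []
    for segment in transcript.get("translations", {}).get(language, []):
        # "speaker" is only present when speaker diarization is enabled
        speaker = segment.get("speaker")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous segment: extend the current turn
            turns[-1] = (speaker, turns[-1][1] + " " + segment["content"])
        else:
            turns.append((speaker, segment["content"]))
    return turns


# Trimmed copy of the example response above
sample = {
    "translations": {
        "es": [
            {"start_time": 0.78, "end_time": 2.58,
             "content": "Bienvenidos a Speechmatics.", "speaker": "S1"},
            {"start_time": 3.0, "end_time": 7.94,
             "content": "Esperamos que tengas un gran día.", "speaker": "S1"},
        ]
    }
}
print(translation_by_speaker(sample, "es"))
# [('S1', 'Bienvenidos a Speechmatics. Esperamos que tengas un gran día.')]
```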
Real-Time Translation
Real-time provides a stream of translation messages for each requested language. Translation messages will arrive after transcription messages, but won't delay transcription.
As with transcription, there are two types of messages: Partial translations (optional) and Final translations.
Final Translation
A Final translation is the final best prediction for the translation (usually at the end of a sentence). Once output, these translations are considered final and will not be updated.
Final translation example message
{
  "format": "2.9",
  "message": "AddTranslation",
  "language": "es",
  "results": [
    {
      "start_time": 5.45999987795949,
      "end_time": 6.189999870583415,
      "content": "Bienvenidos a Speechmatics.",
      "speaker": "S1"
    }
  ]
}
Partial Translation (optional)
A Partial translation is a translation that can be updated at a later point in time as more context becomes available. Partials often correspond to unfinished sentences and have lower latency than Final translations.
By default, only Final translations are produced. Partials must be explicitly enabled using the enable_partials property in translation_config for the session.
Partial translation example message
{
  "format": "2.9",
  "message": "AddPartialTranslation",
  "language": "es",
  "results": [
    {
      "start_time": 5.45999987795949,
      "end_time": 5.889999870583415,
      "content": "Bienvenidos a",
      "speaker": "S1"
    }
  ]
}
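Since Partials can be revised and Finals cannot, a consumer usually keeps the two apart. The following sketch shows one possible way to track them per language. `TranslationBuffer` is a hypothetical class, and clearing the pending Partial when a Final arrives is an assumption about how the two message types relate, not documented behaviour.

```python
class TranslationBuffer:
    """Keeps committed Final text and the latest Partial text per language."""

    def __init__(self):
        self.finals = {}   # language -> list of Final segment strings
        self.partial = {}  # language -> latest Partial text (replaceable)

    def on_message(self, msg):
        language = msg["language"]
        text = " ".join(r["content"] for r in msg["results"]).strip()
        if msg["message"] == "AddTranslation":
            self.finals.setdefault(language, []).append(text)
            # Assumption: a Final supersedes whatever Partial was pending
            self.partial[language] = ""
        elif msg["message"] == "AddPartialTranslation":
            # Each new Partial replaces the previous one
            self.partial[language] = text

    def current_text(self, language):
        parts = self.finals.get(language, []) + [self.partial.get(language, "")]
        return " ".join(p for p in parts if p)


buffer = TranslationBuffer()
buffer.on_message({"message": "AddPartialTranslation", "language": "es",
                   "results": [{"content": "Bienvenidos a"}]})
print(buffer.current_text("es"))  # Bienvenidos a
buffer.on_message({"message": "AddTranslation", "language": "es",
                   "results": [{"content": "Bienvenidos a Speechmatics."}]})
print(buffer.current_text("es"))  # Bienvenidos a Speechmatics.
```

An instance's `on_message` method could be registered for both `AddTranslation` and `AddPartialTranslation`, in the same way `print_translation` is registered in the Quick Start.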
Each translated section of text matches directly to one or more words in the native language transcription, based on the start and end time. Each one has the following properties:
- language: Real-time only. The ISO language code of the translated language
- content: The translated content
- start_time: The start time of the translated content, which matches the start time of the first word in the transcript
- end_time: The end time of the translated content, which matches the end time of the last word in the transcript
- speaker: The speaker label when diarization: speaker is set (see Speaker Diarization)
  - Speaker labels are only available on Final translations, not Partial translations
- channel: The channel label when diarization: channel is set (see Channel Diarization)
  - Only available for Batch transcription
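Because start_time and end_time line up with the first and last transcript words, a translated segment can be mapped back to the source words it covers. The sketch below is one way to do this for a batch response; `words_in_segment` and its `tolerance` parameter are hypothetical, with the tolerance added as a guard against floating-point rounding.

```python
def words_in_segment(results, segment, tolerance=1e-6):
    """Return the transcript words whose timings fall inside a translated segment."""
    words = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word results
        starts_inside = item["start_time"] >= segment["start_time"] - tolerance
        ends_inside = item["end_time"] <= segment["end_time"] + tolerance
        if starts_inside and ends_inside:
            # Take the first alternative as the transcribed word
            words.append(item["alternatives"][0]["content"])
    return words


# The single word and first segment from the batch example response
results = [
    {"start_time": 0.78, "end_time": 1.32, "type": "word",
     "alternatives": [{"content": "Welcome", "confidence": 1.0,
                       "language": "en", "speaker": "S1"}]},
]
segment = {"start_time": 0.78, "end_time": 2.58,
           "content": "Bienvenidos a Speechmatics.", "speaker": "S1"}
print(words_in_segment(results, segment))  # ['Welcome']
```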
Supported Translation Pairs
Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below.
Transcription Language | Translation Target Language |
---|---|
English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) |
Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en) |
Norwegian Bokmål (no)* | Norwegian Nynorsk (nn) |
*Norwegian Bokmål to Nynorsk is only supported on Batch SaaS.
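The table reduces to three rules: English to any listed language, any listed language to English, and Norwegian Bokmål to Nynorsk on Batch SaaS. As an illustration, the check below encodes those rules with the language codes copied from the table. `is_supported_pair` and the `batch_saas` flag are hypothetical names, and the hard-coded set would need updating whenever the table changes.

```python
# Language codes copied from the table in this section
PAIRED_WITH_ENGLISH = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def is_supported_pair(source, target, batch_saas=False):
    """Return True if the table lists source -> target as a translation pair."""
    if source == "en":
        return target in PAIRED_WITH_ENGLISH
    if target == "en":
        return source in PAIRED_WITH_ENGLISH
    # Norwegian Bokmål -> Nynorsk is listed as Batch SaaS only
    return source == "no" and target == "nn" and batch_saas


print(is_supported_pair("en", "es"))                   # True
print(is_supported_pair("de", "es"))                   # False
print(is_supported_pair("no", "nn", batch_saas=True))  # True
```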
Considerations
When using Translation, there are a few things to keep in mind:
- Accuracy of Transcription: We recommend using the Enhanced Operating Point for the best Translation results. Transcription accuracy directly impacts the accuracy of translation
- Punctuation: Punctuation plays a significant role in the accuracy of translation. It is therefore recommended to avoid disabling any Punctuation Marks or reducing the Punctuation Sensitivity to ensure the best possible results
- Formatting: The translation is applied to the written form transcript
- Transcription Time: Enabling Translation for a file will increase the turnaround time of jobs. The increase for a single translation is small, and it grows with the number of target languages. Real-time latency is not affected, except when closing a connection, where there can be a delay of up to 5 seconds while the final translation is received
Limitations
- Maximum Number of Translations: Each transcription can have up to five translations
- Output Formats: At this time, only the JSON transcript format is supported for translation. Text and SRT transcript formats are only available in the native language for Batch transcription
- Other Transcription Features: The following transcription features are only available in the native language transcript, not the translation:
  - Single word timings
  - Confidence scores
  - Word tagging
  - Output locale spelling
Batch Error Responses
Unsupported Target Language
If one or more of the target languages are not supported for the transcription language, the job will be rejected with an HTTP 400 error response.
This behaviour is different when the transcription language is not known ahead of time (see Unsupported Translation Pair).
Example Bad Config:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "zz"]
  }
}
Response:
{
  "code": 400,
  "detail": "Job config JSON is invalid. Error: language zz is not a supported translation target for source language en",
  "error": "Job rejected"
}
Unsupported Translation Pair
When using Language Identification, if one or more of the target languages are not supported for the identified transcription language, the job will be accepted and an error message will be included in the JSON output. No translations will be returned for that language pair.
This behaviour is different when the transcription language is known ahead of time (see Unsupported Target Language).
The example below shows the response when the language was identified and transcribed as German (de) with the target translation language set to Spanish (es).
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "auto"
    },
    "translation_config": {
      "target_languages": [
        "es"
      ]
    },
    "translation_errors": [
      {"type": "unsupported_translation_pair", "message": "Translation from de to es currently not supported"}
    ],
    ...
  },
  "results": [...]
}
This enables the combination of automatic language identification and translation.
Too Many Target Languages
Each transcription can have up to five translations. If you request more than five, an HTTP 400 error response is returned.
{
  "code": 400,
  "detail": "maximum number of target languages is 5 and requested count is 6",
  "error": "Job rejected"
}
Translation Failure
In the event that Translation fails for one or more languages, the transcription process will complete but the Translation will not be returned. An error message will be included in the final JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {...},
    "translation_config": {...},
    "translation_errors": [
      {"type": "translation_failed", "message": "Translation failed."}
    ],
    ...
  },
  "results": [...]
}
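In both the unsupported-pair and translation-failure cases the job still completes, so the transcript has to be inspected to find out whether translations came back. A minimal sketch of such a check follows, assuming the JSON output has been loaded into a dict. The helper names are hypothetical, and whether the "translations" key is present at all after a failure is not stated here, so the code treats it as optional.

```python
def translation_errors(transcript):
    """Return the translation_errors list from the metadata (empty if none)."""
    return transcript.get("metadata", {}).get("translation_errors", [])


def translated_languages(transcript, requested_languages):
    """Map each requested language to True if any translated segments came back."""
    translations = transcript.get("translations", {})  # treated as optional
    return {lang: bool(translations.get(lang)) for lang in requested_languages}


# Shape taken from the error examples in this section
sample = {
    "metadata": {
        "translation_errors": [
            {"type": "unsupported_translation_pair",
             "message": "Translation from de to es currently not supported"}
        ]
    },
    "results": [],
}

for error in translation_errors(sample):
    print(f"{error['type']}: {error['message']}")
print(translated_languages(sample, ["es"]))  # {'es': False}
```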
Real-Time Error Responses
Unsupported Target Language
If one or more of the target languages are not supported for the source language, a warning is raised but transcription will continue. No translations will be returned for that language.
WARNING:speechmatics.client:Translation from en to ta currently not supported
Too Many Target Languages
Each transcription can have up to five translations. If you request more than five, an error is returned and the websocket connection will fail.
ERROR:speechmatics.exceptions.TranscriptionError: translation_config.target_languages should have at most 5 items.
Feedback
Do you have requests or feedback on translation? If so, please send us your thoughts via our Translation Feedback Form.
If you want to report an issue or need support more urgently, Raise an Issue instead.