Translation

Transcription: Batch, Real-Time | Deployments: All

Speechmatics enables you to translate your audio into multiple languages. Quickly add translation to your application through a single API call, with over 30 languages supported.

Test out our Translation Feature for free in the Speechmatics On-Demand Portal, no coding required.

Translation can be enabled when transcribing a file or in real-time, using either the Speechmatics SaaS or an On-Prem deployment.

If you're new to Speechmatics, please see our guide on Transcribing a File or Transcribing in Real-Time. Once you are set up, include the following config to enable Translation:

{
  "type": "transcription",
  "transcription_config": {
    "operating_point": "enhanced",
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "de"] # Set languages here to enable translation
  }
}

Quick Start

Batch Translation
Real-Time Translation

Python client example that translates a file using Batch transcription.

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "translation_config": {
        "target_languages": TRANSLATION_LANGUAGES
    }
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        for language in TRANSLATION_LANGUAGES:
            # Print the translation for each language from the JSON
            print(f"Translation for {language}")
            translation = ""
            for translated_segment in transcript["translations"][language]:
                translation += translated_segment["content"] + " "
            print(translation)
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise

Python client example that translates a file in real-time. See here for more examples of Real-Time Transcription.

import speechmatics
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]
CONNECTION_URL = f"wss://eu2.rt.speechmatics.com/v2/{LANGUAGE}"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the translations
def print_translation(msg):
    msg_type = "Final"
    if msg['message'] == "AddPartialTranslation":
        msg_type = "Partial"

    language = msg['language']  # language for translation message
    translations = []
    for translation_segment in msg['results']:
        translations.append(translation_segment['content'])

    translation = " ".join(translations).strip()
    print(f"{msg_type} translation for {language}: {translation}")

# Register the event handler for partial translation
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranslation,
    event_handler=print_translation,
)

# Register the event handler for full translation
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranslation,
    event_handler=print_translation,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters with translation
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models

translation_config = speechmatics.models.RTTranslationConfig(
    target_languages=TRANSLATION_LANGUAGES,
    # enable_partials=True  # Optional argument to provide translation of partial sentences
)

transcription_config = speechmatics.models.TranscriptionConfig(
    language=LANGUAGE,
    translation_config=translation_config
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, 'rb') as fd:
    try:
        ws.run_synchronously(fd, transcription_config, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        else:
            raise

Maximum number of translations: Each transcription can have up to five translations configured.

Translation Response

Batch Translation
Real-Time Translation

The returned JSON will include a new property called `translations`, which contains a list of translated text for each target language requested (using the same ISO Language Codes as for transcription).

{
    "format": "2.9",
    "job": {
        "created_at": "2023-01-23T19:31:19.354Z",
        "data_name": "example.wav",
        "duration": 15,
        "id": "ggqjaazkqf"
    },
    "metadata": {
        "created_at": "2023-01-23T19:31:44.766Z",
        "type": "transcription",
        "transcription_config": {
            "language": "en",
            "diarization": "speaker"
        },
        "translation_config": {
            "target_languages": [
                "es"
            ]
        }
    },
    "results": [
        {
            "start_time": 0.78,
            "end_time": 1.32,
            "type": "word",
            "alternatives": [
                {
                    "content": "Welcome",
                    "confidence": 1.0,
                    "language": "en",
                    "speaker": "S1"
                }
            ]
        },
        ...
    ],
    "translations": {
        "es": [
            {
                "start_time": 0.78,
                "end_time": 2.58,
                "content": "Bienvenidos a Speechmatics.",
                "speaker": "S1"
            },
            {
                "start_time": 3.0,
                "end_time": 7.94,
                "content": "Esperamos que tengas un gran día.",
                "speaker": "S1"
            },
            ...
        ]
    }
}
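As a minimal sketch of reading this structure, the snippet below joins the translated segments for one language into text per speaker. The helper name translation_by_speaker and the trimmed sample dictionary are illustrative assumptions and are not part of the Speechmatics client library.

```python
from collections import defaultdict


def translation_by_speaker(transcript: dict, language: str) -> dict:
    """Join the translated segments for `language`, grouped by speaker label."""
    grouped = defaultdict(list)
    for segment in transcript.get("translations", {}).get(language, []):
        # "speaker" is only present when speaker diarization is enabled,
        # so fall back to a placeholder label (an assumption of this sketch).
        speaker = segment.get("speaker", "unknown")
        grouped[speaker].append(segment["content"])
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}


# Trimmed sample built from the example response above
sample = {
    "translations": {
        "es": [
            {"start_time": 0.78, "end_time": 2.58,
             "content": "Bienvenidos a Speechmatics.", "speaker": "S1"},
            {"start_time": 3.0, "end_time": 7.94,
             "content": "Esperamos que tengas un gran día.", "speaker": "S1"},
        ]
    }
}

print(translation_by_speaker(sample, "es"))
```

A language that was not requested simply yields an empty dictionary, which keeps the caller free of KeyError handling.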

Real-Time provides a stream of translation messages for each requested language. Translation messages arrive after the corresponding transcription messages, but do not delay transcription.

As with transcription, there are two types of messages: Partial translations (optional) and Final translations.

Final Translation

A Final translation is the final best prediction for the translation (usually the end of a sentence). Once output, these translations are considered Final and will not be updated afterwards.

Final translation example message

{
  "format": "2.9",
  "message": "AddTranslation",
  "language": "es",
  "results": [
    {
      "start_time": 5.45999987795949,
      "end_time": 6.189999870583415,
      "content": "Bienvenidos a Speechmatics.",
      "speaker": "S1"
    }
  ]
}

Partial Translation (optional)

A Partial translation is a translation that can be updated at a later point in time as more context becomes available. Partials often correspond to unfinished sentences and have a lower latency than Final translations.

By default, only Final translations are produced. Partials must be explicitly enabled using the enable_partials property in translation_config for the session.

Partial translation example message

{
  "format": "2.9",
  "message": "AddPartialTranslation",
  "language": "es",
  "results": [
    {
      "start_time": 5.45999987795949,
      "end_time": 5.889999870583415,
      "content": "Bienvenidos a",
      "speaker": "S1"
    }
  ]
}
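One way to combine the two message types on the client side is sketched below: the latest Partial is shown until a Final arrives for the same language. The class name TranslationBuffer is hypothetical, and the rule that a Final supersedes the pending Partial is an assumption of this sketch rather than documented behaviour.

```python
class TranslationBuffer:
    """Tracks finalized text and the latest partial text per target language."""

    def __init__(self):
        self.finals = {}    # language -> list of finalized segment texts
        self.partials = {}  # language -> latest partial text

    def handle(self, msg: dict) -> str:
        """Consume one translation message and return the current text."""
        language = msg["language"]
        text = " ".join(r["content"] for r in msg["results"]).strip()
        if msg["message"] == "AddPartialTranslation":
            # A newer partial replaces any earlier partial for this language.
            self.partials[language] = text
        elif msg["message"] == "AddTranslation":
            # Finals are never updated. Assumption: the pending partial
            # covered the same audio, so it is dropped.
            self.finals.setdefault(language, []).append(text)
            self.partials.pop(language, None)
        return self.current(language)

    def current(self, language: str) -> str:
        """Finalized text followed by the pending partial, if any."""
        parts = list(self.finals.get(language, []))
        if language in self.partials:
            parts.append(self.partials[language])
        return " ".join(parts)


# Messages shaped like the examples in this section
partial = {"message": "AddPartialTranslation", "language": "es",
           "results": [{"content": "Bienvenidos a"}]}
final = {"message": "AddTranslation", "language": "es",
         "results": [{"content": "Bienvenidos a Speechmatics."}]}

buffer = TranslationBuffer()
print(buffer.handle(partial))
print(buffer.handle(final))
```

The handle method takes the same msg dictionary as print_translation in the Quick Start, so it could probably be registered for both AddPartialTranslation and AddTranslation in the same way.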

Each translated section of text matches directly to one or more words in the native language transcription based on the start and end time. Each one has the following properties:

language: Real-time only. The ISO Language Code of the translation target language
content: The translated content
start_time: The start time of the translated content, which matches the start time of the first word in the transcript
end_time: The end time of the translated content, which matches the end time of the last word in the transcript
speaker: The speaker label when diarization:speaker is set - see here about Speaker Diarization
- Speaker labels are only available on Final translations, not Partial translations
channel: The channel label when diarization:channel is set - see here about Channel Diarization
- Only available for Batch transcription
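Because start_time and end_time line up with the first and last source words, a translated segment can be mapped back to the words it covers. The following is a rough sketch: the function name words_for_segment and the small tolerance for floating-point comparison are assumptions, and the timings for "to" and "Speechmatics" in the sample are invented for illustration.

```python
def words_for_segment(results: list, segment: dict, tolerance: float = 1e-6) -> list:
    """Return the source words whose timings fall inside a translated segment."""
    words = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word results
        starts_inside = item["start_time"] >= segment["start_time"] - tolerance
        ends_inside = item["end_time"] <= segment["end_time"] + tolerance
        if starts_inside and ends_inside:
            words.append(item["alternatives"][0]["content"])
    return words


# Hypothetical transcript results; only "Welcome" comes from the example response.
results = [
    {"type": "word", "start_time": 0.78, "end_time": 1.32,
     "alternatives": [{"content": "Welcome"}]},
    {"type": "word", "start_time": 1.32, "end_time": 1.50,
     "alternatives": [{"content": "to"}]},
    {"type": "word", "start_time": 1.50, "end_time": 2.58,
     "alternatives": [{"content": "Speechmatics"}]},
    {"type": "punctuation", "start_time": 2.58, "end_time": 2.58,
     "alternatives": [{"content": "."}]},
    {"type": "word", "start_time": 3.0, "end_time": 3.2,
     "alternatives": [{"content": "We"}]},
]
segment = {"start_time": 0.78, "end_time": 2.58,
           "content": "Bienvenidos a Speechmatics."}

print(words_for_segment(results, segment))
```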

Supported Translation Pairs

Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below.

Transcription Language	Translation Target Language
English (en)	Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)	English (en)
Norwegian Bokmål (no)*	Norwegian Nynorsk (nn)

*Norwegian Bokmål to Nynorsk is only supported on Batch SaaS.

Currently unsupported Speechmatics languages: Arabic, Bashkir, Basque, Belarusian, Cantonese, Esperanto, Interlingua, Marathi, Mongolian, Persian, Tamil, Thai, Uyghur, Welsh.
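The pairs in the table can be checked on the client side before a request is sent. The sketch below hard-codes the language codes from the table, so it will go stale if the supported list changes; the function name is_supported_pair and the batch_saas flag are illustrative assumptions.

```python
# Language codes that translate to and from English, copied from the table above.
ENGLISH_PARTNERS = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def is_supported_pair(source: str, target: str, batch_saas: bool = False) -> bool:
    """Check a transcription/translation language pair against the table."""
    if source == "en":
        return target in ENGLISH_PARTNERS
    if source in ENGLISH_PARTNERS and target == "en":
        return True
    # Norwegian Bokmål to Nynorsk is only supported on Batch SaaS.
    return batch_saas and source == "no" and target == "nn"


print(is_supported_pair("en", "es"))
print(is_supported_pair("de", "es"))
print(is_supported_pair("no", "nn", batch_saas=True))
```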

Considerations

When using Translation, there are a few things to keep in mind:

Accuracy of Transcription: We recommend using the Enhanced Operating Point for the best Translation results. Transcription accuracy directly impacts the accuracy of translation.
Punctuation: Punctuation plays a significant role in the accuracy of translation. We therefore recommend that you avoid disabling any Punctuation Marks or reducing the Punctuation Sensitivity.
Formatting: The translation is applied to the written form transcript.
Transcription Time: Enabling Translation for a file increases the turnaround time of jobs. The increase for a single translation is small, and it grows with the number of translation target languages. Real-Time is not affected, except when closing a connection, where there can be a delay of up to 5 seconds while the final translation is received.

Limitations

Maximum Number of Translations: Each transcription can have up to five translations
Output Formats: At this time, only the JSON transcript format is supported for translation. Text and SRT transcript formats are only available in the native language for Batch transcription
Other Transcription Features: The following transcription features are only available in the native language transcript, not the translation
- Single word timings
- Confidence scores
- Word tagging
- Output locale spelling

Batch Error Responses

Unsupported Target Language

If one or more of the target languages are not supported for the transcription language, the job will be rejected with an HTTP 400 error response.

This behaviour is different when the transcription language is not known ahead of time (see Unsupported Translation Pair).

Example Bad Config:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "zz"]
  }
}

Response:

{
  "code": 400,
  "detail": "Job config JSON is invalid. Error: language zz is not a supported translation target for source language en",
  "error": "Job rejected"
}

Unsupported Translation Pair

When using Language Identification, if one or more of the target languages are not supported for the identified transcription language, the job will be accepted and an error message will be included in the JSON output. No translations will be returned for that language pair.

This behaviour is different when the transcription language is known ahead of time.

The following JSON output illustrates the response when the language was identified and transcribed as German (de) with the target translation language set to Spanish (es).

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "auto"
    },
    "translation_config": {
      "target_languages": [
        "es"
      ]
    },
    "translation_errors": [
      {"type": "unsupported_translation_pair", "message": "Translation from de to es currently not supported"}
    ],
    ...
  },
  "results": [...]
}

This enables the combination of automatic language identification and translation.

Too Many Target Languages

Each transcription can have up to five translations. If you request more than five, an HTTP 400 error response is returned.

{
  "code": 400,
  "detail": "maximum number of target languages is 5 and requested count is 6",
  "error": "Job rejected"
}
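To avoid a rejected job, the limit can be checked before submission. The helper below is a sketch only: build_translation_job_config is a hypothetical name, and the returned dictionary simply mirrors the config shown at the top of this page.

```python
MAX_TARGET_LANGUAGES = 5  # limit stated in this documentation


def build_translation_job_config(language: str, target_languages: list,
                                 operating_point: str = "enhanced") -> dict:
    """Build a batch job config, failing early if too many targets are requested."""
    if not target_languages:
        raise ValueError("at least one target language is required")
    if len(target_languages) > MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"maximum number of target languages is {MAX_TARGET_LANGUAGES} "
            f"and requested count is {len(target_languages)}"
        )
    return {
        "type": "transcription",
        "transcription_config": {
            "operating_point": operating_point,
            "language": language,
        },
        "translation_config": {"target_languages": list(target_languages)},
    }


config = build_translation_job_config("en", ["es", "de"])
print(config["translation_config"])

try:
    build_translation_job_config("en", ["es", "de", "fr", "it", "pt", "nl"])
except ValueError as error:
    print(f"Rejected locally: {error}")
```

The resulting dictionary has the same shape as conf in the Batch Quick Start example, so it could be passed to submit_job in the same way.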

Translation Failure

In the event that Translation fails for one or more languages, the transcription process will complete but the Translation will not be returned. An error message will be included in the final JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {...},
    "translation_config": {...},
    "translation_errors": [
      {"type": "translation_failed", "message": "Translation failed."}
    ],
    ...
  },
  "results": [...]
}
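Since both Unsupported Translation Pair and Translation Failure are reported inside an otherwise successful transcript, it can help to inspect metadata.translation_errors after a job completes. This is a minimal sketch; the function name check_translations is an assumption, and the sample dictionary is trimmed from the examples above.

```python
def check_translations(transcript: dict) -> tuple:
    """Return (errors, missing_languages) for a completed json-v2 transcript."""
    metadata = transcript.get("metadata", {})
    # Each error is reported with a "type" and a human-readable "message".
    errors = [
        (error.get("type"), error.get("message"))
        for error in metadata.get("translation_errors", [])
    ]
    requested = metadata.get("translation_config", {}).get("target_languages", [])
    translations = transcript.get("translations", {})
    # A requested language with no segments counts as missing.
    missing = [language for language in requested if not translations.get(language)]
    return errors, missing


# Trimmed sample based on the Unsupported Translation Pair example
sample_output = {
    "metadata": {
        "transcription_config": {"language": "auto"},
        "translation_config": {"target_languages": ["es"]},
        "translation_errors": [
            {"type": "unsupported_translation_pair",
             "message": "Translation from de to es currently not supported"}
        ],
    },
    "results": [],
}

errors, missing = check_translations(sample_output)
for error_type, message in errors:
    print(f"{error_type}: {message}")
print(f"Languages without a translation: {missing}")
```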

Real-Time Error Responses

Unsupported Target Language

If one or more of the target languages are not supported for the source language, a warning is raised but transcription will continue. No translations will be returned for that language.

WARNING:speechmatics.client:Translation from en to ta currently not supported

Too Many Target Languages

Each transcription can have up to five translations. If you request more than five, an error is returned and the websocket connection will fail.

ERROR:speechmatics.exceptions.TranscriptionError: translation_config.target_languages should have at most 5 items.

Feedback

Do you have requests or feedback on translation? If so, please send us your thoughts via our Translation Feedback Form.

If you want to report an issue or need Support more urgently, Raise an Issue instead.
