
Translation

Transcription: Batch, Real-Time | Deployments: All

Speechmatics enables you to translate your audio into multiple languages. Quickly add translation to your application through a single API call, with over 30 languages supported.

Test out our translation feature for free in the Speechmatics Portal, no coding required.

Translation can be enabled for both file (batch) and real-time transcription, whether you use the Speechmatics SaaS or an on-prem deployment.

If you're new to Speechmatics, please see our guide on Transcribing a File or Transcribing in Real-Time. Once you are set up, include the following config to enable translation, listing the languages you want in `target_languages`:

{
  "type": "transcription",
  "transcription_config": {
    "operating_point": "enhanced",
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "de"]
  }
}

Quick Start

The following Python client example translates a file in batch mode:
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language
TRANSLATION_LANGUAGES = ["es", "de"]

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "translation_config": {
        "target_languages": TRANSLATION_LANGUAGES
    }
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        for language in TRANSLATION_LANGUAGES:
            # Print the translation for each language from the JSON
            print(f"Translation for {language}")
            translation = ""
            for translated_segment in transcript["translations"][language]:
                translation += translated_segment["content"] + " "
            print(translation)
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise

Maximum number of translations: Each transcription can have up to five translations configured.
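
The five-translation limit can also be checked on the client before a job is submitted. The following is a minimal sketch of that idea, not part of the Speechmatics client: `build_config` and `MAX_TARGET_LANGUAGES` are hypothetical names introduced here for illustration, and the returned dictionary simply mirrors the config shown in this guide.

```python
MAX_TARGET_LANGUAGES = 5  # limit stated in this guide


def build_config(language, target_languages):
    """Return a job config dict with translation enabled (hypothetical helper)."""
    # Drop duplicate language codes while keeping the caller's order.
    targets = list(dict.fromkeys(target_languages))
    if not targets:
        raise ValueError("at least one target language is required")
    if len(targets) > MAX_TARGET_LANGUAGES:
        raise ValueError(
            f"at most {MAX_TARGET_LANGUAGES} target languages are allowed, "
            f"got {len(targets)}"
        )
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "translation_config": {"target_languages": targets},
    }


conf = build_config("en", ["es", "de"])
print(conf["translation_config"]["target_languages"])
```

Failing early like this avoids a round trip to the API for a request that would be rejected anyway.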

Translation Response

The returned JSON will include a new property called `translations`, which contains a list of translated text for each target language requested (using the same ISO Language Codes as for transcription).
{
    "format": "2.9",
    "job": {
        "created_at": "2023-01-23T19:31:19.354Z",
        "data_name": "example.wav",
        "duration": 15,
        "id": "ggqjaazkqf"
    },
    "metadata": {
        "created_at": "2023-01-23T19:31:44.766Z",
        "type": "transcription",
        "transcription_config": {
            "language": "en",
            "diarization": "speaker"
        },
        "translation_config": {
            "target_languages": [
                "es"
            ]
        }
    },
    "results": [
        {
            "start_time": 0.78,
            "end_time": 1.32,
            "type": "word",
            "alternatives": [
                {
                    "content": "Welcome",
                    "confidence": 1.0,
                    "language": "en",
                    "speaker": "S1"
                }
            ]
        },
        ...
    ],
    "translations": {
        "es": [
            {
                "start_time": 0.78,
                "end_time": 2.58,
                "content": "Bienvenidos a Speechmatics.",
                "speaker": "S1"
            },
            {
                "start_time": 3.0,
                "end_time": 7.94,
                "content": "Esperamos que tengas un gran día.",
                "speaker": "S1"
            },
            ...
        ]
    }
}

Each translated section of text corresponds directly to one or more words in the native language transcription, based on the start and end times. Each one has the following properties:

  • language: Real-time only. The ISO Language Code of the translated language
  • content: The translated content
  • start_time: The start time of the translated content, which matches the start time of the first word in the transcript
  • end_time: The end time of the translated content, which matches the end time of the last word in the transcript
  • speaker: The speaker label when diarization:speaker is set (see Speaker Diarization)
    • Speaker labels are only available on Final translations, not Partial translations
  • channel: The channel label when diarization:channel is set (see Channel Diarization)
    • Only available for Batch transcription
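
As an illustration of how these properties can be used, the sketch below merges consecutive translated segments from the same speaker into one line of text. It assumes a batch JSON transcript shaped like the example above; `translation_by_speaker` is a hypothetical helper name, and `speaker` is treated as optional because it is only present when speaker diarization is enabled.

```python
from itertools import groupby


def translation_by_speaker(transcript, language):
    """Return (speaker, text) pairs, merging consecutive segments per speaker."""
    segments = transcript.get("translations", {}).get(language, [])
    turns = []
    # groupby only merges neighbouring segments, so speaker turns stay in order.
    for speaker, group in groupby(segments, key=lambda seg: seg.get("speaker")):
        text = " ".join(seg["content"] for seg in group)
        turns.append((speaker, text))
    return turns


# Sample data copied from the response example in this guide.
sample = {
    "translations": {
        "es": [
            {"start_time": 0.78, "end_time": 2.58,
             "content": "Bienvenidos a Speechmatics.", "speaker": "S1"},
            {"start_time": 3.0, "end_time": 7.94,
             "content": "Esperamos que tengas un gran día.", "speaker": "S1"},
        ]
    }
}

for speaker, text in translation_by_speaker(sample, "es"):
    print(f"{speaker}: {text}")
```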

Supported Translation Pairs

Translation is supported for the majority of Speechmatics' languages. The supported translation pairs are listed below.

Transcription Language | Translation Target Language
English (en) | Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Bulgarian (bg), Catalan (ca), Mandarin (cmn), Czech (cs), Danish (da), German (de), Greek (el), Spanish (es), Estonian (et), Finnish (fi), French (fr), Galician (gl), Hindi (hi), Croatian (hr), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Lithuanian (lt), Latvian (lv), Malay (ms), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | English (en)
Norwegian Bokmål (no)* | Norwegian Nynorsk (nn)

*Norwegian Bokmål to Nynorsk is only supported on Batch SaaS.

Currently unsupported Speechmatics languages: Arabic, Bashkir, Basque, Belarusian, Cantonese, Esperanto, Interlingua, Marathi, Mongolian, Persian, Tamil, Thai, Uyghur, Welsh.
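
The pairs listed above can be encoded as a small lookup for checking a request before it is sent. This is only a sketch: `ENGLISH_PAIR_LANGUAGES` and `is_supported_pair` are hypothetical names, the language codes are copied from the list in this guide and may go out of date, and the `batch_saas` flag is an assumption used to model the Bokmål to Nynorsk restriction.

```python
# Languages that translate to and from English, copied from this guide.
ENGLISH_PAIR_LANGUAGES = {
    "bg", "ca", "cmn", "cs", "da", "de", "el", "es", "et", "fi", "fr", "gl",
    "hi", "hr", "hu", "id", "it", "ja", "ko", "lt", "lv", "ms", "nl", "no",
    "pl", "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}


def is_supported_pair(source, target, batch_saas=False):
    """Return True if the guide lists source -> target as a translation pair."""
    if source == "en":
        return target in ENGLISH_PAIR_LANGUAGES
    if target == "en":
        return source in ENGLISH_PAIR_LANGUAGES
    if (source, target) == ("no", "nn"):
        # Bokmål to Nynorsk is listed as Batch SaaS only.
        return batch_saas
    return False


print(is_supported_pair("en", "es"))
print(is_supported_pair("de", "es"))
```

A check like this only reflects the documented list; the API remains the authority on which pairs are accepted.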

Considerations

When using translation, there are a few things to keep in mind:

  • Accuracy of Transcription: Transcription accuracy directly impacts the accuracy of translation, so we recommend using the Enhanced operating point for the best translation results
  • Punctuation: Punctuation plays a significant role in the accuracy of translation. It is therefore recommended to avoid disabling any Punctuation Marks or reducing the Punctuation Sensitivity to ensure the best possible results
  • Formatting: The translation is applied to the written form transcript
  • Transcription Time: Enabling translation increases the turnaround time of batch jobs. The increase for a single translation is small, and it grows with the number of target languages. Real-time transcription is not affected, except that closing a connection can be delayed by up to 5 seconds while the final translation is received

Limitations

  • Maximum Number of Translations: Each transcription can have up to five translations
  • Output Formats: At this time, only the JSON transcript format is supported for translation. Text and SRT transcript formats are only available in the native language for Batch transcription
  • Other Transcription Features: The following transcription features are only available in the native language transcript, not the translation
    • Single word timings
    • Confidence scores
    • Word tagging
    • Output locale spelling

Batch Error Responses

Unsupported Target Language

If one or more of the target languages are not supported for the transcription language, the job will be rejected with an HTTP 400 error response.

This behaviour is different when the transcription language is not known ahead of time, for example when using Language Identification (see Unsupported Translation Pair).

Example Bad Config:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "translation_config": {
    "target_languages": ["es", "zz"]
  }
}

Response:

{
  "code": 400,
  "detail": "Job config JSON is invalid. Error: language zz is not a supported translation target for source language en",
  "error": "Job rejected"
}

Unsupported Translation Pair

When using Language Identification, if one or more of the target languages are not supported for the identified transcription language, the job will be accepted and an error message will be included in the JSON output. No translations will be returned for that language pair.

This behaviour is different when the transcription language is known ahead of time (see Unsupported Target Language).

The JSON output below illustrates the response when the language was identified and transcribed as German (de) and the target translation language was set to Spanish (es).

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "auto"
    },
    "translation_config": {
      "target_languages": [
        "es"
      ]
    },
    "translation_errors": [
      {"type": "unsupported_translation_pair", "message": "Translation from de to es currently not supported"}
    ],
    ...
  },
  "results": [...]
}

This enables the combination of automatic language identification and translation.

Too Many Target Languages

Each transcription can have up to five translations. If you request more than five, an HTTP 400 error response is returned.

{
  "code": 400,
  "detail": "maximum number of target languages is 5 and requested count is 6",
  "error": "Job rejected"
}
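
If more than five target languages are needed, one possible workaround is to split them across several jobs. This is an assumption of this sketch rather than guidance from Speechmatics, and each resulting job is a separate transcription of the same audio. `chunk_targets` is a hypothetical helper name.

```python
def chunk_targets(target_languages, size=5):
    """Split a list of language codes into groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [
        target_languages[i:i + size]
        for i in range(0, len(target_languages), size)
    ]


# Six requested languages become one group of five and one group of one.
groups = chunk_targets(["es", "de", "fr", "it", "pt", "nl"])
for group in groups:
    print(group)
```

Each group can then be placed in the `target_languages` list of its own job config.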

Translation Failure

If translation fails for one or more languages, the transcription process will still complete, but the failed translations will not be returned. An error message will be included in the final JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {...},
    "translation_config": {...},
    "translation_errors": [
      {"type": "translation_failed", "message": "Translation failed."}
    ],
    ...
  },
  "results": [...]
}
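
Because jobs with `translation_errors` still complete successfully, it can help to check for them explicitly when reading the output. The sketch below assumes the JSON layout shown in the examples in this section; `get_translation_errors` and `missing_translations` are hypothetical helper names, and a language is treated as missing whether its entry is absent or empty, since this guide does not specify which.

```python
def get_translation_errors(transcript):
    """Return the list of translation error objects, or an empty list."""
    return transcript.get("metadata", {}).get("translation_errors", [])


def missing_translations(transcript):
    """Return requested target languages that have no translated content."""
    metadata = transcript.get("metadata", {})
    requested = metadata.get("translation_config", {}).get("target_languages", [])
    returned = transcript.get("translations", {})
    return [lang for lang in requested if not returned.get(lang)]


# Sample data based on the unsupported translation pair example in this guide.
sample = {
    "metadata": {
        "transcription_config": {"language": "auto"},
        "translation_config": {"target_languages": ["es"]},
        "translation_errors": [
            {"type": "unsupported_translation_pair",
             "message": "Translation from de to es currently not supported"}
        ],
    },
    "results": [],
}

for error in get_translation_errors(sample):
    print(f"{error['type']}: {error['message']}")
print("Missing:", missing_translations(sample))
```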

Real-Time Error Responses

Unsupported Target Language

If one or more of the target languages are not supported for the source language, a warning is raised but transcription will continue. No translations will be returned for that language.

WARNING:speechmatics.client:Translation from en to ta currently not supported
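
Since transcription continues after this warning, an application may want to detect it programmatically. The sketch below uses Python's standard `logging` module to collect such messages. It assumes the warning is emitted on a logger named `speechmatics.client` with the message text shown above, both inferred from the log line rather than confirmed against the client source; `TranslationWarningCollector` is a hypothetical class name.

```python
import logging


class TranslationWarningCollector(logging.Handler):
    """Collect warning messages that mention an unsupported translation."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        message = record.getMessage()
        # Assumed message prefix, taken from the example log line.
        if message.startswith("Translation from"):
            self.messages.append(message)


collector = TranslationWarningCollector()
# Logger name inferred from the "WARNING:speechmatics.client:" prefix.
logging.getLogger("speechmatics.client").addHandler(collector)
```

After a real-time session ends, `collector.messages` can be inspected to see which target languages were skipped.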

Too Many Target Languages

Each transcription can have up to five translations. If you request more than five, an error is returned and the websocket connection will fail.

ERROR:speechmatics.exceptions.TranscriptionError: translation_config.target_languages should have at most 5 items.

Feedback

Do you have requests or feedback on translation? If so, please send us your thoughts via our Translation Feedback Form.

To report an issue or get support more urgently, Raise an Issue instead.