
Alignment

Transcription: Batch | Deployments: SaaS

Alignment lets you submit an audio file and a text file and get back speech timing information, so you can determine exactly when each word was spoken in the supplied audio file.

If you do not have access to use the alignment feature, and you would like to, please reach out to Support or speak to your Account Manager.

Supported Formats

The input text file must be a UTF-8 encoded plain text file. If the file contains characters that are not valid UTF-8, the job is rejected.
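A local check before submission can catch encoding problems early. The following is a minimal Python sketch that tests whether a file's bytes decode as UTF-8. The helper name is_valid_utf8 is illustrative and is not part of any Speechmatics API.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, calling is_valid_utf8 on the bytes read from your transcript file (opened in binary mode) tells you whether the file is likely to pass the encoding requirement.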

Text Formatting

Input

During the alignment process, Speechmatics extracts words from the text. Any string of characters separated by whitespace (space, tab, newline, etc.) is treated as a word. SGML-like markup is treated as a comment: text within comment delimiters (<!--, -->) or angle brackets (<, >) is ignored. Therefore, given this text:

Hello <markup> world <!-- comment > comment --> how are you?

The following words will be aligned with the provided audio file:

Hello world how are you?
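The extraction rules above can be approximated locally to preview which words will be aligned. This Python sketch strips comments first and then any remaining angle-bracket tags before splitting on whitespace. It is an approximation under those stated rules, and the service's own parser may behave differently on edge cases such as unclosed tags. The function name extract_words is hypothetical.

```python
import re

# Comments are removed first so that a ">" inside a comment does not end it early.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def extract_words(text: str) -> list[str]:
    """Return the whitespace-separated words left after removing markup."""
    without_comments = COMMENT_RE.sub(" ", text)
    without_tags = TAG_RE.sub(" ", without_comments)
    return without_tags.split()
```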

Output

The timing information (referred to as alignment files) is available in two formats:

  • Word Start and End (word_start_and_end): This is the default format:
<time=0.12>Hello<time=0.23> <markup> <time=0.34>world<time=0.45> <!-- comment > comment -->
<time=0.56>how<time=0.67> <time=0.78>are<time=0.89> <time=0.90>you?<time=1.00>
  • One per Line (one_per_line): This format must be requested explicitly when you retrieve the aligned file via HTTP request:
[00:00:00.1] Hello <markup> world <!-- comment > comment --> how are you?
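To work with the word_start_and_end format programmatically, the timed words can be pulled out with a regular expression. The Python sketch below assumes the output looks exactly like the sample above, with each word wrapped in a pair of <time=...> markers and times expressed in seconds. The function name parse_word_timings is illustrative.

```python
import re

# A word is any run of non-whitespace, non-"<" characters between two time markers.
TIMED_WORD_RE = re.compile(
    r"<time=(\d+(?:\.\d+)?)>([^<\s]+)<time=(\d+(?:\.\d+)?)>"
)


def parse_word_timings(aligned: str) -> list[tuple[str, float, float]]:
    """Return (word, start, end) tuples found in word_start_and_end text."""
    return [
        (match.group(2), float(match.group(1)), float(match.group(3)))
        for match in TIMED_WORD_RE.finditer(aligned)
    ]
```

Markup and comments that were passed through from the input are skipped, because they are not enclosed in time markers.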

Submitting Alignment Jobs

Creating an alignment job is similar to creating a transcription job. Make an HTTP POST request to the /v2/jobs endpoint with the following form fields:

  • config: The job config for alignment
  • data_file: The media file containing the speech. Can be passed in via config if the file is stored in an online location
  • text_file: The text file containing the transcript. Can be passed in via config if the file is stored in an online location

If you do not provide all of the above, the job will be rejected.

The job config must specify alignment as the job type, along with the language of the audio and text:

{
  "type": "alignment",
  "alignment_config": {
    "language": "en"
  }
}
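As an illustration of the submission described above, the following Python sketch builds a multipart POST request carrying the three form fields using only the standard library. Several details are assumptions: the host is taken from the curl example used for retrieval on this page, the Bearer authorization header is assumed to apply to submission as well, and the file names and helper names (encode_multipart, build_alignment_request) are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumption: job submission uses the same host as the retrieval example.
JOBS_URL = "https://asr.api.speechmatics.com/v2/jobs"


def encode_multipart(parts):
    """Encode (name, filename, payload) tuples as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, filename, payload in parts:
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
        if filename is not None:
            header += f'; filename="{filename}"\r\nContent-Type: application/octet-stream'
        chunks.append((header + "\r\n\r\n").encode("utf-8"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_alignment_request(api_key, audio, transcript, language="en"):
    """Build (but do not send) a POST request for an alignment job."""
    config = json.dumps(
        {"type": "alignment", "alignment_config": {"language": language}}
    )
    body, content_type = encode_multipart([
        ("config", None, config.encode("utf-8")),
        ("data_file", "audio.wav", audio),            # illustrative file name
        ("text_file", "transcript.txt", transcript),  # illustrative file name
    ])
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    return urllib.request.Request(JOBS_URL, data=body, headers=headers, method="POST")
```

Passing the returned request to urllib.request.urlopen would send it. Building the request separately from sending it makes the form fields easy to inspect before any network call is made.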

Retrieving Alignment Jobs

The status of an alignment job is checked in the same way as for a transcription job. This is described on this page.

An aligned file can be retrieved from the /v2/jobs/<JOB_ID>/alignment endpoint. By default, the word_start_and_end alignment format is returned. This can be overridden using the tags query string parameter:

curl -X GET "https://asr.api.speechmatics.com/v2/jobs/${JOB_ID}/alignment?tags=one_per_line" \
    -H "Authorization: Bearer ${API_KEY}"

Use the following endpoints to retrieve the input files used for an alignment job:

  • /v2/jobs/<JOB_ID>/text: to get the text file submitted
  • /v2/jobs/<JOB_ID>/data: to get the audio file submitted
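The retrieval endpoints above follow one pattern, so a small helper can construct them. This Python sketch assumes the base URL from the curl example and treats the tags parameter as applying only to the alignment resource. The function name job_resource_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: base URL taken from the curl example on this page.
BASE_URL = "https://asr.api.speechmatics.com/v2"


def job_resource_url(job_id, resource="alignment", tags=None):
    """Build the URL for a job's aligned file, submitted text, or submitted audio."""
    if resource not in ("alignment", "text", "data"):
        raise ValueError(f"unknown resource: {resource}")
    url = f"{BASE_URL}/jobs/{job_id}/{resource}"
    if tags is not None:
        if resource != "alignment":
            raise ValueError("tags only applies to the alignment resource")
        url += "?" + urlencode({"tags": tags})
    return url
```

The request itself still needs the Authorization header shown in the curl example.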

Note that alignment follows Speechmatics' Data Retention Limits.

Fetching Files from an Online Location

Speechmatics supports retrieving files from an online location. If you store your digital media and transcripts in cloud storage (for example AWS S3 or Azure Blob Storage) you can also submit a job by providing the URL of the audio file or transcript.

To retrieve files from an online location, you must specify the location for the media and/or transcript in the configuration of your request. You can locally upload a media file and retrieve a text file from an online location (or vice versa):

{
  "type": "alignment",
  "fetch_data": { "url": "$MY_AUDIO_URL" },
  "fetch_text": { "url": "$MY_TRANSCRIPT" },
  "alignment_config": { "language": "en" }
}

Do not combine fetch_data with a locally uploaded media file, or fetch_text with a locally uploaded text file, in the same request, as this will cause the job to fail.
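One way to keep fetched and uploaded inputs from colliding is to derive the list of required uploads from the config itself. The Python sketch below assumes the restriction applies per input (one source for the audio, one source for the text), which is consistent with the mixed upload-and-fetch case described above. The helper names build_fetch_config and required_uploads are illustrative.

```python
def build_fetch_config(language, audio_url=None, transcript_url=None):
    """Build an alignment job config, adding fetch sections only for given URLs."""
    config = {"type": "alignment", "alignment_config": {"language": language}}
    if audio_url is not None:
        config["fetch_data"] = {"url": audio_url}
    if transcript_url is not None:
        config["fetch_text"] = {"url": transcript_url}
    return config


def required_uploads(config):
    """Return the form fields that must be uploaded because they are not fetched."""
    uploads = []
    if "fetch_data" not in config:
        uploads.append("data_file")
    if "fetch_text" not in config:
        uploads.append("text_file")
    return uploads
```

With this approach, a field is either fetched or uploaded, never both, so the failing combination cannot be produced by accident.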

Callback Notifications

Alignment jobs can also be used with callback notifications by including the notification_config section in the job config when submitting the job. Please ensure you have allowlisted Speechmatics' egress IPs to allow notifications.

{
  "type": "alignment",
  "alignment_config": {
    "language": "en"
  },
  "notification_config": [
    {
      "contents": ["alignment"],
      "url": "https://lorem.ipsum/"
    },
    {
      "contents": ["alignment.one_per_line", "text"],
      "method": "post",
      "url": "https://dolor.sit.amet/"
    }
  ]
}

The following outputs are supported:

  • alignment, alignment.one_per_line, alignment.word_start_and_end: the aligned transcript
  • text: the non-aligned transcript submitted as part of the job request
  • data: the media file submitted as part of the job request
  • jobinfo: the summary information about the job, to support identification and tracking
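Because an unsupported value in contents is easy to mistype, it can help to check a notification_config against the list above before submitting. This Python sketch uses only the output names listed on this page. The function name unsupported_contents is hypothetical, and how the service itself reacts to unknown values is not covered here.

```python
# Output names taken from the list of supported outputs above.
SUPPORTED_CONTENTS = {
    "alignment",
    "alignment.one_per_line",
    "alignment.word_start_and_end",
    "text",
    "data",
    "jobinfo",
}


def unsupported_contents(notification_config):
    """Return content names in the config that are not in the supported list."""
    unsupported = []
    for entry in notification_config:
        for item in entry.get("contents", []):
            if item not in SUPPORTED_CONTENTS and item not in unsupported:
                unsupported.append(item)
    return unsupported
```

An empty result means every requested output appears in the supported list.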