
Quickstart

Learn how to transcribe pre-recorded audio and video files.

The quickest way to try transcribing for free is by creating a Speechmatics account and using our web portal.

Set up a transcription job, then click the button on the output page to get the equivalent code for use with the API.

Using the Batch API

1. Create an API key

Create an API key in the portal here, which you'll use to securely access the API. Store the key as a managed secret.

Enterprise customers may need to speak to Support to get their API keys.
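As one way to keep the key out of source code, the sketch below reads it from an environment variable. The variable name SPEECHMATICS_API_KEY and the helper load_api_key are assumptions made for this illustration, not names defined by Speechmatics; any secret manager that exposes the key at runtime would work equally well.

```python
import os


def load_api_key(env_var: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key held in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned value can then be passed wherever the examples on this page use YOUR_API_KEY.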

2. Pick and install a library

Check out our JavaScript client or Python client to get started.

Install using NPM:

npm install @speechmatics/batch-client

Install using pip:

pip3 install speechmatics-python

3. Grab a sample file

Download and save our example.wav file.

4. Insert API key

Replace YOUR_API_KEY in the code below with your API key.

// This example transcribes a file in Node.js.
// For examples in other environments, see the link above

import { openAsBlob } from "node:fs";
import { BatchClient } from "@speechmatics/batch-client";

const client = new BatchClient({
  apiKey: "YOUR_API_KEY", // Replace with your API key
  appId: "nodeJS-example",
});

console.log("Sending file for transcription...");

async function transcribeFile() {
  const blob = await openAsBlob("./example.wav");
  const file = new File([blob], "example.wav");

  const response = await client.transcribe(
    file,
    {
      transcription_config: {
        language: "en",
      },
    },
    "json-v2",
  );

  console.log("Transcription finished!");

  console.log(
    // Transcripts can be strings when the 'txt' format is chosen
    typeof response === "string"
      ? response
      : response.results.map((r) => r.alternatives?.[0].content).join(" "),
  );
}

transcribeFile();

You should then see the following output:

Sending file for transcription...
Transcription finished!
Welcome to Speechmatics . We're delighted that you've decided to try our speech to text software to get going . Just create an API key and submit a transcription request to our API . We hope you'll be very impressed by the results . Thank you .

The same transcription using the Python client:

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {"type": "transcription", "transcription_config": {"language": LANGUAGE}}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f"job {job_id} submitted successfully, waiting for transcript")

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/speech-to-text/batch/notifications
        transcript = client.wait_for_completion(job_id, transcription_format="txt")
        # To see the full output, try setting transcription_format='json-v2'.
        print(transcript)
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print("Invalid API key - Check your API_KEY at the top of the code!")
        elif e.response.status_code == 400:
            print(e.response.json()["detail"])
        else:
            raise e

Transcript response schema

The transcript includes information about the job and metadata such as the transcription configuration that was used.

Please refer to our API Reference for full details about the transcript contents.

format (string, required)

Speechmatics JSON transcript format version number.

Example: 2.1

job (required)

Summary information about an ASR job, to support identification and tracking.

created_at (date-time, required)

The UTC date time the job was created.

Example: 2018-01-09T12:29:01.853047Z

data_name (string, required)

Name of data file submitted for job.

duration (integer, required)

The data file audio duration (in seconds).

Possible values: >= 0

id (string, required)

The unique id assigned to the job.

Example: a1b2c3d4e5

text_name (string)

Name of the text file submitted to be aligned to audio.

tracking

title (string)

The title of the job.

reference (string)

External system reference.

tags (string[])

details (object)

Customer-defined JSON structure.

metadata (required)

Summary information about the output from an ASR job, comprising the job type and configuration parameters used when generating the output.

created_at (date-time, required)

The UTC date time the transcription output was created.

Example: 2018-01-09T12:29:01.853047Z

type (string, required)

Possible values: [alignment, transcription]

transcription_config

language (string, required)

Language model to process the audio input, normally specified as an ISO language code.

domain (string)

Request a specialized model based on 'language' but optimized for a particular field, e.g. "finance" or "medical".

output_locale (string)

Language locale to be used when generating the transcription output, normally specified as an ISO language code.

operating_point (string)

Specify an operating point to use. Operating points change the transcription process in a high-level way, such as altering the acoustic model. The default is standard.

standard:
enhanced: transcription will take longer but be more accurate than 'standard'

Possible values: [standard, enhanced]

additional_vocab (object[])

List of custom words or phrases that should be recognized. Alternative pronunciations can be specified to aid recognition.

Array [

content (string, required)

sounds_like (string[])

]

punctuation_overrides

Control punctuation settings.

sensitivity (float)

Ranges between zero and one. Higher values will produce more punctuation. The default is 0.5.

Possible values: >= 0 and <= 1

permitted_marks (string[])

The punctuation marks which the client is prepared to accept in transcription output, or the special value 'all' (the default). Unsupported marks are ignored. This value is used to guide the transcription process.

Possible values: Value must match regular expression ^(.|all)$
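To make the punctuation constraints concrete, here is a minimal sketch that builds a punctuation_overrides object and checks it against the range and pattern listed above before it is placed in a job config. The helper name build_punctuation_overrides is hypothetical, and treating the default as the list ["all"] is an assumption based on the string[] type shown here.

```python
import re


def build_punctuation_overrides(sensitivity: float = 0.5, permitted_marks=None) -> dict:
    """Build a punctuation_overrides object, checking the documented constraints."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")
    # Assumption: the 'all' default is expressed as a one-item list.
    marks = ["all"] if permitted_marks is None else list(permitted_marks)
    for mark in marks:
        # Same pattern as documented: a single character or the word 'all'.
        if not re.fullmatch(r"(.|all)", mark):
            raise ValueError(f"unsupported mark: {mark!r}")
    return {"sensitivity": sensitivity, "permitted_marks": marks}


# Job config in the same shape as the Python quickstart example.
config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "punctuation_overrides": build_punctuation_overrides(0.4, [".", ","]),
    },
}
```

Validating locally only catches obvious mistakes early; the API remains the authority on which values it accepts.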

diarization (string)

Specify whether speaker or channel labels are added to the transcript. The default is none.

none: no speaker or channel labels are added.
speaker: speaker attribution is performed based on acoustic matching; all input channels are mixed into a single stream for processing.
channel: multiple input channels are processed individually and collated into a single transcript.

Possible values: [none, speaker, channel]

channel_diarization_labels (string[])

Transcript labels to use when collating separate input channels.

Possible values: Value must match regular expression ^[A-Za-z0-9._]+$

enable_entities (boolean)

Include additional 'entity' objects in the transcription results (e.g. dates, numbers) and their original spoken form. These entities are interleaved with other types of results. The concatenation of these words is represented as a single entity with the concatenated written form present in the 'content' field. The entities contain a 'spoken_form' field, which can be used in place of the corresponding 'word' type results, in case a spoken form is preferred to a written form. They also contain a 'written_form', which can be used instead of the entity, if you want a breakdown of the words without spaces. They can still contain non-breaking spaces and other special whitespace characters, as they are considered part of the word for the formatting output. In case of a written_form, the individual word times are estimated and might not be accurate if the order of the words in the written form does not correspond to the order they were actually spoken (such as 'one hundred million dollars' and '$100 million').

max_delay_mode (string)

Whether or not to enable flexible endpointing and allow the entity to continue to be spoken.

Possible values: [fixed, flexible]

transcript_filtering_config

Configuration for applying filtering to the transcription.

remove_disfluencies (boolean)

If true, words that are identified as disfluencies will be removed from the transcript. If false (default), they are tagged in the transcript as 'disfluency'.

speaker_diarization_config

Configuration for speaker diarization.

speaker_sensitivity (float)

Controls how sensitive the algorithm is in terms of keeping similar speakers separate, as opposed to combining them into a single speaker. Higher values will typically lead to more speakers, as the degree of difference between speakers in order to allow them to remain distinct will be lower. A lower value for this parameter will conversely guide the algorithm towards being less sensitive in terms of retaining similar speakers, and as such may lead to fewer speakers overall. The default is 0.5.

Possible values: >= 0 and <= 1

orchestrator_version (string)

The engine version used to generate transcription output.

Example: 2024.12.26085+a0a32e61ad.HEAD

translation_errors (object[])

List of errors that occurred in the translation stage.

Array [

type (string)

Possible values: [translation_failed, unsupported_translation_pair]

message (string)

Human readable error message

]

summarization_errors (object[])

List of errors that occurred in the summarization stage.

Array [

type (string)

Possible values: [summarization_failed, unsupported_language]

message (string)

Human readable error message

]

sentiment_analysis_errors (object[])

List of errors that occurred in the sentiment analysis stage.

Array [

type (string)

Possible values: [sentiment_analysis_failed, unsupported_language]

message (string)

Human readable error message

]

topic_detection_errors (object[])

List of errors that occurred in the topic detection stage.

Array [

type (string)

Possible values: [topic_detection_failed, unsupported_list_of_topics, unsupported_language]

message (string)

Human readable error message

]

auto_chapters_errors (object[])

List of errors that occurred in the auto chapters stage.

Array [

type (string)

Possible values: [auto_chapters_failed, unsupported_language]

message (string)

Human readable error message

]

alignment_config

language (string, required)

output_config (object)

srt_overrides (object)

Parameters that override default values of SRT conversion. max_line_length: sets maximum count of characters per subtitle line including white space. max_lines: sets maximum count of lines in a subtitle section.

max_line_length (integer)

max_lines (integer)

language_pack_info

Properties of the language pack.

language_description (string)

Full descriptive name of the language, e.g. 'Japanese'.

word_delimiter (string, required)

The character to use to separate words.

writing_direction (string)

The direction that words in the language should be written and read in.

Possible values: [left-to-right, right-to-left]

itn (boolean)

Whether or not ITN (inverse text normalization) is available for the language pack.

adapted (boolean)

Whether or not language model adaptation has been applied to the language pack.

language_identification (object)

Result of the language identification of the audio, configured using language_identification_config, or setting the transcription language to auto.

results (object[])

Array [

alternatives (object[])

Array [

language (string)

confidence (number)

]

start_time (number)

end_time (number)

]

error (string)

Possible values: [LOW_CONFIDENCE, UNEXPECTED_LANGUAGE, NO_SPEECH, FILE_UNREADABLE, OTHER]

message (string)
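The language_identification object can be reduced to a single label by taking the most confident alternative. The sketch below assumes the object has already been parsed into a Python dict with the fields listed above; the function name and the min_confidence cut-off are hypothetical choices for illustration.

```python
def best_language(language_identification: dict, min_confidence: float = 0.5):
    """Return the most confident language code, or None if nothing usable was found."""
    if language_identification.get("error"):
        return None
    best = None
    for result in language_identification.get("results", []):
        for alternative in result.get("alternatives", []):
            if best is None or alternative["confidence"] > best["confidence"]:
                best = alternative
    # min_confidence is an illustrative threshold, not a documented default.
    if best is None or best["confidence"] < min_confidence:
        return None
    return best["language"]
```

Returning None for both the error case and the low-confidence case keeps the caller's fallback logic in one place.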

results (RecognitionResult[], required)

Array [

channel (string)

start_time (float, required)

end_time (float, required)

volume (float)

An indication of the volume of audio across the time period the word was spoken.

Possible values: >= 0 and <= 100

is_eos (boolean)

Whether the punctuation mark is an end of sentence character. Only applies to punctuation marks.

type (string, required)

New types of items may appear without being requested; unrecognized item types can be ignored.

Possible values: [word, punctuation, entity]

written_form (object[])

Array [

alternatives (object[], required)

Array [

content (string, required)

confidence (float, required)

language (string, required)

display

direction (string, required)

Possible values: [ltr, rtl]

speaker (string)

tags (string[])

]

end_time (float, required)

start_time (float, required)

type (string, required)

What kind of object this is. See #/Definitions/RecognitionResult for definitions of the enums.

Possible values: [word]

]

spoken_form (object[])

Array [

alternatives (object[], required)

Array [

content (string, required)

confidence (float, required)

language (string, required)

display

direction (string, required)

Possible values: [ltr, rtl]

speaker (string)

tags (string[])

]

end_time (float, required)

start_time (float, required)

type (string, required)

What kind of object this is. See #/Definitions/RecognitionResult for definitions of the enums.

Possible values: [word, punctuation]

]

alternatives (object[])

Array [

content (string, required)

confidence (float, required)

language (string, required)

display

direction (string, required)

Possible values: [ltr, rtl]

speaker (string)

tags (string[])

]

attaches_to (string)

Attachment direction of the punctuation mark. This only applies to punctuation marks. This information can be used to produce a well-formed text representation by placing the word_delimiter from language_pack_info on the correct side of the punctuation mark.

Possible values: [previous, next, both, none]

]
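Joining every result with a space leaves a gap before each punctuation mark ("Speechmatics ."). A minimal sketch of the approach described for attaches_to follows: the delimiter is skipped on whichever side the mark attaches to. The function name render_text is hypothetical, and the code assumes results is the parsed results array of a json-v2 transcript.

```python
def render_text(results: list, word_delimiter: str = " ") -> str:
    """Join result items into text, honouring attaches_to on punctuation marks."""
    pieces = []
    glue_to_next = False  # True when the previous mark attaches to the following item
    for item in results:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        content = alternatives[0]["content"]
        # attaches_to only applies to punctuation; everything else gets delimiters.
        attaches_to = item.get("attaches_to", "none") if item.get("type") == "punctuation" else "none"
        attaches_to_previous = attaches_to in ("previous", "both")
        if pieces and not glue_to_next and not attaches_to_previous:
            pieces.append(word_delimiter)
        pieces.append(content)
        glue_to_next = attaches_to in ("next", "both")
    return "".join(pieces)
```

Pass the word_delimiter value from language_pack_info in the transcript metadata so that languages written without spaces are rendered correctly.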

translations (object)

Translations of the transcript into other languages. It is a map of ISO language codes to arrays of translated sentences. Configured using translation_config.

[property name: string] (object[])

Array [

start_time (float)

end_time (float)

content (string)

speaker (string)

channel (string)

]
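Because translations is a map from language code to a list of sentences, extracting one language is a dictionary lookup followed by a join. The sketch below is illustrative only: the function name is hypothetical, and joining sentences with a single space is an assumption that may not suit every target language.

```python
def translated_text(translations: dict, language: str, separator: str = " ") -> str:
    """Return the translated sentences for one language as a single string."""
    sentences = translations.get(language, [])
    # Sort by start_time in case the sentences are not already in order.
    ordered = sorted(sentences, key=lambda sentence: sentence.get("start_time", 0.0))
    return separator.join(s["content"] for s in ordered if s.get("content"))
```

An empty string is returned when the requested language is missing from the map.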

summary (object)

Summary of the transcript, configured using summarization_config.

content (string)

sentiment_analysis (object)

The main object that holds sentiment analysis data.

sentiment_analysis (object)

Holds the detailed sentiment analysis information.

segments (object[])

An array of objects that represent a segment of text and its associated sentiment.

Array [

text (string)

Represents the transcript of the analysed segment

sentiment (string)

The assigned sentiment to the segment, which can be positive, neutral or negative

start_time (float)

The timestamp corresponding to the beginning of the transcription segment

end_time (float)

The timestamp corresponding to the end of the transcription segment

speaker (string)

The speaker label for the segment, if speaker diarization is enabled

channel (string)

The channel label for the segment, if channel diarization is enabled

confidence (float)

A confidence score in the range of 0-1

]

summary (object)

An object that holds overall sentiment information, and per-speaker and per-channel sentiment data.

overall (object)

Summary for all segments in the file

positive_count (integer)

negative_count (integer)

neutral_count (integer)

speakers (object[])

An array of objects that represent sentiment data for a specific speaker.

Array [

speaker (string)

positive_count (integer)

negative_count (integer)

neutral_count (integer)

]

channels (object[])

An array of objects that represent sentiment data for a specific channel.

Array [

channel (string)

positive_count (integer)

negative_count (integer)

neutral_count (integer)

]
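The relationship between segments and the summary counts can be illustrated by recomputing the counts from the segments. This is a sketch of one plausible aggregation, not a description of how the service builds its summary; the function name is hypothetical, and a custom aggregation like this is mainly useful when you filter segments first (for example by confidence).

```python
from collections import Counter, defaultdict

LABELS = ("positive", "negative", "neutral")


def sentiment_counts(segments: list) -> dict:
    """Count sentiment labels overall and per speaker, in the summary's field layout."""
    overall = Counter()
    per_speaker = defaultdict(Counter)
    for segment in segments:
        label = segment["sentiment"]
        overall[label] += 1
        if "speaker" in segment:
            per_speaker[segment["speaker"]][label] += 1

    def as_counts(counter):
        return {f"{label}_count": counter.get(label, 0) for label in LABELS}

    return {
        "overall": as_counts(overall),
        "speakers": [
            {"speaker": speaker, **as_counts(counts)}
            for speaker, counts in sorted(per_speaker.items())
        ],
    }
```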

topics (object)

Main object that holds topic detection results.

segments (object[])

An array of objects that represent a segment of text and its associated topic information.

Array [

text (string)

start_time (float)

end_time (float)

topics (object[])

Array [

topic (string)

]

]

summary (object)

An object that holds overall information on the topics detected.

overall (object)

Summary of overall topic detection results.

[property name: string] (integer)
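The overall summary maps each topic name to an integer. Assuming that integer is the number of segments tagged with the topic (the schema above does not say so explicitly), an equivalent tally can be sketched from the segments as follows; the function name is hypothetical.

```python
from collections import Counter


def topic_totals(segments: list) -> dict:
    """Tally how many segments mention each topic."""
    totals = Counter()
    for segment in segments:
        # Use a set so a topic repeated within one segment is counted once.
        for name in {entry["topic"] for entry in segment.get("topics", [])}:
            totals[name] += 1
    return dict(totals)
```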

chapters (object[])

An array of objects that represent summarized chapters of the transcript

Array [

title (string)

The auto-generated title for the chapter

summary (string)

An auto-generated paragraph-style, short summary of the chapter

start_time (number)

The start time of the chapter in the audio file

end_time (number)

The end time of the chapter in the audio file

]
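A chapters array lends itself to a simple table of contents. The sketch below assumes start_time and end_time are in seconds, as they are for other timestamps on this page (the chapter fields do not state a unit); both function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS, dropping fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def chapter_outline(chapters: list) -> str:
    """Return one line per chapter, ordered by start time."""
    lines = []
    for chapter in sorted(chapters, key=lambda c: c["start_time"]):
        start = format_timestamp(chapter["start_time"])
        end = format_timestamp(chapter["end_time"])
        lines.append(f"[{start} - {end}] {chapter['title']}")
    return "\n".join(lines)
```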

audio_events (object[])

Timestamped audio events, only set if audio_events_config is in the config

Array [

type (string)

Kind of audio event. E.g. music

start_time (float)

Time (in seconds) at which the audio event starts

end_time (float)

Time (in seconds) at which the audio event ends

confidence (float)

Prediction confidence associated with this event

channel (string)

Input channel this event occurred on

]

audio_event_summary (object)

Summary statistics per event type, keyed by type, e.g. music

overall (object)

Overall summary on all channels

[property name: string] (object)

Summary statistics for this audio event type

total_duration (float)

Total duration (in seconds) of all audio events of this type

count (number)

Number of events of this type

channels (object)

Summary keyed by channel, only set if channel diarization is enabled

[property name: string] (object)

Summary statistics for this audio event type

total_duration (float)

Total duration (in seconds) of all audio events of this type

count (number)

Number of events of this type
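To show how audio_events relates to audio_event_summary, the sketch below derives per-type total_duration and count values from a list of events. It is an illustration of the summary's shape under the assumption that durations are simply end_time minus start_time; the function name is hypothetical and the service's own figures may be computed differently.

```python
from collections import defaultdict


def summarize_events(audio_events: list) -> dict:
    """Aggregate events into {type: {"total_duration": seconds, "count": n}}."""
    summary = defaultdict(lambda: {"total_duration": 0.0, "count": 0})
    for event in audio_events:
        stats = summary[event["type"]]
        stats["total_duration"] += event["end_time"] - event["start_time"]
        stats["count"] += 1
    return dict(summary)
```

Filtering audio_events by channel before calling the function gives the per-channel variant.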

Example response

The following is an example of the json-v2 transcript response for the example.wav file used in the code samples above.

{
  "format": "2.9",
  "job": {
    "created_at": "2025-06-30T11:43:54.135Z",
    "data_name": "example.wav",
    "duration": 15,
    "id": "650krlru2e"
  },
  "metadata": {
    "created_at": "2025-06-30T11:44:08.133526Z",
    "language_pack_info": {
      "adapted": false,
      "itn": true,
      "language_description": "English",
      "word_delimiter": " ",
      "writing_direction": "left-to-right"
    },
    "orchestrator_version": "2025.06.28651+1eb4127132.HEAD",
    "transcription_config": {
      "language": "en",
      "operating_point": "enhanced"
    },
    "type": "transcription"
  },
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "Welcome",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 1.36,
      "start_time": 0.72,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "to",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 1.44,
      "start_time": 1.36,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 0.98,
          "content": "Speechmatics",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 2.44,
      "start_time": 1.48,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": ".",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "attaches_to": "previous",
      "end_time": 2.44,
      "is_eos": true,
      "start_time": 2.44,
      "type": "punctuation"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "We're",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.24,
      "start_time": 3.04,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "delighted",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.64,
      "start_time": 3.24,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "that",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 3.8,
      "start_time": 3.64,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "you've",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.04,
      "start_time": 3.8,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "decided",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.44,
      "start_time": 4.04,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "to",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.56,
      "start_time": 4.44,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "try",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 4.92,
      "start_time": 4.56,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "our",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 5.2,
      "start_time": 4.92,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "speech",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 5.48,
      "start_time": 5.2,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "to",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 5.6,
      "start_time": 5.48,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "text",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 5.92,
      "start_time": 5.6,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "software",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 6.56,
      "start_time": 5.92,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "to",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 7.16,
      "start_time": 7.04,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "get",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 7.4,
      "start_time": 7.2,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "going",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 7.88,
      "start_time": 7.4,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": ".",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "attaches_to": "previous",
      "end_time": 7.88,
      "is_eos": true,
      "start_time": 7.88,
      "type": "punctuation"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "Just",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 8.12,
      "start_time": 7.92,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "create",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 8.44,
      "start_time": 8.12,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "an",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 8.64,
      "start_time": 8.48,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "API",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 9,
      "start_time": 8.64,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "key",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 9.36,
      "start_time": 9,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "and",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 9.6,
      "start_time": 9.36,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "submit",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 9.88,
      "start_time": 9.6,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "a",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 10.04,
      "start_time": 9.92,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "transcription",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 10.6,
      "start_time": 10.04,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "request",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 11.04,
      "start_time": 10.6,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "to",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 11.12,
      "start_time": 11.04,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "our",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 11.32,
      "start_time": 11.16,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "API",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 11.88,
      "start_time": 11.32,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": ".",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "attaches_to": "previous",
      "end_time": 11.88,
      "is_eos": true,
      "start_time": 11.88,
      "type": "punctuation"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "We",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 12.36,
      "start_time": 12.24,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "hope",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 12.56,
      "start_time": 12.4,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "you'll",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 12.8,
      "start_time": 12.56,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "be",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 12.88,
      "start_time": 12.8,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "very",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 13.08,
      "start_time": 12.88,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "impressed",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 13.4,
      "start_time": 13.08,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "by",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 13.56,
      "start_time": 13.4,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "the",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 13.68,
      "start_time": 13.56,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "results",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 14.36,
      "start_time": 13.68,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": ".",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "attaches_to": "previous",
      "end_time": 14.36,
      "is_eos": true,
      "start_time": 14.36,
      "type": "punctuation"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "Thank",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 15.08,
      "start_time": 14.8,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": "you",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "end_time": 15.4,
      "start_time": 15.08,
      "type": "word"
    },
    {
      "alternatives": [
        {
          "confidence": 1,
          "content": ".",
          "language": "en",
          "speaker": "UU"
        }
      ],
      "attaches_to": "previous",
      "end_time": 15.4,
      "is_eos": true,
      "start_time": 15.4,
      "type": "punctuation"
    }
  ]
}
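Each word in a response like the one above carries a confidence score, which can be used to flag words for review. The following sketch assumes the response has been parsed into a Python dict; the function name and the 0.99 threshold are arbitrary choices for illustration.

```python
def low_confidence_words(transcript: dict, threshold: float = 0.99) -> list:
    """Return (content, start_time, end_time) for words scoring below the threshold."""
    flagged = []
    for item in transcript.get("results", []):
        if item.get("type") != "word" or not item.get("alternatives"):
            continue
        best = item["alternatives"][0]
        if best["confidence"] < threshold:
            flagged.append((best["content"], item["start_time"], item["end_time"]))
    return flagged
```

Applied to the example response, only "Speechmatics" (confidence 0.98) falls below the 0.99 threshold.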

Quicklinks

Troubleshooting

API Reference

Inputs

Limits
