
Speechmatics ASR REST API (2.0.0)


The Speechmatics Automatic Speech Recognition REST API is used to submit ASR jobs and receive the results. The supported job type is transcription of audio files.

Jobs

Create a new job.

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Request Body schema: multipart/form-data
config
required
string

JSON containing a JobConfig model indicating the type and parameters for the recognition job.

data_file
string <binary>

The data file to be processed. Alternatively the data file can be fetched from a url specified in JobConfig.

text_file
string <binary>

For alignment jobs, the text file that the data file should be aligned to.

Responses

Response Schema:
id
required
string

The unique ID assigned to the job. Keep a record of this for later retrieval of your completed job.

Request samples

from speechmatics.batch_client import BatchClient

# Open the client using a context manager
with BatchClient("YOUR_API_KEY") as client:
    job_id = client.submit_job(
        audio="PATH_TO_FILE",
    )
    print(job_id)
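The SDK sample above uploads a local file. The same multipart form can also be driven directly over HTTP; a minimal sketch using the third-party requests library (the endpoint URL and bearer-token header are taken from the curl samples later in this document; YOUR_API_KEY is a placeholder):

```python
import json
from typing import Optional

def build_job_parts(config: dict, audio_path: Optional[str] = None) -> dict:
    """Multipart fields for POST /v2/jobs: a JSON `config` part plus an
    optional `data_file` part."""
    parts = {"config": (None, json.dumps(config), "application/json")}
    if audio_path is not None:
        parts["data_file"] = (audio_path, open(audio_path, "rb"))
    return parts

# A minimal transcription config; to have the audio fetched from a URL
# instead, include a fetch_data block in the config and omit data_file.
config = {"type": "transcription", "transcription_config": {"language": "en"}}
parts = build_job_parts(config)

# import requests  # third-party
# resp = requests.post("https://asr.api.speechmatics.com/v2/jobs",
#                      headers={"Authorization": "Bearer YOUR_API_KEY"},
#                      files=parts)
# print(resp.json()["id"])
```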

Response samples

Content type
{
  "id": "a1b2c3d4e5"
}

List all jobs.

query Parameters
created_before
string <date-time>

UTC timestamp cursor for paginating the request response. Filters jobs by creation time to the nearest millisecond. Accepts up to nanosecond precision, truncating to millisecond precision. By default, the response starts with the most recent job.

limit
integer [ 1 .. 100 ]

Limit for paginating the request response. Defaults to 100.

include_deleted
boolean

Specifies whether deleted jobs should be included in the response. Defaults to false.

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Responses

Response Schema:
jobs
required
Array of objects (JobDetails)
Array
created_at
required
string <date-time>
Example: "2018-01-09T12:29:01.853047Z"

The UTC date time the job was created.

data_name
required
string

Name of the data file submitted for the job.

text_name
string

Name of the text file submitted to be aligned to audio.

duration
integer >= 0

The file duration (in seconds). May be missing for fetch URL jobs.

id
required
string
Example: "a1b2c3d4e5"

The unique id assigned to the job.

status
required
string
Enum: "running" "done" "rejected" "deleted" "expired"

The status of the job:
* running - The job is actively running.
* done - The job completed successfully.
* rejected - The job was accepted at first but later could not be processed by the transcriber.
* deleted - The user deleted the job.
* expired - The system deleted the job, usually because it remained in the done state for a very long time.

config
object (JobConfig)

JSON object that contains various groups of job configuration parameters. Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

lang
string

Optional parameter used for backwards compatibility with the v1 API.

errors
Array of objects (JobDetailError)

Optional list of errors that occurred in user interaction, for example when the audio could not be fetched or a notification could not be sent.

Request samples

from speechmatics.batch_client import BatchClient

with BatchClient("YOUR_API_KEY") as client:
    jobs_list = client.list_jobs()

    # Get and print the name of the first job, if any exist
    if jobs_list["jobs"]:
        first_job_name = jobs_list["jobs"][0]["data_name"]
        print(first_job_name)
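The sample above fetches a single page. To walk the full history, use created_before as a cursor: request a page, then pass the created_at of the oldest job received as the next created_before. A sketch of the parameter handling (the endpoint and parameter names come from this section; the loop itself is illustrative, and because the cursor filters to the nearest millisecond, a boundary job may need de-duplication):

```python
def next_page_params(last_created_at=None, limit=100, include_deleted=False):
    """Query parameters for GET /v2/jobs, walking backwards in time.

    Pass the created_at of the oldest job on the previous page as
    last_created_at to fetch the next (older) page.
    """
    params = {"limit": limit}
    if last_created_at is not None:
        params["created_before"] = last_created_at
    if include_deleted:
        params["include_deleted"] = "true"
    return params

# import requests  # third-party
# jobs, cursor = [], None
# while True:
#     resp = requests.get("https://asr.api.speechmatics.com/v2/jobs",
#                         params=next_page_params(cursor),
#                         headers={"Authorization": "Bearer YOUR_API_KEY"})
#     page = resp.json()["jobs"]
#     if not page:
#         break
#     jobs.extend(page)
#     cursor = page[-1]["created_at"]
```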

Response samples

Content type
{
  "jobs": []
}

Get job details, including progress and any error reports.

path Parameters
jobid
required
string
Example: a1b2c3d4e5

ID of the job.

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Responses

Response Schema:
job
required
object (JobDetails)

Document describing a job. JobConfig is present in the JobDetails returned for a GET jobs/ request in SaaS and on the Batch Appliance, but not in the JobDetails items of a RetrieveJobsResponse on the Batch Appliance.

created_at
required
string <date-time>
Example: "2018-01-09T12:29:01.853047Z"

The UTC date time the job was created.

data_name
required
string

Name of the data file submitted for the job.

text_name
string

Name of the text file submitted to be aligned to audio.

duration
integer >= 0

The file duration (in seconds). May be missing for fetch URL jobs.

id
required
string
Example: "a1b2c3d4e5"

The unique id assigned to the job.

status
required
string
Enum: "running" "done" "rejected" "deleted" "expired"

The status of the job:
* running - The job is actively running.
* done - The job completed successfully.
* rejected - The job was accepted at first but later could not be processed by the transcriber.
* deleted - The user deleted the job.
* expired - The system deleted the job, usually because it remained in the done state for a very long time.

config
object (JobConfig)

JSON object that contains various groups of job configuration parameters. Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

lang
string

Optional parameter used for backwards compatibility with the v1 API.

errors
Array of objects (JobDetailError)

Optional list of errors that occurred in user interaction, for example when the audio could not be fetched or a notification could not be sent.

Request samples

from speechmatics.batch_client import BatchClient

# This example shows how to check the duration of the file
with BatchClient("YOUR_API_KEY") as client:
    job_response = client.check_job_status("YOUR_JOB_ID")

    job_duration = job_response["job"]["duration"]
    print(job_duration)
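Because status starts at running and ends in one of the terminal values listed above, callers typically poll this endpoint until a terminal state is reached. A sketch built on the check_job_status call from the sample above (the 5-second interval is an arbitrary choice):

```python
import time

# Terminal states from the status enum documented above.
TERMINAL_STATUSES = {"done", "rejected", "deleted", "expired"}

def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status in TERMINAL_STATUSES

def wait_for_job(client, job_id, poll_seconds=5):
    """Poll GET /v2/jobs/{jobid} until the job leaves the running state."""
    while True:
        job = client.check_job_status(job_id)["job"]
        if is_terminal(job["status"]):
            return job
        time.sleep(poll_seconds)
```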

Response samples

Content type
{
  "job": {}
}

Delete a job and remove all associated resources.

path Parameters
jobid
required
string
Example: a1b2c3d4e5

ID of the job to delete.

query Parameters
force
boolean

When set, a running job will be force terminated. When unset (the default), a running job will not be terminated and the request will return HTTP 423 Locked.

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Responses

Response Schema:
job
required
object (JobDetails)

Document describing a job. JobConfig is present in the JobDetails returned for a GET jobs/ request in SaaS and on the Batch Appliance, but not in the JobDetails items of a RetrieveJobsResponse on the Batch Appliance.

created_at
required
string <date-time>
Example: "2018-01-09T12:29:01.853047Z"

The UTC date time the job was created.

data_name
required
string

Name of the data file submitted for the job.

text_name
string

Name of the text file submitted to be aligned to audio.

duration
integer >= 0

The file duration (in seconds). May be missing for fetch URL jobs.

id
required
string
Example: "a1b2c3d4e5"

The unique id assigned to the job.

status
required
string
Enum: "running" "done" "rejected" "deleted" "expired"

The status of the job:
* running - The job is actively running.
* done - The job completed successfully.
* rejected - The job was accepted at first but later could not be processed by the transcriber.
* deleted - The user deleted the job.
* expired - The system deleted the job, usually because it remained in the done state for a very long time.

config
object (JobConfig)

JSON object that contains various groups of job configuration parameters. Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

lang
string

Optional parameter used for backwards compatibility with the v1 API.

errors
Array of objects (JobDetailError)

Optional list of errors that occurred in user interaction, for example when the audio could not be fetched or a notification could not be sent.

Request samples

from speechmatics.batch_client import BatchClient

with BatchClient("YOUR_API_KEY") as client:
    client.delete_job("YOUR_JOB_ID")
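If you need the force behaviour described above, the query parameter can be passed over raw HTTP. A sketch (the URL, header, and 423 semantics come from this section; the helper itself is illustrative):

```python
def delete_job_request(job_id: str, force: bool = False):
    """URL and query parameters for DELETE /v2/jobs/{jobid}."""
    url = f"https://asr.api.speechmatics.com/v2/jobs/{job_id}"
    params = {"force": "true"} if force else {}
    return url, params

# import requests  # third-party
# url, params = delete_job_request("YOUR_JOB_ID", force=True)
# resp = requests.delete(url, params=params,
#                        headers={"Authorization": "Bearer YOUR_API_KEY"})
# if resp.status_code == 423:
#     print("Job is still running; wait or retry with force=True.")
```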

Response samples

Content type
{
  "job": {}
}

Get the transcript for a transcription job.

path Parameters
jobid
required
string
Example: a1b2c3d4e5

ID of the job.

query Parameters
format
string
Enum: "json-v2" "txt" "srt"

The transcription format (by default the json-v2 format is returned).

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Responses

Response Schema:
format
required
string
Example: "2.1"

Speechmatics JSON transcript format version number.

job
required
object (JobInfo)

Summary information about an ASR job, to support identification and tracking.

metadata
required
object (RecognitionMetadata)

Summary information about the output from an ASR job, comprising the job type and configuration parameters used when generating the output.

results
required
Array of objects (RecognitionResult)
Example: [[{"channel":"channel_1","start_time":0.55,"end_time":1.2,"type":"word","volume":0.5,"alternatives":[{"confidence":0.95,"content":"Hello","language":"en","speaker":"S1","display":{"direction":"ltr"}}]}]]
translations
object
Example: {"de":[{"start_time":0.5,"end_time":1.3,"content":"Guten Tag, wie geht es dir?","speaker":"UU"}],"fr":[{"start_time":0.5,"end_time":1.3,"content":"Bonjour, comment ça va?","speaker":"UU"}]}

Translations of the transcript into other languages. It is a map of ISO language codes to arrays of translated sentences. Configured using translation_config.

summary
object (SummarizationResult)
Example: {"content":"this is a summary"}

Summary of the transcript, configured using summarization_config.

sentiment_analysis
object (SentimentAnalysisResult)
Example: {"segments":[{"text":"I am happy with the product.","start_time":0,"end_time":5,"sentiment":"positive","speaker":"John Doe","channel":"Chat","confidence":0.9},{"text":"I don't like the customer service.","start_time":6,"end_time":12,"sentiment":"negative","speaker":"John Doe","channel":"Chat","confidence":0.8}],"summary":{"overall":{"positive_count":1,"negative_count":1,"neutral_count":0},"speakers":[{"speaker":"John Doe","positive_count":1,"negative_count":1,"neutral_count":0}],"channels":[{"channel":"Chat","positive_count":1,"negative_count":1,"neutral_count":0}]}}

The main object that holds sentiment analysis data.

topics
object (TopicDetectionResult)
Example: {"segments":[{"text":"I am happy with the product.","start_time":0,"end_time":5,"topics":[{"topic":"product"}]},{"text":"We will deploy this container for Spanish.","start_time":6,"end_time":12,"topics":[{"topic":"deployment"},{"topic":"languages"}]}],"summary":{"overall":{"deployment":1,"languages":1,"product":1}}}

Main object that holds topic detection results.

chapters
Array of objects (AutoChaptersResult)
Example: [{"title":"Part 1","summary":"Summary of part 1","start_time":0,"end_time":5},{"title":"Part 2","summary":"Summary of part 2","start_time":5,"end_time":10}]

An array of objects representing summarized chapters of the transcript.

audio_events
Array of objects (AudioEventItem)

Timestamped audio events; only set if audio_events_config is present in the job config.

audio_event_summary
object

Summary statistics per audio event type, keyed by type, e.g. music.

Request samples

from speechmatics.batch_client import BatchClient

# This example shows how to unpack various things from the transcript
with BatchClient("YOUR_API_KEY") as client:
    transcript = client.get_job_result("YOUR_JOB_ID")

    # Print out the first word of the transcript
    first_word = transcript["results"][0][0]["alternatives"][0]
    print(first_word)

    # If we had requested a translation, we might get the first sentence
    translation_sentence = transcript["translations"]["de"][0]["content"]
    print(translation_sentence)

    # If we wanted a summary
    summary = transcript["summary"]["content"]
    print(summary)

    # If we wanted to check for sentiment analysis
    first_sentiment = transcript["sentiment_analysis"]["segments"][0]["sentiment"]
    print(first_sentiment)
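The SDK sample above returns the default json-v2 document; the format query parameter described earlier selects txt or srt instead. A sketch over raw HTTP (the URL, header, and enum values come from this section; the helper itself is illustrative):

```python
def transcript_request(job_id: str, fmt: str = "json-v2"):
    """URL and query parameters for GET /v2/jobs/{jobid}/transcript."""
    if fmt not in ("json-v2", "txt", "srt"):
        raise ValueError(f"unsupported transcript format: {fmt}")
    url = f"https://asr.api.speechmatics.com/v2/jobs/{job_id}/transcript"
    return url, {"format": fmt}

# import requests  # third-party
# url, params = transcript_request("YOUR_JOB_ID", fmt="srt")
# resp = requests.get(url, params=params,
#                     headers={"Authorization": "Bearer YOUR_API_KEY"})
# print(resp.text)  # subtitles as plain text
```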

Response samples

Content type
{
  "format": "2.1",
  "job": {},
  "metadata": {},
  "results": [],
  "translations": {},
  "summary": {},
  "sentiment_analysis": {},
  "topics": {},
  "chapters": [],
  "audio_events": [],
  "audio_event_summary": {}
}

Get the aligned text file for an alignment job.

path Parameters
jobid
required
string
Example: a1b2c3d4e5

ID of the job.

query Parameters
tags
string
Enum: "word_start_and_end" "one_per_line"

Control how timing information is added to the text file provided as input to the alignment job. If set to word_start_and_end, SGML tags are inserted at the start and end of each word, for example <time=0.41>. If set to one_per_line, square bracket tags are inserted at the start of each line, for example [00:00:00.4]. The default is word_start_and_end.

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Responses

Response Schema:
string <binary>

Request samples

JOB_ID="YOUR_JOB_ID"
API_KEY="YOUR_API_KEY"

curl -L -X GET "https://asr.api.speechmatics.com/v2/jobs/${JOB_ID}/alignment" \
    -H "Authorization: Bearer ${API_KEY}"
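With tags=one_per_line, each line of the returned text begins with a square-bracket timestamp such as [00:00:00.4]. A sketch of splitting those tags back out (only the example tag format comes from this section; the exact shape of other tags, and the helper itself, are assumptions):

```python
import re

# Matches a leading [HH:MM:SS.s] tag as in the one_per_line example above.
LINE_TAG = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)\]\s*")

def split_line_tag(line: str):
    """Return (seconds, text) for a one_per_line aligned line.

    Returns (None, line) when the line carries no timestamp tag.
    """
    m = LINE_TAG.match(line)
    if m is None:
        return None, line
    hours, minutes, seconds = m.groups()
    t = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return t, line[m.end():]
```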

Response samples

Content type
No sample

Get the usage statistics.

query Parameters
since
string <date>

Include usage after the given date (inclusive). This is an ISO 8601 calendar date in the format YYYY-MM-DD.

until
string <date>

Include usage before the given date (inclusive). This is an ISO 8601 calendar date in the format YYYY-MM-DD.

header Parameters
Authorization
required
string

Customer API token

X-SM-EAR-Tag
string

Early Access Release Tag

Responses

Response Schema: application/json
since
required
string <date-time>
Example: "2021-10-14T00:55:00Z"
until
required
string <date-time>
Example: "2022-12-01T00:00:00Z"
summary
required
Array of objects (UsageDetails)
details
required
Array of objects (UsageDetails)

Request samples

API_KEY="YOUR_API_KEY"

curl -L -X GET "https://asr.api.speechmatics.com/v2/usage" \
    -H "Authorization: Bearer ${API_KEY}"
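To scope the report, add the since and until dates described above. A Python sketch equivalent to the curl sample (the URL, header, and parameter names come from this section; the helper itself is illustrative):

```python
import datetime

def usage_params(since=None, until=None):
    """Query parameters for GET /v2/usage; dates are ISO 8601 YYYY-MM-DD."""
    params = {}
    if since is not None:
        params["since"] = since.isoformat()
    if until is not None:
        params["until"] = until.isoformat()
    return params

params = usage_params(datetime.date(2021, 9, 12), datetime.date(2022, 1, 1))

# import requests  # third-party
# resp = requests.get("https://asr.api.speechmatics.com/v2/usage",
#                     params=params,
#                     headers={"Authorization": "Bearer YOUR_API_KEY"})
# print(resp.json()["summary"])
```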

Response samples

Content type
application/json
{
  "since": "2021-09-12T00:00:00Z",
  "until": "2022-01-01T23:59:59Z",
  "summary": [],
  "details": []
}

Job Config

This model should be used when you create a new job. It is also returned as part of the response to a number of requests, including when you get job details or get the transcript for a transcription job.

Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur. For more details, please refer to Notifications in the documentation.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

type
required
string (JobType)
Enum: "alignment" "transcription"
fetch_data
object (DataFetchConfig)
fetch_text
object (DataFetchConfig)
alignment_config
object (AlignmentConfig)
Example: {"language":"en"}
transcription_config
object (TranscriptionConfig)
Example: {"language":"en","output_locale":"en-GB","additional_vocab":[{"content":"Speechmatics","sounds_like":["speechmatics"]},{"content":"gnocchi","sounds_like":["nyohki","nokey","nochi"]},{"content":"CEO","sounds_like":["C.E.O."]},{"content":"financial crisis"}],"diarization":"channel","channel_diarization_labels":["Caller","Agent"]}
notification_config
Array of objects (NotificationConfig)
Example: [[{"url":"https://collector.example.org/callback","contents":["transcript:json-v2"],"auth_headers":["Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VySWQiOiJiMDhmODZhZi0zNWRhLTQ4ZjItOGZhYi1jZWYzOTA0NjYwYmQifQ.-xN_h82PHVTCMA9vdoHrcZxH-x5mb11y1537t3rGzcM"]}]]
tracking
object (TrackingData)
Example: {"title":"ACME Q12018 Earnings Call","reference":"/data/clients/ACME/statements/segs/2018Q1-seg8","tags":["quick-review","segment"],"details":{"client":"ACME Corp","segment":8,"seg_start":963.201,"seg_end":1091.481}}
output_config
object (OutputConfig)
translation_config
object (TranslationConfig)
language_identification_config
object (LanguageIdentificationConfig)
summarization_config
object (SummarizationConfig)
sentiment_analysis_config
object (SentimentAnalysisConfig)
topic_detection_config
object (TopicDetectionConfig)
auto_chapters_config
object (AutoChaptersConfig)
audio_events_config
object (AudioEventsConfig)
{
  "type": "alignment",
  "fetch_data": {},
  "fetch_text": {},
  "alignment_config": {},
  "transcription_config": {},
  "notification_config": [],
  "tracking": {},
  "output_config": {},
  "translation_config": {},
  "language_identification_config": {},
  "summarization_config": {},
  "sentiment_analysis_config": {},
  "topic_detection_config": {},
  "auto_chapters_config": {},
  "audio_events_config": {}
}
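Putting the model together: a transcription JobConfig with a callback and tracking metadata might be assembled like this (field names and example values are drawn from the model above; the callback URL and tracking details are placeholders):

```python
import json

job_config = {
    "type": "transcription",
    # Required because type is "transcription".
    "transcription_config": {
        "language": "en",
        "output_locale": "en-GB",
        "diarization": "channel",
        "channel_diarization_labels": ["Caller", "Agent"],
    },
    # Callbacks on completion; no assumptions should be made about order.
    "notification_config": [
        {
            "url": "https://collector.example.org/callback",
            "contents": ["transcript:json-v2"],
        }
    ],
    # Customer-specific metadata, echoed back in results and callbacks.
    "tracking": {
        "title": "ACME Q1 2018 Earnings Call",
        "tags": ["quick-review"],
    },
}

# This JSON string is what goes in the `config` part of the multipart
# request when creating the job.
config_json = json.dumps(job_config)
```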

Job Details

Returned when you get job details, list all jobs, or delete a job. This model includes the job's status and the config that was used.

created_at
required
string <date-time>
Example: "2018-01-09T12:29:01.853047Z"

The UTC date time the job was created.

data_name
required
string

Name of the data file submitted for the job.

text_name
string

Name of the text file submitted to be aligned to audio.

duration
integer >= 0

The file duration (in seconds). May be missing for fetch URL jobs.

id
required
string
Example: "a1b2c3d4e5"

The unique id assigned to the job.

status
required
string
Enum: "running" "done" "rejected" "deleted" "expired"

The status of the job:
* running - The job is actively running.
* done - The job completed successfully.
* rejected - The job was accepted at first but later could not be processed by the transcriber.
* deleted - The user deleted the job.
* expired - The system deleted the job, usually because it remained in the done state for a very long time.

config
object (JobConfig)

JSON object that contains various groups of job configuration parameters. Based on the value of type, a type-specific object such as transcription_config is required to be present to specify all configuration settings or parameters needed to process the job inputs as expected.

If the results of the job are to be forwarded on completion, notification_config can be provided with a list of callbacks to be made; no assumptions should be made about the order in which they will occur.

Customer specific job details or metadata can be supplied in tracking, and this information will be available where possible in the job results and in callbacks.

lang
string

Optional parameter used for backwards compatibility with the v1 API.

errors
Array of objects (JobDetailError)

Optional list of errors that occurred in user interaction, for example when the audio could not be fetched or a notification could not be sent.

{
  "created_at": "2018-01-09T12:29:01.853047Z",
  "data_name": "string",
  "text_name": "string",
  "duration": 0,
  "id": "a1b2c3d4e5",
  "status": "running",
  "config": {},
  "lang": "string",
  "errors": []
}

Transcript

Returned when you get the transcript for a transcription job. It includes metadata about the job, such as the transcription config that was used.

format
required
string
Example: "2.1"

Speechmatics JSON transcript format version number.

job
required
object (JobInfo)

Summary information about an ASR job, to support identification and tracking.

metadata
required
object (RecognitionMetadata)

Summary information about the output from an ASR job, comprising the job type and configuration parameters used when generating the output.

results
required
Array of objects (RecognitionResult)
Example: [[{"channel":"channel_1","start_time":0.55,"end_time":1.2,"type":"word","volume":0.5,"alternatives":[{"confidence":0.95,"content":"Hello","language":"en","speaker":"S1","display":{"direction":"ltr"}}]}]]
translations
object
Example: {"de":[{"start_time":0.5,"end_time":1.3,"content":"Guten Tag, wie geht es dir?","speaker":"UU"}],"fr":[{"start_time":0.5,"end_time":1.3,"content":"Bonjour, comment ça va?","speaker":"UU"}]}

Translations of the transcript into other languages. It is a map of ISO language codes to arrays of translated sentences. Configured using translation_config.

summary
object (SummarizationResult)
Example: {"content":"this is a summary"}

Summary of the transcript, configured using summarization_config.

sentiment_analysis
object (SentimentAnalysisResult)
Example: {"segments":[{"text":"I am happy with the product.","start_time":0,"end_time":5,"sentiment":"positive","speaker":"John Doe","channel":"Chat","confidence":0.9},{"text":"I don't like the customer service.","start_time":6,"end_time":12,"sentiment":"negative","speaker":"John Doe","channel":"Chat","confidence":0.8}],"summary":{"overall":{"positive_count":1,"negative_count":1,"neutral_count":0},"speakers":[{"speaker":"John Doe","positive_count":1,"negative_count":1,"neutral_count":0}],"channels":[{"channel":"Chat","positive_count":1,"negative_count":1,"neutral_count":0}]}}

The main object that holds sentiment analysis data.

topics
object (TopicDetectionResult)
Example: {"segments":[{"text":"I am happy with the product.","start_time":0,"end_time":5,"topics":[{"topic":"product"}]},{"text":"We will deploy this container for Spanish.","start_time":6,"end_time":12,"topics":[{"topic":"deployment"},{"topic":"languages"}]}],"summary":{"overall":{"deployment":1,"languages":1,"product":1}}}

Main object that holds topic detection results.

chapters
Array of objects (AutoChaptersResult)
Example: [{"title":"Part 1","summary":"Summary of part 1","start_time":0,"end_time":5},{"title":"Part 2","summary":"Summary of part 2","start_time":5,"end_time":10}]

An array of objects representing summarized chapters of the transcript.

audio_events
Array of objects (AudioEventItem)

Timestamped audio events; only set if audio_events_config is present in the job config.

audio_event_summary
object

Summary statistics per audio event type, keyed by type, e.g. music.

{
  "format": "2.1",
  "job": {},
  "metadata": {},
  "results": [],
  "translations": {},
  "summary": {},
  "sentiment_analysis": {},
  "topics": {},
  "chapters": [],
  "audio_events": [],
  "audio_event_summary": {}
}