Summaries

Speechmatics enables you to generate a concise summary from your audio. With just a single API call, you can quickly transcribe and summarize content, making content review simpler and more efficient.

If you're new to Speechmatics, please see our guide on Transcribing a File. Once you are set up, include the following config to enable Summaries:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "summarization_config": {}
}

An empty summarization_config applies the default settings. See Configuration Options below for how to customize the summary.

Quick start

A Python client example that transcribes and summarizes a file in Batch using the default parameters.

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "summarization_config": {}  # You can also configure the summary. See below for more detail.
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        summary = transcript["summary"]["content"]
        print(summary)  # Print the returned summary
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e

Example Response

The summary is only present in the JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "summarization_config": {}
    ...
  },
  "results": [...],
  "summary": {
    "content": "Laura Perez called to seek assistance in completing her booking through the mobile application..."
  }
}

Configuration Options

Example with configuration parameters set:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "summarization_config": {
    "content_type": "informative",
    "summary_length": "detailed",
    "summary_type": "paragraphs"
  }
}

Use the configuration options below to adjust the content and format of the returned summary.

content_type (default: auto). Choose from three options:

- conversational: best suited for dialogues involving multiple participants, such as calls, meetings or discussions. It focuses on summarizing the key points of the conversation.
- informative: recommended for more structured information delivered by one or more people, making it ideal for videos, podcasts, lectures, and presentations.
- auto: automatically selects the most appropriate content type based on an analysis of the transcript.

summary_length (default: brief). Determines the depth of the summary:

- brief: provides a succinct summary, condensing the content into just a few sentences.
- detailed: provides a longer, structured summary. For conversational content, it includes key topics and a summary of the entire conversation. For informative content, it logically divides the audio into sections and provides a summary for each.

summary_type (default: bullets). Determines the formatting style of the summary:

- bullets: presents the summary as bullet points.
- paragraphs: presents the summary as paragraphs.
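If you are building the job config in Python, the options above can be validated client-side before submitting a job. The make_job_config helper below is our own illustration, not part of the Speechmatics client library:

```python
# Illustrative helper (not part of the Speechmatics client library) that
# builds a job config with Summaries enabled and validates the
# summarization options against the documented values.
def make_job_config(language="en", content_type="auto",
                    summary_length="brief", summary_type="bullets"):
    """Return a batch job config dict with Summaries enabled."""
    if content_type not in {"conversational", "informative", "auto"}:
        raise ValueError(f"invalid content_type: {content_type!r}")
    if summary_length not in {"brief", "detailed"}:
        raise ValueError(f"invalid summary_length: {summary_length!r}")
    if summary_type not in {"bullets", "paragraphs"}:
        raise ValueError(f"invalid summary_type: {summary_type!r}")
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "summarization_config": {
            "content_type": content_type,
            "summary_length": summary_length,
            "summary_type": summary_type,
        },
    }

# Matches the example configuration shown above:
conf = make_job_config(content_type="informative",
                       summary_length="detailed",
                       summary_type="paragraphs")
```

The resulting dict can be passed as transcription_config to BatchClient.submit_job, as in the quick start example.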

Example Summary Outputs

Webinar - Informative

A summary of our Ursa release webinar. Watch on YouTube.

Brief:

- Speechmatics introduced new speech recognition models named Ursa that are optimized for GPUs and are the most accurate ASR on the planet with a 35% improvement for standard and a 22% improvement for enhanced. The new models are also three times more efficient than CPU-based models.
- The company has released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability. The new version also includes automatic language identification and translation capabilities for 34 languages to and from English, with the translation feature outperforming Google's translation service.
- The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API, including real-time transcription that can be customized with speaker labels and custom words. The speaker also answers questions about the accuracy of the system, maximum file size for transcription, and support for multiple languages in real-time streams.

Detailed:

  1. Speechmatics' ASR Revolution
    - Speechmatics introduced new models named Ursa that focus on accuracy, speed, and efficiency.
    - The new models are optimized for GPUs, which are more powerful and efficient for processing.
    - Ursa has been benchmarked against competitors and found to be the most accurate ASR on the planet, with a 35% improvement for standard and a 22% improvement for enhanced.
    - The models perform better with accents and noise and are three times more efficient than CPU-based models.
    - The latency for real-time transcription has also been improved, with the new models showing little difference in accuracy even at small latencies.
    - Beta customers have used the new models for live captioning at a major car racing event.

  2. Improvements to Speech Recognition Software
    - Speechmatics released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability.
    - The software also includes automatic language identification and translation capabilities, with a single API call allowing translation to and from 34 languages.
    - The company claims its transcription accuracy is the best available, and that its translation accuracy is ahead of Google's.
    - The new version is available to enterprise customers and will be rolled out to other languages later this year.

  3. Using Speechmatics' Services
    - The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API.
    - The user can select the language they want to translate their audio file to and choose between enhanced or standard translation speed.
    - The portal also allows users to view previous jobs and real-time transcription with the option to add custom words and reduce latency.
    - The speaker also addresses questions about the maximum file size for transcription, the accuracy of the system, and the support for speaker labels and multiple languages in real-time streams.
    - The speaker concludes by encouraging users to try out Speechmatics' services and visit their website for more information.

Interview - Informative

A summary of a LinkedIn interview live "Adding Value for Contact Center Solutions Using Speech to Text". Watch on YouTube.

Brief:

- Prosodica uses Speechmatics' transcription engine to provide insights and predictions based on customer conversations in contact center analytics.
- The accuracy of transcription is crucial for machine learning algorithms to make accurate predictions, and having a diverse and inclusive voice model is important to avoid disparate impacts.
- The future of speech recognition in the contact center space is evolving towards a hybrid workforce, where businesses need to be remote-ready to support both customers and employees.

Detailed:

  1. Importance of accurate transcription in contact center analytics
    - Speech-to-text technology used to provide insights and predictions based on customer conversations
    - Accuracy of transcription crucial for machine learning algorithms to make accurate predictions
    - Small improvements in transcription accuracy can have a significant impact on machine learning accuracy

  2. Importance of diverse and inclusive voice model
    - Multinational corporations send voice traffic to call centers all over the world
    - More diverse set of voices used in training, wider the net for what's typical and same level of recognition accuracy can be achieved
    - Non-native English speaking voices, female voices, and younger voices are all areas where many transcription engines struggle

  3. Disparate impact of transcription accuracy differences
    - If transcription is less accurate for a particular voice, then the prediction is less accurate for that voice, creating a disparate impact that businesses do not want

  4. Future of speech recognition in contact center space
    - Evolving towards a hybrid workforce, where businesses need to be remote-ready to support both customers and employees
    - Automation is a trend, but there are certain interactions that people will not be comfortable dealing with an automated bot
    - Sarcasm is a tricky problem, but speech and voice analytics used as data point to detect it.

Technical Support Call - Conversational

A summary of a Technical Customer Support Call.

Brief:

- Laura Perez is having trouble completing bookings through the mobile app and has checked her internet connection and payment details.
- The issue is resolved by updating the Stay app on her iOS device.
- Laura agrees to receive articles on her email about the latest places and promos offered by Stay.

Detailed:

Key Topics:
- Technical issue with mobile app
- Updating software
- Sending promotional articles via email

Discussion:
- Laura Perez is unable to complete bookings through the mobile app despite refreshing the page and checking her internet connection and payment method
- The app keeps going back to the main page
- Miranda suggests updating the app as it was recently updated to remove bugs
- Laura has not updated the app yet and is using iOS 12.1
- Miranda instructs Laura to update the app through the app store
- Laura updates the app and is able to book a place
- Miranda offers to send promotional articles via email to Laura

Considerations

- Summaries uses a large language model (LLM). This may occasionally lead to inaccuracies in the summaries produced, and you may receive different summaries when running the same file multiple times.
- Transcription errors can surface in the summary. If you encounter any issues with a summary, we recommend reviewing the original transcript to check its accuracy.

Error Responses

Unsupported Language

If Summaries is requested for an unsupported language, the transcription will still complete, but summarization will not be performed and an error message will be included in the final JSON output.

Summaries is supported for all of Speechmatics' languages except Irish, Maltese and Urdu.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "xx"
    },
    "summarization_config": {},
    "summarization_errors": [
      {"type": "unsupported_language", "message": "Summarization not supported for xx."}
    ],
    ...
  },
  "results": [...]
}

Summaries Failed

If Summaries fails, the transcription will still complete, but no summary will be returned and an error message will be included in the final JSON output. This can happen if the transcript is too short.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "summarization_config": {},
    "summarization_errors": [
      {"type": "summarization_failed", "message": "Summarization failed."}
    ],
    ...
  },
  "results": [...]
}
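Because the summary key is absent whenever summarization fails or is unsupported, it is worth reading the job output defensively. A minimal sketch, assuming transcript is the dict returned by wait_for_completion with transcription_format='json-v2' (the get_summary helper is our own, not part of the client library):

```python
# Sketch of defensively reading a summary from the job's JSON output.
# get_summary is an illustrative helper, not part of the client library.
def get_summary(transcript):
    """Return the summary text, or None if summarization did not run."""
    summary = transcript.get("summary", {}).get("content")
    if summary is not None:
        return summary
    # Surface any errors reported in metadata.summarization_errors.
    for err in transcript.get("metadata", {}).get("summarization_errors", []):
        print(f"{err['type']}: {err['message']}")
    return None

# A failed job shaped like the error response above:
failed = {
    "metadata": {
        "summarization_errors": [
            {"type": "summarization_failed", "message": "Summarization failed."}
        ]
    },
    "results": [],
}
```

Calling get_summary(failed) prints the reported error and returns None, while a successful response returns the summary text.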