Summaries
Speechmatics enables you to generate a concise summary from your audio. With just a single API call, you can quickly transcribe and summarize content, making content review simpler and more efficient.
If you're new to Speechmatics, please see our guide on Transcribing a File. Once you are set up, include the following config to enable Summaries:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "summarization_config": {} // You can also configure the summary. See below for more detail.
}
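If you are calling the REST API directly rather than using a client library, this config is sent as the `config` part of a multipart job request. Below is a minimal sketch using Python's requests library, assuming the standard `config` and `data_file` multipart fields accepted by the batch /v2/jobs endpoint:

```python
import json

import requests

API_KEY = "YOUR_API_KEY"

config = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "summarization_config": {},
}

with open("example.wav", "rb") as audio_file:
    response = requests.post(
        "https://asr.api.speechmatics.com/v2/jobs",
        headers={"Authorization": f"Bearer {API_KEY}"},
        data={"config": json.dumps(config)},  # config is sent as a JSON string
        files={"data_file": audio_file},
    )

response.raise_for_status()
print(response.json()["id"])  # the job ID to poll or receive notifications for
```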
Quick start
A Python client example that transcribes and summarizes a file in Batch using the default parameters.
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "summarization_config": {}  # You can also configure the summary. See below for more detail.
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        summary = transcript["summary"]["content"]
        print(summary)  # print the returned summary
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e
Example Response
The summary is only present in the JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "summarization_config": {},
    ...
  },
  "results": [...],
  "summary": {
    "content": "Laura Perez called to seek assistance in completing her booking through the mobile application..."
  }
}
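The summary sits alongside the results at the top level of the transcript. Since summarization can fail even when transcription succeeds (see Error Responses below), a minimal sketch that reads it with a safe fallback rather than indexing into it directly:

```python
# transcript is the parsed json-v2 output shown above
summary = transcript.get("summary", {}).get("content")
if summary is None:
    # No summary block - inspect metadata["summarization_errors"] (see Error Responses)
    print("No summary returned")
else:
    print(summary)
```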
Configuration Options
Example with configuration parameters set:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "summarization_config": {
    "content_type": "informative",
    "summary_length": "detailed",
    "summary_type": "paragraphs"
  }
}
Use the configuration options below to adjust the format of the returned summary.

| Configuration name | Description | Default Value |
|---|---|---|
| content_type | Choose from three options: conversational - best suited for dialogues involving multiple participants, such as calls, meetings, or discussions; it focuses on summarizing the key points of the conversation. informative - recommended for more structured information delivered by one or more people, making it ideal for videos, podcasts, lectures, and presentations. auto - automatically selects the most appropriate content type based on an analysis of the transcript. | auto |
| summary_length | Determines the depth of the summary: brief - provides a succinct summary, condensing the content into just a few sentences. detailed - provides a longer, structured summary. For conversational content, it includes key topics and a summary of the entire conversation. For informative content, it logically divides the audio into sections and provides a summary for each. | brief |
| summary_type | Determines the formatting style of the summary: bullets - presents the summary as bullet points. paragraphs - presents the summary as paragraphs. | bullets |
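As a sketch of how these options fit together, the helper below validates values against the table before building the job config. build_summarization_config is a hypothetical convenience function, not part of the Speechmatics client:

```python
# Allowed values, mirroring the table above
ALLOWED = {
    "content_type": {"conversational", "informative", "auto"},
    "summary_length": {"brief", "detailed"},
    "summary_type": {"bullets", "paragraphs"},
}

def build_summarization_config(**options: str) -> dict:
    """Validate summary options and return a summarization_config dict."""
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown option: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
    return options  # omitted keys fall back to the documented defaults

conf = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "summarization_config": build_summarization_config(
        content_type="informative",
        summary_length="detailed",
        summary_type="paragraphs",
    ),
}
```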
Example Summary Outputs
Webinar - Informative
A summary of our Ursa release webinar. Watch on YouTube.
Brief:
Bullets:
- Speechmatics introduced new speech recognition models named Ursa that are optimized for GPUs and are the most accurate ASR on the planet with a 35% improvement for standard and a 22% improvement for enhanced. The new models are also three times more efficient than CPU-based models.
- The company has released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability. The new version also includes automatic language identification and translation capabilities for 34 languages to and from English, with the translation feature outperforming Google's translation service.
- The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API, including real-time transcription that can be customized with speaker labels and custom words. The speaker also answers questions about the accuracy of the system, maximum file size for transcription, and support for multiple languages in real-time streams.
Paragraphs:

Speechmatics has introduced new speech recognition software models named Ursa, optimized for GPUs, which are more powerful and efficient for processing. The new models have been benchmarked against competitors and found to be the most accurate ASR on the planet, with a 35% improvement for standard and 22% for enhanced. The company has also released a new version of its software that includes improvements to speaker diarization, batch processing, and readability, as well as automatic language identification and translation capabilities for 34 languages to and from English. The company claims to outcompete other alternatives in terms of accuracy, efficiency, and processing speed.
Detailed:
Bullets:
Speechmatics' ASR Revolution
- Speechmatics introduced new models named Ursa that focus on accuracy, speed, and efficiency.
- The new models are optimized for GPUs, which are more powerful and efficient for processing.
- Ursa has been benchmarked against competitors and found to be the most accurate ASR on the planet, with a 35% improvement for standard and a 22% improvement for enhanced.
- The models perform better with accents and noise and are three times more efficient than CPU-based models.
- The latency for real-time transcription has also been improved, with the new models showing little difference in accuracy even at small latencies.
- Beta customers have used the new models for live captioning at a major car racing event.
Improvements to Speech Recognition Software
- Speechmatics released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability.
- The software also includes automatic language identification and translation capabilities, with a single API call allowing translation to and from 34 languages.
- The company claims its transcription accuracy is the best available, and that its translation accuracy is ahead of Google's.
- The new version is available to enterprise customers and will be rolled out to other languages later this year.
Using Speechmatics' Services
- The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API.
- The user can select the language they want to translate their audio file to and choose between enhanced or standard translation speed.
- The portal also allows users to view previous jobs and real-time transcription with the option to add custom words and reduce latency.
- The speaker also addresses questions about the maximum file size for transcription, the accuracy of the system, and the support for speaker labels and multiple languages in real-time streams.
- The speaker concludes by encouraging users to try out Speechmatics' services and visit their website for more information.
Paragraphs:

Introduction
Speechmatics, a speech recognition software company, held a live product update event titled "ASR Revolution" to introduce their new models that focus on accuracy, speed, and efficiency. The company has previously optimized their models for CPUs and recently launched a beta version for laptops. However, they have now shifted their focus to GPUs, which are more powerful and efficient for processing.
New Models
The new models, named Ursa, have been benchmarked against competitors and have been found to be the most accurate ASR on the planet, with a 35% improvement for standard and 22% for enhanced. The models also perform better with accents and noise. The company has also improved efficiency, with the new models being three times more efficient than CPU processing for standard and 1.7 times for enhanced. The new models also maintain accuracy even at small latencies, making them suitable for real-time transcription. The company has already had beta customers use the new models for live captioning at a major car racing event.
New Version
Speechmatics has released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability. The new version also includes automatic language identification and translation capabilities for 34 languages to and from English. The translation feature is available in batch processing and will be available in real-time processing in Q2 2023. The company claims that its transcription accuracy is the best in the industry and that its translation feature outperforms Google's translation service. The new version is available to enterprise customers and will be rolled out to other languages later this year.
How to Use
The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API. The user can select the language they want to translate their audio file to and choose between enhanced or standard speed. The portal also allows users to view previous jobs and real-time transcription with the option to add custom words and reduce latency. The speaker also addresses questions about the accuracy of the system, the maximum size of audio files, and the support for multiple languages in real-time streams.
Conclusion
The company claims to outcompete other alternatives in terms of accuracy, efficiency, and processing speed.
Interview - Informative
A summary of a LinkedIn Live interview, "Adding Value for Contact Center Solutions Using Speech to Text". Watch on YouTube.
Brief:
Bullets:
- Prosodica uses Speechmatics' transcription engine to provide insights and predictions based on customer conversations in contact center analytics.
- The accuracy of transcription is crucial for machine learning algorithms to make accurate predictions, and having a diverse and inclusive voice model is important to avoid disparate impacts.
- The future of speech recognition in the contact center space is evolving towards a hybrid workforce, where businesses need to be remote-ready to support both customers and employees.
Paragraphs:

The use of speech-to-text technology in contact center analytics is discussed in a conversation between Ricardo Herreros-Symons and Mariano Tan. They highlight the importance of transcription accuracy for machine learning algorithms to make accurate predictions and the need for a diverse and inclusive voice model. The future of speech recognition in the contact center space is expected to involve a hybrid workforce, with both automation and improving people's lives being in demand. The use of prosodic data points can help detect sarcasm, which is a tricky problem for machine learning algorithms.
Detailed:
Bullets:
Importance of accurate transcription in contact center analytics
- Speech-to-text technology used to provide insights and predictions based on customer conversations
- Accuracy of transcription crucial for machine learning algorithms to make accurate predictions
- Small improvements in transcription accuracy can have a significant impact on machine learning accuracy
Importance of diverse and inclusive voice model
- Multinational corporations send voice traffic to call centers all over the world
- More diverse set of voices used in training, wider the net for what's typical and same level of recognition accuracy can be achieved
- Non-native English speaking voices, female voices, and younger voices are all areas where many transcription engines struggle
Disparate impact of transcription accuracy differences
- If transcription is less accurate for a particular voice, then the prediction is less accurate for that voice, creating a disparate impact that businesses do not want
Future of speech recognition in contact center space
- Evolving towards a hybrid workforce, where businesses need to be remote-ready to support both customers and employees
- Automation is a trend, but there are certain interactions that people will not be comfortable dealing with an automated bot
- Sarcasm is a tricky problem, but speech and voice analytics are used as data points to detect it.
Paragraphs:

Introduction
In this transcript, Ricardo Herreros-Symons, VP of Sales Business Development at Speechmatics, and Mariano Tan, CEO and founder of Prosodica, discuss the use of speech-to-text technology in contact center analytics. Prosodica uses Speechmatics' transcription engine to provide insights and predictions based on customer conversations.
Importance of Transcription Accuracy
The accuracy of the transcription is crucial for machine learning algorithms to make accurate predictions, and even small improvements in transcription accuracy can have a significant impact on machine learning accuracy. The importance of having a diverse and inclusive voice model has also risen in recent years, as multinational corporations send voice traffic to call centers all over the world.
Disparate Impacts of Transcription Accuracy Differences
As speech recognition becomes more mainstream, businesses are becoming increasingly concerned about the potential disparate impacts of transcription accuracy differences when analyzing less typical voices. The more diverse the set of voices used in training, the more likely we are to have a wider net for what's typical and get the same level of recognition accuracy across that.
Future of Speech and Speech Recognition in Contact Center Space
Speech analytics technology is becoming more prevalent in the contact center space. The two approaches are to use speech recognition to fully automate interactions or to measure and improve the employee experience. The technology was easier to implement in a cloud-based format, and so there has been a lot of adoption among cloud providers. However, there are certain interactions that people will not get comfortable dealing with through an automated bot.
Detection of Sarcasm
Sarcasm is a tricky problem, but Prosodica uses both speech and voice analytics as data points to detect it.
Technical Support Call - Conversational
A summary of a Technical Customer Support Call.
Brief:
Bullets:
- Laura Perez is having trouble completing bookings through the mobile app and has checked her internet connection and payment details.
- The issue is resolved by updating the Stay app on her iOS device.
- Laura agrees to receive articles on her email about the latest places and promos offered by Stay.
Paragraphs:

Laura Perez contacts Miranda for help with completing her bookings through the mobile application. Despite refreshing the page and checking her internet connection and payment method, she is unable to complete the booking. Miranda suggests updating the Stay app on her phone, which Laura does, and is then able to successfully book a place. Miranda also offers to send Laura articles on the latest places and promos that Stay offers. Laura expresses gratitude and ends the call.
Detailed:
Bullets:

Key Topics:
- Technical issue with mobile app
- Updating software
- Sending promotional articles via email
Discussion:
- Laura Perez is unable to complete bookings through the mobile app despite refreshing the page and checking her internet connection and payment method
- The app keeps going back to the main page
- Miranda suggests updating the app as it was recently updated to remove bugs
- Laura has not updated the app yet and is using iOS 12.1
- Miranda instructs Laura to update the app through the app store
- Laura updates the app and is able to book a place
- Miranda offers to send promotional articles via email to Laura
Paragraphs:

Key Topics:
- Technical issue with mobile application
- Updating software
- Sending promotional articles via email
Discussion:
Laura Perez called the customer service of Stay to report that she was unable to complete her bookings through the mobile application. She had already refreshed the application several times, checked her internet connection, and payment method, but still encountered the same issue. She also mentioned that the application goes back to the main page every time she tries to browse. The customer service representative, Miranda, asked if the application showed any error message, but Laura said it did not. Miranda then asked if Laura had updated the software to its latest version, to which Laura replied that she was not aware of any update. Miranda advised Laura to update the Stay app on her phone by going to the app store and clicking the update button beside the Stay icon. After updating the app, Laura was able to book a place successfully.
Miranda also asked Laura if it was okay to send her articles on her email to keep her updated on the latest places and promos that Stay offers. Laura agreed and thanked Miranda for her help.
Considerations
- Summaries utilizes a large language model (LLM). This may occasionally lead to inaccuracies in the summaries produced, and you may receive different summaries when running the same file multiple times
- Potential transcription errors could surface in the summarized version. If you encounter any issues with the summary, we recommend reviewing the original transcript to check its accuracy
Error Responses
Unsupported Language
In the event that Summaries is requested for an unsupported language, the transcription process will complete. However, the summarization will not be performed, and an error message will be included in the final JSON output.
Summaries is supported for all of Speechmatics' languages except Irish, Maltese, Urdu, Bengali and Swahili.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "xx"
    },
    "summarization_config": {},
    "summarization_errors": [
      { "type": "unsupported_language", "message": "Summarization not supported for xx." }
    ],
    ...
  },
  "results": [...]
}
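To avoid submitting a summarization request that will fail this way, you can pre-check the language code before building the config. The ISO codes below are an assumed mapping of the unsupported languages listed above; verify them against the supported-languages documentation:

```python
# Assumed ISO 639-1 codes for the unsupported languages listed above:
# Irish (ga), Maltese (mt), Urdu (ur), Bengali (bn), Swahili (sw)
UNSUPPORTED_FOR_SUMMARIES = {"ga", "mt", "ur", "bn", "sw"}

language = "en"
conf = {
    "type": "transcription",
    "transcription_config": {"language": language},
}
if language not in UNSUPPORTED_FOR_SUMMARIES:
    conf["summarization_config"] = {}  # only request a summary when supported
```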
Summaries Failed
In the event that Summaries fails, the transcription process will complete but the summarization will not be returned, and an error message will be included in the final JSON output. This can happen if the transcript is too short.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "summarization_config": {},
    "summarization_errors": [
      { "type": "summarization_failed", "message": "Summarization failed." }
    ],
    ...
  },
  "results": [...]
}
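In both cases the transcript itself is still returned, so you can surface any summarization errors alongside the results. A minimal sketch (report_summary is a hypothetical helper; transcript is the parsed json-v2 output from wait_for_completion in the Quick start):

```python
def report_summary(transcript: dict) -> None:
    """Print the summary, or the reason summarization did not run."""
    for error in transcript["metadata"].get("summarization_errors", []):
        # e.g. "unsupported_language: Summarization not supported for xx."
        print(f"{error['type']}: {error['message']}")
    summary = transcript.get("summary", {}).get("content")
    if summary is not None:
        print(summary)
```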