Summaries
Speechmatics enables you to generate a concise summary from your audio. With just a single API call, you can quickly transcribe and summarize content, making content review simpler and more efficient.
If you're new to Speechmatics, please see our guide on Transcribing a File. Once you are set up, include the following config to enable Summaries:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "summarization_config": {} // You can also configure the summary. See below for more detail.
}
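If you are calling the REST API directly rather than using a client library, this config is sent as the `config` part of a multipart job request. Below is a minimal sketch using Python's requests library, assuming the standard `config` and `data_file` multipart fields accepted by the batch /v2/jobs endpoint:

```python
import json

import requests

API_KEY = "YOUR_API_KEY"

config = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "summarization_config": {},
}

with open("example.wav", "rb") as audio_file:
    response = requests.post(
        "https://asr.api.speechmatics.com/v2/jobs",
        headers={"Authorization": f"Bearer {API_KEY}"},
        data={"config": json.dumps(config)},  # config is sent as a JSON string
        files={"data_file": audio_file},
    )

response.raise_for_status()
print(response.json()["id"])  # the job ID to poll or receive notifications for
```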
Quick start
A Python client example that transcribes and summarizes a file in Batch using the default parameters.
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "summarization_config": {}  # You can also configure the summary. See below for more detail.
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        summary = transcript["summary"]["content"]
        print(summary)  # print the returned summary
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e
Example Response
The summary is only present in the JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "summarization_config": {},
    ...
  },
  "results": [...],
  "summary": {
    "content": "Laura Perez called to seek assistance in completing her booking through the mobile application..."
  }
}
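The summary sits alongside the results at the top level of the transcript. Since summarization can fail even when transcription succeeds (see Error Responses below), a minimal sketch that reads it with a safe fallback rather than indexing into it directly:

```python
# transcript is the parsed json-v2 output shown above
summary = transcript.get("summary", {}).get("content")
if summary is None:
    # No summary block - inspect metadata["summarization_errors"] (see Error Responses)
    print("No summary returned")
else:
    print(summary)
```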
Configuration Options
Example with configuration parameters set:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "summarization_config": {
    "content_type": "informative",
    "summary_length": "detailed",
    "summary_type": "paragraphs"
  }
}
Use the configuration options below to adjust the format of the returned summary.

| Configuration name | Description | Default Value |
|---|---|---|
| content_type | Choose from three options: conversational - best suited for dialogues involving multiple participants, such as calls, meetings, or discussions; it focuses on summarizing the key points of the conversation. informative - recommended for more structured information delivered by one or more people, making it ideal for videos, podcasts, lectures, and presentations. auto - automatically selects the most appropriate content type based on an analysis of the transcript. | auto |
| summary_length | Determines the depth of the summary: brief - provides a succinct summary, condensing the content into just a few sentences. detailed - provides a longer, structured summary. For conversational content, it includes key topics and a summary of the entire conversation. For informative content, it logically divides the audio into sections and provides a summary for each. | brief |
| summary_type | Determines the formatting style of the summary: bullets - presents the summary as bullet points. paragraphs - presents the summary as paragraphs. | bullets |
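As a sketch of how these options fit together, the helper below validates values against the table before building the job config. build_summarization_config is a hypothetical convenience function, not part of the Speechmatics client:

```python
# Allowed values, mirroring the table above
ALLOWED = {
    "content_type": {"conversational", "informative", "auto"},
    "summary_length": {"brief", "detailed"},
    "summary_type": {"bullets", "paragraphs"},
}

def build_summarization_config(**options: str) -> dict:
    """Validate summary options and return a summarization_config dict."""
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown option: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
    return options  # omitted keys fall back to the documented defaults

conf = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "summarization_config": build_summarization_config(
        content_type="informative",
        summary_length="detailed",
        summary_type="paragraphs",
    ),
}
```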
Example Summary Outputs
Webinar - Informative
A summary of our Ursa release webinar. Watch on YouTube.
Brief:
Bullets:
- Speechmatics introduced new speech recognition models named Ursa that are optimized for GPUs and are the most accurate ASR on the planet with a 35% improvement for standard and a 22% improvement for enhanced. The new models are also three times more efficient than CPU-based models.
- The company has released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability. The new version also includes automatic language identification and translation capabilities for 34 languages to and from English, with the translation feature outperforming Google's translation service.
- The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API, including real-time transcription that can be customized with speaker labels and custom words. The speaker also answers questions about the accuracy of the system, maximum file size for transcription, and support for multiple languages in real-time streams.
Paragraphs:

Speechmatics has introduced new speech recognition software models named Ursa, optimized for GPUs, which are more powerful and efficient for processing. The new models have been benchmarked against competitors and found to be the most accurate ASR on the planet, with a 35% improvement for standard and 22% for enhanced. The company has also released a new version of its software that includes improvements to speaker diarization, batch processing, and readability, as well as automatic language identification and translation capabilities for 34 languages to and from English. The company claims to outcompete other alternatives in terms of accuracy, efficiency, and processing speed.
Detailed:
Bullets:
Speechmatics' ASR Revolution
- Speechmatics introduced new models named Ursa that focus on accuracy, speed, and efficiency.
- The new models are optimized for GPUs, which are more powerful and efficient for processing.
- Ursa has been benchmarked against competitors and found to be the most accurate ASR on the planet, with a 35% improvement for standard and a 22% improvement for enhanced.
- The models perform better with accents and noise and are three times more efficient than CPU-based models.
- The latency for real-time transcription has also been improved, with the new models showing little difference in accuracy even at small latencies.
- Beta customers have used the new models for live captioning at a major car racing event.
Improvements to Speech Recognition Software
- Speechmatics released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability.
- The software also includes automatic language identification and translation capabilities, with a single API call allowing translation to and from 34 languages.
- The company claims its transcription accuracy is the best available, and that its translation accuracy is ahead of Google's.
- The new version is available to enterprise customers and will be rolled out to other languages later this year.
Using Speechmatics' Services
- The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API.
- The user can select the language they want to translate their audio file to and choose between enhanced or standard translation speed.
- The portal also allows users to view previous jobs and real-time transcription with the option to add custom words and reduce latency.
- The speaker also addresses questions about the maximum file size for transcription, the accuracy of the system, and the support for speaker labels and multiple languages in real-time streams.
- The speaker concludes by encouraging users to try out Speechmatics' services and visit their website for more information.
Paragraphs:

Introduction
Speechmatics, a speech recognition software company, held a live product update event titled "ASR Revolution" to introduce their new models that focus on accuracy, speed, and efficiency. The company has previously optimized their models for CPUs and recently launched a beta version for laptops. However, they have now shifted their focus to GPUs, which are more powerful and efficient for processing.
New Models
The new models, named Ursa, have been benchmarked against competitors and have been found to be the most accurate ASR on the planet, with a 35% improvement for standard and 22% for enhanced. The models also perform better with accents and noise. The company has also improved efficiency, with the new models being three times more efficient than CPU processing for standard and 1.7 times for enhanced. The new models also maintain accuracy even at small latencies, making them suitable for real-time transcription. The company has already had beta customers use the new models for live captioning at a major car racing event.
New Version
Speechmatics has released a new version of its speech recognition software that includes improvements to speaker diarization, batch processing, and readability. The new version also includes automatic language identification and translation capabilities for 34 languages to and from English. The translation feature is available in batch processing and will be available in real-time processing in Q2 2023. The company claims that its transcription accuracy is the best in the industry and that its translation feature outperforms Google's translation service. The new version is available to enterprise customers and will be rolled out to other languages later this year.
How to Use
The speaker demonstrates how to use Speechmatics' transcription and translation services through their portal or API. The user can select the language they want to translate their audio file to and choose between enhanced or standard speed. The portal also allows users to view previous jobs and real-time transcription with the option to add custom words and reduce latency. The speaker also addresses questions about the accuracy of the system, the maximum size of audio files, and the support for multiple languages in real-time streams.
Conclusion
The company claims to outcompete other alternatives in terms of accuracy, efficiency, and processing speed.
Interview - Informative
A summary of a LinkedIn Live interview, "Adding Value for Contact Center Solutions Using Speech to Text". Watch on YouTube.
Brief:
Bullets:
- Prosodica uses Speechmatics' transcription engine to provide insights and predictions based on customer conversations in contact center analytics.
- The accuracy of transcription is crucial for machine learning algorithms to make accurate predictions, and having a diverse and inclusive voice model is important to avoid disparate impacts.
- The future of speech recognition in the contact center space is evolving towards a hybrid workforce, where businesses need to be remote-ready to support both customers and employees.
Paragraphs:

The use of speech-to-text technology in contact center analytics is discussed in a conversation between Ricardo Herreros-Symons and Mariano Tan. They highlight the importance of transcription accuracy for machine learning algorithms to make accurate predictions and the need for a diverse and inclusive voice model. The future of speech recognition in the contact center space is expected to involve a hybrid workforce, with both automation and improving people's lives being in demand. The use of prosodic data points can help detect sarcasm, which is a tricky problem for machine learning algorithms.
Detailed:
Bullets:
Importance of accurate transcription in contact center analytics
- Speech-to-text technology used to provide insights and predictions based on customer conversations
- Accuracy of transcription crucial for machine learning algorithms to make accurate predictions
- Small improvements in transcription accuracy can have a significant impact on machine learning accuracy
Importance of diverse and inclusive voice model
- Multinational corporations send voice traffic to call centers all over the world
- More diverse set of voices used in training, wider the net for what's typical and same level of recognition accuracy can be achieved
- Non-native English speaking voices, female voices, and younger voices are all areas where many transcription engines struggle
Disparate impact of transcription accuracy differences
- If transcription is less accurate for a particular voice, then the prediction is less accurate for that voice, creating a disparate impact that businesses do not want
Future of speech recognition in contact center space
- Evolving towards a hybrid workforce, where businesses need to be remote-ready to support both customers and employees
- Automation is a trend, but there are certain interactions that people will not be comfortable dealing with an automated bot
- Sarcasm is a tricky problem, but speech and voice analytics are used as data points to detect it.
Paragraphs:

Introduction
In this transcript, Ricardo Herreros-Symons, VP of Sales Business Development at Speechmatics, and Mariano Tan, CEO and founder of Prosodica, discuss the use of speech-to-text technology in contact center analytics. Prosodica uses Speechmatics' transcription engine to provide insights and predictions based on customer conversations.
Importance of Transcription Accuracy
The accuracy of the transcription is crucial for machine learning algorithms to make accurate predictions, and even small improvements in transcription accuracy can have a significant impact on machine learning accuracy. The importance of having a diverse and inclusive voice model has also risen in recent years, as multinational corporations send voice traffic to call centers all over the world.
Disparate Impacts of Transcription Accuracy Differences
As speech recognition becomes more mainstream, businesses are becoming increasingly concerned about the potential disparate impacts of transcription accuracy differences when analyzing less typical voices. The more diverse the set of voices used in training, the more likely we are to have a wider net for what's typical and get the same level of recognition accuracy across that.
Future of Speech and Speech Recognition in Contact Center Space
Speech analytics technology is becoming more prevalent in the contact center space. The two approaches are to use speech recognition to fully automate interactions or to measure and improve the employee experience. The technology was easier to implement in a cloud-based format, and so there has been a lot of adoption among cloud providers. However, there are certain interactions that people will not get comfortable dealing with through an automated bot.
Detection of Sarcasm
Sarcasm is a tricky problem, but Prosodica uses both speech and voice analytics as data points to detect it.
Technical Support Call - Conversational
A summary of a Technical Customer Support Call.
Brief:
Bullets:
- Laura Perez is having trouble completing bookings through the mobile app and has checked her internet connection and payment details.
- The issue is resolved by updating the Stay app on her iOS device.
- Laura agrees to receive articles on her email about the latest places and promos offered by Stay.
Paragraphs:

Laura Perez contacts Miranda for help with completing her bookings through the mobile application. Despite refreshing the page and checking her internet connection and payment method, she is unable to complete the booking. Miranda suggests updating the Stay app on her phone, which Laura does, and is then able to successfully book a place. Miranda also offers to send Laura articles on the latest places and promos that Stay offers. Laura expresses gratitude and ends the call.
Detailed:
Bullets:

Key Topics:
- Technical issue with mobile app
- Updating software
- Sending promotional articles via email
Discussion:
- Laura Perez is unable to complete bookings through the mobile app despite refreshing the page and checking her internet connection and payment method
- The app keeps going back to the main page
- Miranda suggests updating the app as it was recently updated to remove bugs
- Laura has not updated the app yet and is using iOS 12.1
- Miranda instructs Laura to update the app through the app store
- Laura updates the app and is able to book a place
- Miranda offers to send promotional articles via email to Laura
Paragraphs:

Key Topics:
- Technical issue with mobile application
- Updating software
- Sending promotional articles via email
Discussion:
Laura Perez called the customer service of Stay to report that she was unable to complete her bookings through the mobile application. She had already refreshed the application several times, checked her internet connection, and payment method, but still encountered the same issue. She also mentioned that the application goes back to the main page every time she tries to browse. The customer service representative, Miranda, asked if the application showed any error message, but Laura said it did not. Miranda then asked if Laura had updated the software to its latest version, to which Laura replied that she was not aware of any update. Miranda advised Laura to update the Stay app on her phone by going to the app store and clicking the update button beside the Stay icon. After updating the app, Laura was able to book a place successfully.
Miranda also asked Laura if it was okay to send her articles on her email to keep her updated on the latest places and promos that Stay offers. Laura agreed and thanked Miranda for her help.
Considerations
- Summaries utilizes a large language model (LLM). This may occasionally lead to inaccuracies in the summaries produced, and you may receive different summaries when running the same file multiple times
- Potential transcription errors could surface in the summarized version. If you encounter any issues with the summary, we recommend reviewing the original transcript to check its accuracy
Error Responses
Unsupported Language
In the event that Summaries is requested for an unsupported language, the transcription process will complete. However, the summarization will not be performed, and an error message will be included in the final JSON output.
Summaries is supported for all of Speechmatics' languages except Irish, Maltese, Urdu, Bengali and Swahili.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "xx"
    },
    "summarization_config": {},
    "summarization_errors": [
      { "type": "unsupported_language", "message": "Summarization not supported for xx." }
    ],
    ...
  },
  "results": [...]
}
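To avoid submitting a summarization request that will fail this way, you can pre-check the language code before building the config. The ISO codes below are an assumed mapping of the unsupported languages listed above; verify them against the supported-languages documentation:

```python
# Assumed ISO 639-1 codes for the unsupported languages listed above:
# Irish (ga), Maltese (mt), Urdu (ur), Bengali (bn), Swahili (sw)
UNSUPPORTED_FOR_SUMMARIES = {"ga", "mt", "ur", "bn", "sw"}

language = "en"
conf = {
    "type": "transcription",
    "transcription_config": {"language": language},
}
if language not in UNSUPPORTED_FOR_SUMMARIES:
    conf["summarization_config"] = {}  # only request a summary when supported
```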
Summaries Failed
In the event that Summaries fails, the transcription process will complete but the summarization will not be returned, and an error message will be included in the final JSON output. This can happen if the transcript is too short.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "summarization_config": {},
    "summarization_errors": [
      { "type": "summarization_failed", "message": "Summarization failed." }
    ],
    ...
  },
  "results": [...]
}
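In both cases the transcript itself is still returned, so you can surface any summarization errors alongside the results. A minimal sketch (report_summary is a hypothetical helper; transcript is the parsed json-v2 output from wait_for_completion in the Quick start):

```python
def report_summary(transcript: dict) -> None:
    """Print the summary, or the reason summarization did not run."""
    for error in transcript["metadata"].get("summarization_errors", []):
        # e.g. "unsupported_language: Summarization not supported for xx."
        print(f"{error['type']}: {error['message']}")
    summary = transcript.get("summary", {}).get("content")
    if summary is not None:
        print(summary)
```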