Topics
Speechmatics enables you to detect topics from your audio. With just a single API call, you can quickly transcribe and identify key topics with the corresponding segments of audio, allowing you to tag conversations or identify trends and patterns.
If you're new to Speechmatics, please see our guide on Transcribing a File. Once you are set up, include the following config to enable Topics:
{
"type": "transcription",
"transcription_config": {
"language": "en"
},
"topic_detection_config": {} # You can also configure the list of topics you wish to detect. See below for more detail.
}
Quick start
The following Python client example detects topics in a file using Batch with the default parameters.
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en" # Transcription language

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "topic_detection_config": {} # You can also configure the list of topics you wish to detect. See below for more detail.
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        topics_detected = transcript["topics"]
        topic_segments = topics_detected["segments"]
        topic_summary = topics_detected["summary"]["overall"]

        # print the overall count for each topic
        print(topic_summary)

        # print the text and the corresponding topic(s) and timings for each segment
        for segment in topic_segments:
            print(f'({segment["start_time"]} - {segment["end_time"]}): {segment["text"]} ({[t["topic"] for t in segment["topics"]]})')
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e
Example Response
The topics detected are only present in the JSON output.
{
"job": { ... },
"metadata": {
"created_at": "2023-05-26T15:01:48.412714Z",
"type": "transcription",
"transcription_config": {
"language": "en"
},
"topic_detection_config": {}
...
},
"results": [...],
"topics": {
"segments": [
{
"text": "The National Park Service on Twitter says it expects the closures to remain in effect ...",
"start_time": 0.80,
"end_time": 1.72,
"topics": [{"topic": "Events & Attractions"}]
},
{
"text": "Lawmakers in Canada have voted to regulate online streaming content ...",
"start_time": 2.06,
"end_time": 3.40,
"topics": [{"topic": "News & Politics"}, {"topic": "Technology & Computing"}]
}
],
"summary": {
"overall": {
"Business & Finance": 0,
"Education": 0,
"Entertainment": 0,
"Events & Attractions": 1,
"Food & Drink": 0,
"News & Politics": 1,
"Science": 0,
"Sports": 0,
"Technology & Computing": 1,
"Travel": 0
}
}
}
}
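Once the transcript is loaded, it can be handy to look up segments by topic rather than by position. The sketch below is one way to do that, assuming the response has the shape shown in the example above. The helper name group_segments_by_topic and the shortened sample data are illustrative and not part of the Speechmatics client.

```python
from collections import defaultdict


def group_segments_by_topic(transcript):
    """Map each topic name to a list of (start_time, end_time, text) tuples."""
    grouped = defaultdict(list)
    # "topics" can be missing (for example when topic detection fails),
    # so fall back to an empty structure instead of raising KeyError.
    for segment in transcript.get("topics", {}).get("segments", []):
        for entry in segment["topics"]:
            grouped[entry["topic"]].append(
                (segment["start_time"], segment["end_time"], segment["text"])
            )
    return dict(grouped)


# Hypothetical sample shaped like the example response (text shortened).
sample = {
    "topics": {
        "segments": [
            {
                "text": "The National Park Service on Twitter says ...",
                "start_time": 0.80,
                "end_time": 1.72,
                "topics": [{"topic": "Events & Attractions"}],
            },
            {
                "text": "Lawmakers in Canada have voted ...",
                "start_time": 2.06,
                "end_time": 3.40,
                "topics": [
                    {"topic": "News & Politics"},
                    {"topic": "Technology & Computing"},
                ],
            },
        ]
    }
}

by_topic = group_segments_by_topic(sample)
for topic, segments in by_topic.items():
    print(topic, segments)
```

A segment tagged with several topics appears once under each of them, so the per-topic lists can overlap.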
Setting List of Topics
If you have a specific list of topics you wish to detect, you can provide it through the topics parameter:
{
"type": "transcription",
"transcription_config": {
"language": "en"
},
"topic_detection_config": {
"topics": ["pricing", "deployment", "languages"]
}
}
If you don't have a specific list of topics, the Topics feature will attempt to detect these default topics in the audio:
- Business & Finance
- Education
- Entertainment
- Events & Attractions
- Food & Drink
- News & Politics
- Science
- Sports
- Technology & Computing
- Travel
You can provide at most 10 topics. If the topics list contains more than 10 entries, the transcription still completes, but no topics are returned and an error message is included in the final JSON output.
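Because an oversized list only surfaces as an error in the finished job, it may be worth checking the limit on the client side before submitting. The following is a minimal sketch of that idea. The helper build_topics_config and the constant MAX_TOPICS are hypothetical names introduced here, and the limit of 10 is taken from the text above.

```python
MAX_TOPICS = 10  # limit stated in this guide; assumed to apply to custom lists


def build_topics_config(language="en", topics=None):
    """Return a job config dict, rejecting topic lists longer than MAX_TOPICS."""
    topic_config = {}
    if topics is not None:
        if len(topics) > MAX_TOPICS:
            raise ValueError(
                f"{len(topics)} topics given; at most {MAX_TOPICS} are allowed"
            )
        topic_config["topics"] = list(topics)
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        # An empty dict requests the default topics.
        "topic_detection_config": topic_config,
    }


conf = build_topics_config(topics=["pricing", "deployment", "languages"])
print(conf)
```

The resulting dict has the same shape as the config shown above, so it could be passed wherever that config is used.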
Example Topics Outputs
News Segment
Topics detected in a BBC News segment
Default Topics
Topic(s) | Segment | Start Time (s) | End Time (s) |
---|---|---|---|
News & Politics | President Biden and the South Korean President Yoon Sung Yell have reaffirmed their commitment to extended deterrence in the face of the growing nuclear threat from North Korea. The two leaders have agreed to give South Korea a greater voice in how the US would respond to any nuclear incident. | 2.57 | 21.35 |
News & Politics | President Zelensky says he's discussed ways to reach a fair and sustainable peace. In his first direct phone call with the Chinese leader Xi Jinping, since Russia invaded Ukraine. Mr. Zelensky stressed Ukraine would not accept any loss of territory. | 21.92 | 37.76 |
News & Politics | Renewed fighting near Khartoum is threatening the three day ceasefire in Sudan. People in the capital are struggling to get food and water. Thousands are continuing to leave. | 38.33 | 49.16 |
News & Politics | The UN's envoy to Haiti has told a meeting of the Security Council that gang violence is expanding at an alarming rate in areas previously considered safe. | 49.73 | 60.02 |
Entertainment, News & Politics | The Walt Disney Corporation has sued the Republican governor of Florida, Ron DeSantis, seeking to overturn state efforts to exert control over its theme park. The entertainment giant says it's facing government retaliation after it criticized a Florida law that banned lessons and sexuality and gender identity in schools. | 61.1 | 82.13 |
Custom Topics (War, Weather)
Topic(s) | Segment | Start Time (s) | End Time (s) |
---|---|---|---|
War | President Biden and the South Korean President Yoon Sung Yell have reaffirmed their commitment to extended deterrence in the face of the growing nuclear threat from North Korea. | 2.57 | 13.7 |
War | In his first direct phone call with the Chinese leader Xi Jinping, since Russia invaded Ukraine. Mr. Zelensky stressed Ukraine would not accept any loss of territory. | 26.99 | 37.76 |
War | Renewed fighting near Khartoum is threatening the three day ceasefire in Sudan. | 38.33 | 43.25 |
Weather | The European Union says in 2022, Southern Europe experienced the highest number of days on record with very strong heat stress. | 107.48 | 117.56 |
Considerations
- Topics is only supported for English
- Topics supports a custom list of up to 10 topics
- Topics uses a large language model (LLM), so the detected topics and their corresponding audio segments may occasionally be inaccurate, and running the same file multiple times may return different segments
Error Responses
Unsupported Language
Topics is currently only supported for English. If Topics is requested for an unsupported language, the transcription still completes, but topic detection is not performed and an error message is included in the final JSON output.
{
"job": { ... },
"metadata": {
"created_at": "2023-05-26T15:01:48.412714Z",
"type": "transcription",
"transcription_config": {
"language": "cy"
},
"topic_detection_config": {},
"topic_detection_errors": [
{"type": "unsupported_language", "message": "Topic Detection not supported for cy."}
],
...
},
"results": [...]
}
Too Many Topics
If more than 10 topics are requested, the transcription still completes, but no topics are returned and an error message is included in the final JSON output.
{
"job": { ... },
"metadata": {
"created_at": "2023-05-26T15:01:48.412714Z",
"type": "transcription",
"transcription_config": {
"language": "en"
},
"topic_detection_config": {
"topics": ["topic1", "topic2", "topic3", "topic4", "topic5", "topic6", "topic7", "topic8", "topic9", "topic10", "topic11"]
},
"topic_detection_errors": [
{"type": "unsupported_list_of_topics", "message": "List of topics cannot exceed 10 topics."}
],
...
},
"results": [...]
}
Topics Failed
If topic detection fails, the transcription still completes, but no topics are returned and an error message is included in the final JSON output.
{
"job": { ... },
"metadata": {
"created_at": "2023-05-26T15:01:48.412714Z",
"type": "transcription",
"transcription_config": {
"language": "en"
},
"topic_detection_config": {},
"topic_detection_errors": [
{"type": "topic_detection_failed", "message": "Topic Detection failed."}
],
...
},
"results": [...]
}
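In all three cases the job itself succeeds, so client code that reads the topics key directly would raise a KeyError. A small sketch of a defensive check follows, assuming the errors appear under metadata.topic_detection_errors as in the examples above. The helper name get_topic_detection_errors and the sample data are illustrative only.

```python
def get_topic_detection_errors(transcript):
    """Return the topic detection errors from a JSON transcript (empty list if none)."""
    return transcript.get("metadata", {}).get("topic_detection_errors", [])


# Hypothetical transcript shaped like the "Unsupported Language" example.
sample = {
    "metadata": {
        "transcription_config": {"language": "cy"},
        "topic_detection_config": {},
        "topic_detection_errors": [
            {
                "type": "unsupported_language",
                "message": "Topic Detection not supported for cy.",
            }
        ],
    },
    "results": [],
}

errors = get_topic_detection_errors(sample)
if errors:
    for error in errors:
        print(f'{error["type"]}: {error["message"]}')
else:
    # Only read "topics" when no error was reported.
    print(sample["topics"]["summary"]["overall"])
```

Checking the error list first lets the caller decide whether to retry, fall back to the plain transcript, or report the problem.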