Topics
Learn how to use Speechmatics' Topics.
Speechmatics enables you to detect topics from your audio. With just a single API call, you can quickly transcribe and identify key topics with the corresponding segments of audio, allowing you to tag conversations or identify trends and patterns.
If you're new to Speechmatics, please see our guide on Transcribing a File. Once you are set up, include the following config to enable Topics:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  // You can also configure the list of topics you wish to detect. See below for more detail.
  "topic_detection_config": {}
}
Example
Python client example to detect topics in a file for Batch with the default parameters.
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {"language": LANGUAGE},
    # You can also configure the list of topics you wish to detect. See below for more detail.
    "topic_detection_config": {},
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f"job {job_id} submitted successfully, waiting for transcript")

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/batch-transcription/notifications
        transcript = client.wait_for_completion(job_id, transcription_format="json-v2")
        topics_detected = transcript["topics"]
        topic_segments = topics_detected["segments"]
        topic_summary = topics_detected["summary"]["overall"]

        # print the overall count for each topic
        print(topic_summary)

        # print the text and the corresponding topic(s) and timings for each segment
        for segment in topic_segments:
            print(
                f"({segment['start_time']} - {segment['end_time']}): {segment['text']} ({[t['topic'] for t in segment['topics']]})"
            )
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print("Invalid API key - Check your API_KEY at the top of the code!")
        elif e.response.status_code == 400:
            print(e.response.json()["detail"])
        else:
            raise e
Example Response
The topics detected are only present in the JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "topic_detection_config": {},
    ...
  },
  "results": [...],
  "topics": {
    "segments": [
      {
        "text": "The National Park Service on Twitter says it expects the closures to remain in effect ...",
        "start_time": 0.80,
        "end_time": 1.72,
        "topics": [{"topic": "Events & Attractions"}]
      },
      {
        "text": "Lawmakers in Canada have voted to regulate online streaming content ...",
        "start_time": 2.06,
        "end_time": 3.40,
        "topics": [{"topic": "News & Politics"}, {"topic": "Technology & Computing"}]
      }
    ],
    "summary": {
      "overall": {
        "Business & Finance": 0,
        "Education": 0,
        "Entertainment": 0,
        "Events & Attractions": 1,
        "Food & Drink": 0,
        "News & Politics": 1,
        "Science": 0,
        "Sports": 0,
        "Technology & Computing": 1,
        "Travel": 0
      }
    }
  }
}
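As a minimal sketch of working with this output, the following counts how many segments carry each topic and collects the time ranges per topic. It assumes the response shape shown in the example above. The helper names count_topics and segments_by_topic are illustrative and not part of the Speechmatics client, and whether the service computes summary.overall in exactly this way is an assumption.

```python
from collections import Counter, defaultdict


def count_topics(segments):
    """Count how many segments were tagged with each topic."""
    counts = Counter()
    for segment in segments:
        for entry in segment.get("topics", []):
            counts[entry["topic"]] += 1
    return dict(counts)


def segments_by_topic(segments):
    """Map each topic to the (start_time, end_time) pairs of its segments."""
    grouped = defaultdict(list)
    for segment in segments:
        for entry in segment.get("topics", []):
            grouped[entry["topic"]].append(
                (segment["start_time"], segment["end_time"])
            )
    return dict(grouped)


# Sample data copied from the example response (text shortened).
example_segments = [
    {
        "text": "The National Park Service ...",
        "start_time": 0.80,
        "end_time": 1.72,
        "topics": [{"topic": "Events & Attractions"}],
    },
    {
        "text": "Lawmakers in Canada ...",
        "start_time": 2.06,
        "end_time": 3.40,
        "topics": [{"topic": "News & Politics"}, {"topic": "Technology & Computing"}],
    },
]

print(count_topics(example_segments))
print(segments_by_topic(example_segments)["News & Politics"])
```

Unlike the summary in the response, count_topics only lists topics that occur at least once; topics with a count of zero are omitted.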
Setting List of Topics
If you have a specific list of topics you wish to detect, you can provide this information through the topics parameter:
{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "topic_detection_config": {
    "topics": ["pricing", "deployment", "languages"]
  }
}
If you don't have a specific list of topics, the Topics feature will attempt to detect these default topics in the audio:
- Business & Finance
- Education
- Entertainment
- Events & Attractions
- Food & Drink
- News & Politics
- Science
- Sports
- Technology & Computing
- Travel
A maximum of 10 topics can be provided. If more than 10 topics are provided in the topics list, the transcription will still complete. However, no topics will be returned, and an error message will be included in the final JSON output.
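Because an oversized list only surfaces as an error in the final output, it can be convenient to check the limit before submitting the job. The sketch below builds the job config and rejects lists longer than 10. The function name build_topics_job_config and the MAX_TOPICS constant are hypothetical helpers for illustration, not part of the Speechmatics client.

```python
MAX_TOPICS = 10  # limit stated in this guide


def build_topics_job_config(language="en", topics=None):
    """Return a job config dict with topic detection enabled.

    `topics` is an optional list of custom topic names. When it is omitted
    or empty, an empty topic_detection_config is sent so that the default
    topics are used.
    """
    if topics is not None and len(topics) > MAX_TOPICS:
        raise ValueError(
            f"{len(topics)} topics requested, but at most {MAX_TOPICS} are allowed"
        )
    topic_config = {"topics": list(topics)} if topics else {}
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "topic_detection_config": topic_config,
    }


config = build_topics_job_config(topics=["pricing", "deployment", "languages"])
print(config["topic_detection_config"])
```

The resulting dict has the same shape as the config shown earlier and could be passed as transcription_config when submitting a job.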
Example Topics Outputs
News Segment
[Example: topics detected in a BBC News segment, shown for the default topics and for custom topics (War, Weather)]
Considerations
- Topics is only supported for English
- Topics supports a custom list of up to 10 topics
- Topics uses a large language model (LLM). This may occasionally lead to inaccuracies in the detected topics and their corresponding segments of audio, and you may receive different segments when running the same file multiple times
Error Responses
Unsupported Language
Topics is currently only supported for English. If Topics is requested for an unsupported language, the transcription process will still complete. However, topic detection will not be performed, and an error message will be included in the final JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "cy"
    },
    "topic_detection_config": {},
    "topic_detection_errors": [
      {"type": "unsupported_language", "message": "Topic Detection not supported for cy."}
    ],
    ...
  },
  "results": [...]
}
Too Many Topics
If more than 10 topics are requested, the transcription process will still complete. However, no topics will be returned, and an error message will be included in the final JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "topic_detection_config": {
      "topics": ["topic1", "topic2", "topic3", "topic4", "topic5", "topic6", "topic7", "topic8", "topic9", "topic10", "topic11"]
    },
    "topic_detection_errors": [
      {"type": "unsupported_list_of_topics", "message": "List of topics cannot exceed 10 topics."}
    ],
    ...
  },
  "results": [...]
}
Topics Failed
If Topics fails, the transcription process will still complete. However, no topics will be returned, and an error message will be included in the final JSON output.
{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "topic_detection_config": {},
    "topic_detection_errors": [
      {"type": "topic_detection_failed", "message": "Topic Detection failed."}
    ],
    ...
  },
  "results": [...]
}
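In all three cases above the job itself succeeds, so a client has to look inside the JSON output to find out whether topics were produced. The following sketch reads the topic_detection_errors list from metadata and returns it alongside the topics object, if any. The helper name extract_topics is illustrative, and treating a missing topics key as "no result" is an assumption based on the example responses in this section.

```python
def extract_topics(transcript):
    """Return (topics, errors) from a json-v2 transcript dict.

    `topics` is None when the output contains no "topics" object.
    `errors` is the list under metadata.topic_detection_errors, or an
    empty list when that key is absent.
    """
    errors = transcript.get("metadata", {}).get("topic_detection_errors", [])
    topics = transcript.get("topics")
    return topics, errors


# Trimmed-down version of the "Topics Failed" response shown above.
failed_transcript = {
    "metadata": {
        "type": "transcription",
        "transcription_config": {"language": "en"},
        "topic_detection_config": {},
        "topic_detection_errors": [
            {"type": "topic_detection_failed", "message": "Topic Detection failed."}
        ],
    },
    "results": [],
}

topics, errors = extract_topics(failed_transcript)
if topics is None:
    for error in errors:
        print(f"{error['type']}: {error['message']}")
```

Checking the error type (unsupported_language, unsupported_list_of_topics, or topic_detection_failed) lets a caller decide whether to fix the request or simply retry.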