Topics

Transcription: Batch | Deployments: SaaS

Speechmatics enables you to detect topics in your audio. With a single API call, you can transcribe a file and identify its key topics along with the corresponding segments of audio, allowing you to tag conversations or identify trends and patterns.

If you're new to Speechmatics, please see our guide on Transcribing a File. Once you are set up, include the following config to enable Topics:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "topic_detection_config": {}
}

You can also configure the list of topics you wish to detect. See Setting List of Topics below for more detail.
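If you generate job configs in code, a small helper keeps the Topics settings in one place. The sketch below uses a hypothetical function, build_topics_config, which is not part of the Speechmatics SDK; it returns a plain dict that could be passed as transcription_config in the Python client quick start, or serialized with json.dumps for other clients.

```python
import json
from typing import List, Optional


def build_topics_config(language: str = "en", topics: Optional[List[str]] = None) -> dict:
    """Build a batch job config with Topics enabled (hypothetical helper, not part of the SDK)."""
    topic_detection_config: dict = {}
    if topics:
        # Only send a custom list when one is given; otherwise the default topics are used
        topic_detection_config["topics"] = list(topics)
    return {
        "type": "transcription",
        "transcription_config": {"language": language},
        "topic_detection_config": topic_detection_config,
    }


# Default topics
print(json.dumps(build_topics_config(), indent=2))
# Custom topics
print(json.dumps(build_topics_config(topics=["pricing", "deployment", "languages"]), indent=2))
```

Leaving topic_detection_config as an empty object, rather than omitting it, is what enables the feature with its default topics.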

Quick start

Python client example to detect topics in a file for Batch with the default parameters.

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"  # Transcription language

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": LANGUAGE
    },
    "topic_detection_config": {}  # You can also configure the list of topics you wish to detect. See below for more detail.
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling.
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='json-v2')
        topics_detected = transcript["topics"]
        topic_segments = topics_detected["segments"]
        topic_summary = topics_detected["summary"]["overall"]

        # Print the overall count for each topic
        print(topic_summary)

        # Print the text, the corresponding topic(s), and the timings for each segment
        for segment in topic_segments:
            print(f'({segment["start_time"]} - {segment["end_time"]}): {segment["text"]} ({[t["topic"] for t in segment["topics"]]})')
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - Check your API_KEY at the top of the code!')
        elif e.response.status_code == 400:
            print(e.response.json()['detail'])
        else:
            raise e

Example Response

The topics detected are only present in the JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "topic_detection_config": {}
    ...
  },
  "results": [...],
  "topics": {
    "segments": [
        {
          "text": "The National Park Service on Twitter says it expects the closures to remain in effect ...",
          "start_time": 0.80,
          "end_time": 1.72,
          "topics": [{"topic": "Events & Attractions"}]
        },
        {
          "text": "Lawmakers in Canada have voted to regulate online streaming content ...",
          "start_time": 2.06,
          "end_time": 3.40,
          "topics": [{"topic": "News & Politics"}, {"topic": "Technology & Computing"}]
        }
    ],
    "summary": {
      "overall": {
        "Business & Finance": 0,
        "Education": 0,
        "Entertainment": 0,
        "Events & Attractions": 1,
        "Food & Drink": 0,
        "News & Politics": 1,
        "Science": 0,
        "Sports": 0,
        "Technology & Computing": 1,
        "Travel": 0
      }
    }
  }
}
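Once you have the json-v2 transcript as a dict (for example, the value returned by wait_for_completion in the quick start), the segments and summary can be read with plain dictionary access. The sketch below uses hypothetical helper names, and its example data is a trimmed, illustrative version of the response above rather than real output.

```python
from typing import Dict, List, Tuple


def topics_by_segment(transcript: dict) -> List[Tuple[float, float, List[str]]]:
    """Return (start_time, end_time, [topic names]) for each topic segment."""
    segments = transcript.get("topics", {}).get("segments", [])
    return [
        (seg["start_time"], seg["end_time"], [t["topic"] for t in seg["topics"]])
        for seg in segments
    ]


def detected_topics(transcript: dict) -> Dict[str, int]:
    """Return only the topics with a non-zero count in the overall summary."""
    overall = transcript.get("topics", {}).get("summary", {}).get("overall", {})
    return {topic: count for topic, count in overall.items() if count > 0}


# Illustrative data shaped like the example response
example = {
    "topics": {
        "segments": [
            {"text": "The National Park Service on Twitter says ...",
             "start_time": 0.80, "end_time": 1.72,
             "topics": [{"topic": "Events & Attractions"}]},
            {"text": "Lawmakers in Canada have voted ...",
             "start_time": 2.06, "end_time": 3.40,
             "topics": [{"topic": "News & Politics"}, {"topic": "Technology & Computing"}]},
        ],
        "summary": {"overall": {"Events & Attractions": 1, "News & Politics": 1,
                                "Technology & Computing": 1, "Travel": 0}},
    }
}

for start, end, names in topics_by_segment(example):
    print(f"{start:.2f}-{end:.2f}s: {', '.join(names)}")
print(detected_topics(example))
```

Using .get with empty defaults means a transcript without a topics key (for example, when Topics failed) yields empty results instead of a KeyError.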

Setting List of Topics

If you have a specific list of topics you wish to detect, you can provide this information through the topics parameter:

{
  "type": "transcription",
  "transcription_config": {
    "language": "en"
  },
  "topic_detection_config": {
    "topics": ["pricing", "deployment", "languages"]
  }
}

If you don't have a specific list of topics, the Topics feature will attempt to detect these default topics in the audio:

  • Business & Finance
  • Education
  • Entertainment
  • Events & Attractions
  • Food & Drink
  • News & Politics
  • Science
  • Sports
  • Technology & Computing
  • Travel

Note: A maximum of 10 topics can be provided. If the topics list contains more than 10 topics, the transcription will still complete, but no topics will be returned and an error message will be included in the final JSON output.

Example Topics Outputs

News Segment

Topics detected in a BBC News segment

Default Topics

| Topic(s) | Segment | Start Time (s) | End Time (s) |
|---|---|---|---|
| News & Politics | President Biden and the South Korean President Yoon Sung Yell have reaffirmed their commitment to extended deterrence in the face of the growing nuclear threat from North Korea. The two leaders have agreed to give South Korea a greater voice in how the US would respond to any nuclear incident. | 2.57 | 21.35 |
| News & Politics | President Zelensky says he's discussed ways to reach a fair and sustainable peace. In his first direct phone call with the Chinese leader Xi Jinping, since Russia invaded Ukraine. Mr. Zelensky stressed Ukraine would not accept any loss of territory. | 21.92 | 37.76 |
| News & Politics | Renewed fighting near Khartoum is threatening the three day ceasefire in Sudan. People in the capital are struggling to get food and water. Thousands are continuing to leave. | 38.33 | 49.16 |
| News & Politics | The UN's envoy to Haiti has told a meeting of the Security Council that gang violence is expanding at an alarming rate in areas previously considered safe. | 49.73 | 60.02 |
| Entertainment, News & Politics | The Walt Disney Corporation has sued the Republican governor of Florida, Ron DeSantis, seeking to overturn state efforts to exert control over its theme park. The entertainment giant says it's facing government retaliation after it criticized a Florida law that banned lessons and sexuality and gender identity in schools. | 61.1 | 82.13 |

Custom Topics (War, Weather)

| Topic(s) | Segment | Start Time (s) | End Time (s) |
|---|---|---|---|
| War | President Biden and the South Korean President Yoon Sung Yell have reaffirmed their commitment to extended deterrence in the face of the growing nuclear threat from North Korea. | 2.57 | 13.7 |
| War | In his first direct phone call with the Chinese leader Xi Jinping, since Russia invaded Ukraine. Mr. Zelensky stressed Ukraine would not accept any loss of territory. | 26.99 | 37.76 |
| War | Renewed fighting near Khartoum is threatening the three day ceasefire in Sudan. | 38.33 | 43.25 |
| Weather | The European Union says in 2022, Southern Europe experienced the highest number of days on record with very strong heat stress. | 107.48 | 117.56 |
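Rows like those in the tables above can be produced by flattening the topics segments of a json-v2 transcript. The sketch below is a hypothetical helper, not the tooling used to build these tables; it joins multiple topics with ", " in the order they appear in the response, and rounds times to two decimal places.

```python
from typing import List


def segment_rows(transcript: dict) -> List[str]:
    """Flatten topic segments into 'Topic(s) | Segment | Start | End' rows."""
    rows = ["Topic(s) | Segment | Start Time (s) | End Time (s)"]
    for seg in transcript.get("topics", {}).get("segments", []):
        names = ", ".join(t["topic"] for t in seg["topics"])
        rows.append(f'{names} | {seg["text"]} | {seg["start_time"]:.2f} | {seg["end_time"]:.2f}')
    return rows


# Illustrative input built from one row of the custom topics table
example = {
    "topics": {
        "segments": [
            {"text": "Renewed fighting near Khartoum is threatening the three day ceasefire in Sudan.",
             "start_time": 38.33, "end_time": 43.25, "topics": [{"topic": "War"}]},
        ]
    }
}

for row in segment_rows(example):
    print(row)
```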

Considerations

  • Topics is only supported for English.
  • Topics supports a custom list of up to 10 topics.
  • Topics uses a large language model (LLM). This may occasionally lead to inaccuracies in the detected topics and their corresponding audio segments, and running the same file multiple times may return different segments.
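The first two considerations can be checked on the client before a job is submitted, so you do not end up with a completed job that has no topics. The sketch below is a hypothetical helper, not a Speechmatics API, and it assumes "en" is the language code Topics accepts; it cannot predict LLM-related variation or a topic_detection_failed error.

```python
from typing import List

MAX_TOPICS = 10  # documented limit for a custom topics list


def topics_preflight(config: dict) -> List[str]:
    """Return reasons Topics would be skipped for this job config (empty list if none)."""
    if "topic_detection_config" not in config:
        return []  # Topics not requested, nothing to check
    problems = []
    language = config.get("transcription_config", {}).get("language")
    # Assumption: "en" is the only language code that Topics accepts
    if language != "en":
        problems.append(f"Topics is only supported for English, got language={language!r}")
    topics = config["topic_detection_config"].get("topics", [])
    if len(topics) > MAX_TOPICS:
        problems.append(f"{len(topics)} topics requested; at most {MAX_TOPICS} are allowed")
    return problems


conf = {
    "type": "transcription",
    "transcription_config": {"language": "cy"},
    "topic_detection_config": {"topics": [f"topic{i}" for i in range(1, 12)]},
}
for problem in topics_preflight(conf):
    print(problem)
```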

Error Responses

Unsupported Language

Topics is currently only supported for English. If Topics is requested for an unsupported language, the transcription will still complete, but topic detection will not be performed and an error message will be included in the final JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "cy"
    },
    "topic_detection_config": {},
    "topic_detection_errors": [
      {"type": "unsupported_language", "message": "Topic Detection not supported for cy."}
    ],
    ...
  },
  "results": [...]
}

Too Many Topics

If more than 10 topics are requested, the transcription will still complete, but no topics will be returned and an error message will be included in the final JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "topic_detection_config": {
      "topics": ["topic1", "topic2", "topic3", "topic4", "topic5", "topic6", "topic7", "topic8", "topic9", "topic10", "topic11"]
    },
    "topic_detection_errors": [
      {"type": "unsupported_list_of_topics", "message": "List of topics cannot exceed 10 topics."}
    ],
    ...
  },
  "results": [...]
}

Topics Failed

If topic detection fails, the transcription will still complete, but no topics will be returned and an error message will be included in the final JSON output.

{
  "job": { ... },
  "metadata": {
    "created_at": "2023-05-26T15:01:48.412714Z",
    "type": "transcription",
    "transcription_config": {
      "language": "en"
    },
    "topic_detection_config": {},
    "topic_detection_errors": [
      {"type": "topic_detection_failed", "message": "Topic Detection failed."}
    ],
    ...
  },
  "results": [...]
}
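Because these failures are reported inside a completed transcript rather than as HTTP errors, client code should check metadata.topic_detection_errors before relying on the topics key. The sketch below uses a hypothetical helper and hint messages of our own; only the error type strings come from the examples above, and unknown types fall back to the message returned by the service.

```python
from typing import List, Optional, Tuple

# Error types shown in the examples above; the hint text is illustrative
ERROR_HINTS = {
    "unsupported_language": "Topics only supports English; check transcription_config.language.",
    "unsupported_list_of_topics": "Reduce topic_detection_config.topics to 10 or fewer.",
    "topic_detection_failed": "Topic detection failed; the transcript itself is still usable.",
}


def check_topics(transcript: dict) -> Tuple[Optional[dict], List[str]]:
    """Return (topics, hints); topics is None when no topics were returned."""
    errors = transcript.get("metadata", {}).get("topic_detection_errors", [])
    hints = [
        ERROR_HINTS.get(err.get("type"), err.get("message", "Unknown Topics error"))
        for err in errors
    ]
    return transcript.get("topics"), hints


example = {
    "metadata": {
        "topic_detection_errors": [
            {"type": "unsupported_language", "message": "Topic Detection not supported for cy."}
        ]
    },
    "results": [],
}

topics, hints = check_topics(example)
if topics is None:
    for hint in hints:
        print(hint)
```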