
Using FFMPEG


One of the most powerful use cases for automatic speech recognition (ASR) is real-time transcription of microphone input. Working with audio devices directly can be difficult, though. FFMPEG is a widely used tool for processing media streams, including audio and video. This tutorial walks you through installing FFMPEG and running it as an input source for the Speechmatics CLI tool.

Install the Speechmatics CLI

Before we move on to using FFMPEG, you'll need to install the Speechmatics CLI in your local environment. The installation requires Python 3.7 or later. Run the command:

pip install speechmatics-python

Once the command completes, check that the installation succeeded by running:

speechmatics -h
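
If you are unsure whether your interpreter meets the Python 3.7 requirement, a check along these lines may help before you install. This is a minimal sketch: version_ge is a hypothetical helper (not part of the Speechmatics CLI), and it assumes that python3 is on your PATH and that your sort supports the -V version-sort option, as GNU coreutils does.

```shell
#!/bin/sh
# Sketch: confirm the interpreter meets the Python >= 3.7 requirement.
# version_ge is a hypothetical helper, not part of the Speechmatics CLI.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Ask python3 for its own version; stays empty if python3 is not on PATH.
py_version="$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:3])))' 2>/dev/null || true)"

if [ -z "$py_version" ]; then
    echo "python3 not found on PATH"
elif version_ge "$py_version" "3.7"; then
    echo "Python $py_version is new enough; run: pip install speechmatics-python"
else
    echo "Python $py_version is too old; version 3.7 or newer is required"
fi
```

Version sort is used rather than a plain string comparison because a string comparison would rank 3.10 below 3.7.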

Install and Run FFMPEG

Select your OS type from the tabs below and follow the provided guide to run FFMPEG with Speechmatics on your machine.

To follow this guide, you will need to be running a Linux distribution that uses the apt package manager, e.g. Ubuntu 22 or Debian 11. Alternatively, you can download FFMPEG from source by following the links here.

Install FFMPEG

Before you start, you'll need a working installation of FFMPEG. You can get this through the following steps.

  1. Update apt:
sudo apt update
  2. Install FFMPEG and alsa-utils:
sudo apt install ffmpeg libasound2 alsa-utils
  3. Check the installation:
ffmpeg -version
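
The rest of this guide relies on two commands being on your PATH: ffmpeg, and arecord (which is provided by alsa-utils). As a rough sketch, the check in step 3 could be extended to cover both. The helper name missing_tools is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Sketch: report which of the required commands are not yet installed.
# missing_tools is a hypothetical helper name used only in this example.

# Prints the name of each given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools ffmpeg arecord)"
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are joined on one line.
    echo "Missing tools:" $missing
else
    echo "ffmpeg and arecord are installed"
fi
```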

Run the CLI commands

Set up your local CLI environment with the following config command:

speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token

Now run the Speechmatics CLI with microphone input piped in from FFMPEG:

ffmpeg -loglevel quiet -f alsa -i hw:0 -f wav - | \
    speechmatics transcribe --max-delay 2 -

The example above uses hw:0, a default device index of zero. To list the audio capture devices available on your machine, so that you can choose a specific one, run:

arecord -l
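
Each device line that arecord -l prints contains a card number and a device number. The sketch below turns such lines into hw:CARD,DEVICE names. It assumes the usual English output format ("card N: ..., device M: ..."), which can differ under other locales. The sample line is illustrative only, and list_hw_devices is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: convert `arecord -l` style lines into hw:CARD,DEVICE names.
# list_hw_devices is a hypothetical helper; the sample line is made up.

# Reads device listings on stdin and prints one hw:CARD,DEVICE per device line.
# Header and "Subdevices" lines do not match the pattern and are skipped.
list_hw_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Illustrative input; real output depends on your hardware.
printf '%s\n' 'card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]' |
    list_hw_devices
# prints: hw:1,0

# Against the real tool (LC_ALL=C requests the untranslated output format):
#   LC_ALL=C arecord -l | list_hw_devices
```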

To use the sound device of your choice, replace hw:0 in the ffmpeg command with hw:{CARD_NO},{DEVICE_NO},{OPTIONAL:SUB_DEVICE_NO}, for example:

ffmpeg -loglevel quiet -f alsa -i hw:{CARD_NO},{DEVICE_NO},{OPTIONAL:SUB_DEVICE_NO} -f wav - | \
    speechmatics transcribe --max-delay 2 -
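
If you switch devices often, the pipeline can be wrapped in a small function. The following is a sketch under stated assumptions, not an official Speechmatics script: alsa_hw, transcribe_mic and the DRY_RUN variable are hypothetical names introduced here, while the ffmpeg and speechmatics flags are the ones used in the commands above. With DRY_RUN=1 the command is printed rather than run, so you can review it first.

```shell
#!/bin/sh
# Sketch: wrap the microphone pipeline so the device can be passed as an argument.
# alsa_hw, transcribe_mic and DRY_RUN are hypothetical names for this example.

# Builds an ALSA device name from card, device and optional subdevice numbers.
alsa_hw() {
    if [ -n "${3:-}" ]; then
        echo "hw:$1,$2,$3"
    else
        echo "hw:$1,$2"
    fi
}

# Runs the pipeline for the given device (default: hw:0).
# When DRY_RUN=1, prints the command instead of executing it.
transcribe_mic() {
    device="${1:-hw:0}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ffmpeg -loglevel quiet -f alsa -i $device -f wav - | speechmatics transcribe --max-delay 2 -"
    else
        # Capture from ALSA, write WAV to stdout, and let the CLI read it from stdin.
        ffmpeg -loglevel quiet -f alsa -i "$device" -f wav - |
            speechmatics transcribe --max-delay 2 -
    fi
}

# Preview the command for card 1, device 0 without touching the microphone.
DRY_RUN=1 transcribe_mic "$(alsa_hw 1 0)"
```

Calling transcribe_mic "$(alsa_hw 1 0)" without DRY_RUN set starts the actual transcription.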