
Python - Using FFMPEG

Use ffmpeg to pipe microphone input into the Speechmatics Real-Time API

One of the most powerful use cases for ASR is real-time transcription of microphone input. Working with audio devices directly, however, can be difficult. FFMPEG is a widely used, open-source tool for processing many kinds of media streams, including audio and video. In this tutorial, we'll walk you through installing FFMPEG and using it as the audio input source for the Speechmatics CLI tool.

Install the Speechmatics CLI

Before moving on to FFMPEG, you'll need to install the Speechmatics CLI in your local environment. This requires Python 3.7 or later. Run the command:

pip install speechmatics-python

Once this command completes, check that the installation succeeded by running:

speechmatics -h
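Optionally, you can confirm the prerequisites from Python itself. This is a minimal sketch with a hypothetical function name: it checks the interpreter version against the 3.7 minimum and whether the speechmatics and ffmpeg executables can be found on your PATH. ffmpeg will only show up once you have completed the next section.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Report whether this interpreter is new enough and whether the
    `speechmatics` and `ffmpeg` executables are on PATH."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "speechmatics_on_path": shutil.which("speechmatics") is not None,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    # Print one line per check, e.g. "python_ok: yes".
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```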

Install and Run FFMPEG

The instructions below are grouped by operating system (Linux, macOS and Windows). Follow the section for your OS to run FFMPEG with Speechmatics on your machine.

Linux

To follow this guide, you will need a Linux distribution that uses the apt package manager, such as Ubuntu 22 or Debian 11. Alternatively, you can download the FFMPEG source code from the FFMPEG website and build it yourself.

Install FFMPEG

Before you start, you'll need a working installation of FFMPEG. Install it with the following steps.

Update the apt package lists:

sudo apt update

Install FFMPEG, the ALSA library (libasound2) and alsa-utils:

sudo apt install ffmpeg libasound2 alsa-utils

Check the installation:

ffmpeg -version
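If you want a script to confirm that FFMPEG is available, one approach is to run ffmpeg -version and read the first line of its output. This minimal sketch assumes that line starts with `ffmpeg version <version>`, which is the usual format; the helper names are hypothetical.

```python
import re
import subprocess


def parse_ffmpeg_version(output):
    """Return the version token from the first line of `ffmpeg -version`
    output, e.g. "ffmpeg version 4.4.2 Copyright ..." -> "4.4.2", else None."""
    first_line = output.splitlines()[0] if output else ""
    match = re.match(r"ffmpeg version (\S+)", first_line)
    return match.group(1) if match else None


def installed_ffmpeg_version():
    """Run `ffmpeg -version` and parse it (raises if ffmpeg is missing)."""
    result = subprocess.run(["ffmpeg", "-version"],
                            capture_output=True, text=True, check=True)
    return parse_ffmpeg_version(result.stdout)
```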

Run the CLI Commands

Set up your local CLI environment with the following config command:

speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token
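If you script your setup, you can pass the API key without typing it into the shell, which keeps it out of your shell history. This minimal sketch assumes the key is stored in an environment variable named SPEECHMATICS_API_KEY; that name and both function names are hypothetical choices for this example. It runs the same config command as above, without a shell.

```python
import os
import subprocess


def config_command(api_key):
    """argv for the `speechmatics config set` command shown above."""
    return ["speechmatics", "config", "set",
            "--auth-token", api_key, "--generate-temp-token"]


def configure_from_env(var_name="SPEECHMATICS_API_KEY"):
    """Read the key from an environment variable (hypothetical name) and
    run the config command directly, not through a shell."""
    api_key = os.environ[var_name]  # raises KeyError if the variable is unset
    subprocess.run(config_command(api_key), check=True)
```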

Now run the Speechmatics CLI with FFMPEG microphone input piped in:

ffmpeg -loglevel quiet -f alsa -i hw:0 -f wav - | \
    speechmatics transcribe --max-delay 2 -
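Since this guide is about Python, here is a minimal sketch of the same shell pipe built with Python's subprocess module, following the standard "replace a shell pipeline" pattern: ffmpeg's stdout is connected to the stdin of speechmatics. The function names are hypothetical, and the command-line arguments are exactly those of the command above.

```python
import subprocess


def ffmpeg_alsa_command(device="hw:0"):
    """ffmpeg argv: capture from an ALSA device and write WAV to stdout ("-")."""
    return ["ffmpeg", "-loglevel", "quiet", "-f", "alsa", "-i", device,
            "-f", "wav", "-"]


def transcribe_command(max_delay=2):
    """speechmatics argv: transcribe audio read from stdin ("-")."""
    return ["speechmatics", "transcribe", "--max-delay", str(max_delay), "-"]


def run_pipeline(device="hw:0", max_delay=2):
    """Equivalent of `ffmpeg ... - | speechmatics transcribe ... -`.

    Returns the speechmatics exit code. The transcript appears on this
    process's stdout because speechmatics inherits it.
    """
    ffmpeg = subprocess.Popen(ffmpeg_alsa_command(device), stdout=subprocess.PIPE)
    speechmatics = subprocess.Popen(transcribe_command(max_delay),
                                    stdin=ffmpeg.stdout)
    # Drop our copy of the pipe so ffmpeg sees a broken pipe if speechmatics exits.
    ffmpeg.stdout.close()
    try:
        return speechmatics.wait()
    finally:
        # Stop capturing once transcription ends (or on Ctrl+C).
        ffmpeg.terminate()
        ffmpeg.wait()
```

Calling run_pipeline() from a script starts the capture; pass a different device string such as "hw:1,0" to record from another card.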

The ffmpeg command above captures from ALSA card 0 (hw:0). To use a different audio device, list the available capture devices with:

arecord -l

Then, in the ffmpeg command above, replace hw:0 with hw:{CARD_NO},{DEVICE_NO},{OPTIONAL:SUB_DEVICE_NO}, using the card and device numbers reported by arecord -l, e.g.:

ffmpeg -loglevel quiet -f alsa -i hw:{CARD_NO},{DEVICE_NO},{OPTIONAL:SUB_DEVICE_NO} -f wav - | \
    speechmatics transcribe --max-delay 2 -
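If you'd rather not read the arecord -l listing by hand, the following minimal sketch turns it into hw:{CARD_NO},{DEVICE_NO} strings for ffmpeg. It assumes the usual `card N: ..., device M: ...` line format printed by alsa-utils; the function names are hypothetical.

```python
import re
import subprocess

# Matches `arecord -l` lines such as:
# card 1: Microphone [USB Microphone], device 0: USB Audio [USB Audio]
_CARD_LINE = re.compile(r"^card (\d+): (.*?), device (\d+): (.*)$")


def alsa_devices(arecord_output):
    """Return (ffmpeg_input, description) pairs parsed from `arecord -l` output."""
    devices = []
    for line in arecord_output.splitlines():
        match = _CARD_LINE.match(line.strip())
        if match:
            card, card_desc, device, device_desc = match.groups()
            devices.append((f"hw:{card},{device}", f"{card_desc} / {device_desc}"))
    return devices


def list_alsa_devices():
    """Run `arecord -l` (from alsa-utils) and parse its listing."""
    result = subprocess.run(["arecord", "-l"], capture_output=True, text=True,
                            check=True)
    return alsa_devices(result.stdout)
```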

macOS

Install FFMPEG

These steps assume you have Homebrew installed.

Update Homebrew and your list of formulae:

brew update

Upgrade your outdated formulae:

brew upgrade

Install FFMPEG from Homebrew:

brew install ffmpeg

Run the CLI Commands

Set up your local CLI environment with the following config command:

speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token

Now run the Speechmatics CLI with FFMPEG microphone input piped in:

ffmpeg -loglevel quiet -f avfoundation -i ":default" -f wav - | \
    speechmatics transcribe --max-delay 2 -

The above example uses the default audio device, which may not always work. To list all available devices, run:

ffmpeg -f avfoundation -list_devices true -i ""

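To select a different device, replace default in the example above with the index of the audio device you want, keeping the leading colon, e.g. -i ":1". Without the colon, ffmpeg treats the number as a video device.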
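The device listing can also be read programmatically. This minimal sketch assumes the layout printed by recent FFMPEG builds (an "AVFoundation video devices:" header, then "[index] name" lines, then an "AVFoundation audio devices:" header), which may vary between versions. ffmpeg writes this listing to stderr and exits with an error, so the sketch does not check the exit code. The function names are hypothetical.

```python
import re
import subprocess

# An indexed entry such as "[1] External Microphone" at the end of a log line.
_INDEXED = re.compile(r"\[(\d+)\] (.+)$")


def avfoundation_audio_devices(listing):
    """Return (index, name) pairs from the audio section of the listing."""
    devices = []
    in_audio = False
    for line in listing.splitlines():
        if "AVFoundation video devices" in line:
            in_audio = False
        elif "AVFoundation audio devices" in line:
            in_audio = True
        elif in_audio:
            match = _INDEXED.search(line)
            if match:
                devices.append((int(match.group(1)), match.group(2).strip()))
    return devices


def avfoundation_audio_input(index):
    """ffmpeg -i value for audio-only capture: nothing before the colon."""
    return f":{index}"


def list_avfoundation_audio_devices():
    """Run the listing command; ffmpeg prints it to stderr and exits non-zero."""
    result = subprocess.run(
        ["ffmpeg", "-f", "avfoundation", "-list_devices", "true", "-i", ""],
        capture_output=True, text=True,
    )
    return avfoundation_audio_devices(result.stderr)
```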

Windows

Install FFMPEG

Download a Windows build of FFMPEG as a zip file. The FFMPEG website links to several sources of Windows builds.
Extract the zip file into a folder, either with Windows' built-in extraction or with a file archiver such as 7-Zip.
Rename the extracted folder to ffmpeg and move it to a permanent location on your computer, e.g. C:\ffmpeg.
Add FFMPEG to your PATH. Type environment variables into the Windows search bar and click Edit the system environment variables, then click Environment Variables. Under User variables, select Path and click Edit. Click New, add the bin folder inside your FFMPEG folder (e.g. C:\ffmpeg\bin), and click OK to close each dialog.
Open a new Command Prompt (CMD) or PowerShell window, so that it picks up the updated PATH, and verify the installation with the command:

ffmpeg -version

You should see some output with the version, copyright and build information.

Run the CLI Commands

Set up your local CLI environment with the following config command:

speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token

List the audio devices available on your Windows machine:

ffmpeg -list_devices true -f dshow -i dummy

Now run the Speechmatics CLI with FFMPEG microphone input piped in. Pick a device from the list and enter its name in place of <audio device> in the command below:

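ffmpeg -loglevel quiet -f dshow -i audio="<audio device>" -f wav - | speechmatics transcribe --max-delay 2 -

Enter this as a single line: the \ line continuation used in the Linux and macOS examples is a Unix shell feature and does not work in CMD. We recommend running it in CMD, as older versions of Windows PowerShell can alter binary data piped between programs and corrupt the audio stream.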
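To pick the device name programmatically, the following minimal sketch parses the dshow listing. We assume two possible layouts: older builds print "DirectShow video devices" and "DirectShow audio devices" section headers followed by quoted names, while newer builds tag each quoted name with (audio) or (video). Treat both as assumptions and check them against your own output. The function names are hypothetical. When the name is passed in a Python argument list, it needs no surrounding quotes.

```python
import re
import subprocess

# A quoted device name after the "[dshow @ ...]" prefix, optionally tagged "(audio)".
_QUOTED = re.compile(r'\]\s+"([^"]+)"(?:\s+\((\w+)\))?\s*$')


def dshow_audio_devices(listing):
    """Return audio device names from the dshow device listing."""
    names = []
    section = None
    for line in listing.splitlines():
        if "DirectShow video devices" in line:
            section = "video"
        elif "DirectShow audio devices" in line:
            section = "audio"
        elif "Alternative name" in line:
            continue  # skip the "@device_..." aliases
        else:
            match = _QUOTED.search(line)
            if match:
                name, kind = match.groups()
                # Tagged lines carry their own kind; untagged ones use the section.
                if kind == "audio" or (kind is None and section == "audio"):
                    names.append(name)
    return names


def dshow_ffmpeg_input(name):
    """ffmpeg -i value for a DirectShow audio device (no shell quoting needed)."""
    return f"audio={name}"


def list_dshow_audio_devices():
    """Run the listing command; ffmpeg prints it to stderr and exits non-zero."""
    result = subprocess.run(
        ["ffmpeg", "-list_devices", "true", "-f", "dshow", "-i", "dummy"],
        capture_output=True, text=True,
    )
    return dshow_audio_devices(result.stderr)
```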
