Using FFMPEG
Real-time transcription of microphone input is one of the most powerful use cases for automatic speech recognition (ASR). However, capturing audio directly from devices can be difficult, because each operating system exposes microphones through a different audio API. FFMPEG is a widely used tool for processing a wide range of media streams, including audio and video. In this tutorial, we'll walk you through installing FFMPEG and running it as an input source for the Speechmatics CLI tool.
Install the Speechmatics CLI
Before moving on to FFMPEG, you'll need to install the Speechmatics CLI in your local environment. This requires Python 3.7 or later. Run the command:
pip install speechmatics-python
Once this command completes, check that the installation succeeded by running:
speechmatics -h
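If you are unsure whether your Python meets the 3.7 requirement, you can check it first. This is a minimal sketch that assumes a POSIX shell with python3 on your PATH; the function names version_at_least and check_python are hypothetical helpers, not part of the Speechmatics CLI.

```shell
# Hypothetical helpers (not part of the Speechmatics CLI) for checking
# that python3 meets the CLI's Python >= 3.7 requirement.

# Succeeds if version string $1 (e.g. "3.10.12") is at least major $2, minor $3.
version_at_least() {
  local major rest minor
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Prints the local python3 version and fails if it is older than 3.7.
check_python() {
  local ver
  ver=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])') || return 1
  if version_at_least "$ver" 3 7; then
    echo "Python $ver is new enough for the Speechmatics CLI"
  else
    echo "Python $ver is too old; the Speechmatics CLI needs 3.7 or later" >&2
    return 1
  fi
}
```

For example, `check_python && pip install speechmatics-python` only runs the install when the version check passes.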
Install and Run FFMPEG
Follow the guide for your operating system below to install FFMPEG and run it with Speechmatics on your machine:
- Linux
- Mac
- Windows
Linux
To follow this guide, you will need a Linux distribution that uses the apt package manager, e.g. Ubuntu 22.04 or Debian 11. Alternatively, you can build FFMPEG from source by following the links on the FFMPEG download page.
Install FFMPEG
Before you start, you'll need a working installation of FFMPEG. You can install it with the following steps:
- Update the apt package index:
sudo apt update
- Install FFMPEG, the ALSA sound library and alsa-utils:
sudo apt install ffmpeg libasound2 alsa-utils
- Check the installation:
ffmpeg -version
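Optionally, you can confirm that your FFMPEG build includes the ALSA input device used below. `ffmpeg -hide_banner -devices` prints one line per device, flagged D for input (demuxing) and E for output (muxing). The helper below is a hypothetical sketch that assumes this two-column flag layout; the same check should also work with avfoundation on Mac or dshow on Windows.

```shell
# Hypothetical helper: reads `ffmpeg -devices` output on stdin and succeeds
# if it lists an input (demuxing, flag "D") device named $1.
has_input_device() {
  awk -v name="$1" '$1 ~ /^D/ && $2 == name { found = 1 } END { exit !found }'
}

# Example (requires ffmpeg):
# ffmpeg -hide_banner -devices | has_input_device alsa && echo "ALSA input available"
```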
Run the CLI Commands
Set up your local CLI environment with the following config command:
speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token
Now run the Speechmatics CLI with FFMPEG microphone input piped into it:
ffmpeg -loglevel quiet -f alsa -i hw:0 -f wav - | \
speechmatics transcribe --max-delay 2 -
The example above uses ALSA card 0, device 0 (hw:0). To list the capture devices available on your machine, run:
arecord -l
To use the sound device of your choice, replace hw:0 in the example above with hw:{CARD_NO},{DEVICE_NO},{OPTIONAL:SUB_DEVICE_NO}, using the card and device numbers reported by arecord -l, e.g.:
ffmpeg -loglevel quiet -f alsa -i hw:{CARD_NO},{DEVICE_NO},{OPTIONAL:SUB_DEVICE_NO} -f wav - | \
speechmatics transcribe --max-delay 2 -
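The device-selection steps above can be scripted. This is a minimal sketch with hypothetical helper names: list_alsa_capture_devices turns the card and device numbers in `arecord -l` output into hw:CARD,DEVICE strings (assuming arecord's usual "card N: ..., device M: ..." line format), and transcribe_mic wraps the same FFMPEG and Speechmatics pipeline with the device and the --max-delay value as parameters, defaulting to hw:0 and 2.

```shell
# Hypothetical helper: reads `arecord -l` output on stdin and prints one
# hw:CARD,DEVICE string per capture device line.
list_alsa_capture_devices() {
  sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Hypothetical wrapper around the FFMPEG | Speechmatics pipeline.
# $1 = ALSA device (default hw:0), $2 = value for --max-delay (default 2).
transcribe_mic() {
  local device max_delay
  device=${1:-hw:0}
  max_delay=${2:-2}
  ffmpeg -loglevel quiet -f alsa -i "$device" -f wav - |
    speechmatics transcribe --max-delay "$max_delay" -
}

# Example: transcribe from the second capture device that arecord lists.
# transcribe_mic "$(arecord -l | list_alsa_capture_devices | sed -n 2p)"
```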
Mac
Install FFMPEG
Assuming you have Homebrew installed, run the following commands:
- Update Homebrew and its formula definitions:
brew update
- Upgrade your outdated formulae:
brew upgrade
- Install FFMPEG from Homebrew:
brew install ffmpeg
Run the CLI Commands
Set up your local CLI environment with the following config command:
speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token
Now run the Speechmatics CLI with FFMPEG microphone input piped into it:
ffmpeg -loglevel quiet -f avfoundation -i ":default" -f wav - | \
speechmatics transcribe --max-delay 2 -
The example above uses the system default audio input device, which may not be the microphone you want. To list all available devices, run:
ffmpeg -f avfoundation -list_devices true -i ""
Then, to select a different audio device, replace :default in the example above with a colon followed by the device's index from the audio devices section of the list, e.g. :1.
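To pick the index without reading through the full listing, the sketch below (hypothetical helper name) filters the audio section of the avfoundation device list. It assumes FFMPEG prints each device as `[AVFoundation indev @ 0x...] [N] Name` under an "AVFoundation audio devices:" header, and that the listing is written to stderr, hence the 2>&1 in the example.

```shell
# Hypothetical helper: reads the output of
#   ffmpeg -f avfoundation -list_devices true -i ""
# on stdin and prints ":INDEX<TAB>NAME" for each audio device, where
# ":INDEX" is the audio-only form accepted by FFMPEG's -i option.
list_avfoundation_audio_devices() {
  awk '
    /AVFoundation video devices:/ { audio = 0; next }
    /AVFoundation audio devices:/ { audio = 1; next }
    # Match "] [N] " after the log prefix; []] and [[] are literal brackets.
    audio && match($0, /[]] [[][0-9]+[]] /) {
      s = substr($0, RSTART + 3)   # "INDEX] NAME"
      close_pos = index(s, "]")
      print ":" substr(s, 1, close_pos - 1) "\t" substr(s, close_pos + 2)
    }
  '
}

# Example (requires ffmpeg on macOS):
# ffmpeg -f avfoundation -list_devices true -i "" 2>&1 | list_avfoundation_audio_devices
```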
Windows
Install FFMPEG
Download a Windows build of FFMPEG as a zip file from one of the sources linked on the FFMPEG download page.
Once the download is complete, extract the archive into a folder. Windows can extract .zip files natively; for other archive formats such as .7z you'll need a file archiver, e.g. 7-Zip.
Rename the folder created by extracting the archive to ffmpeg and move it to a location on your computer where it can stay permanently, e.g. C:\ffmpeg.
Add FFMPEG to your PATH. Type system variables into the Windows search bar and click Edit the system environment variables. In the System Properties window, click Environment Variables. Under User variables, select Path and click Edit. Click New, add the bin folder inside your FFMPEG folder, e.g. C:\ffmpeg\bin, and click OK to close each window. Open a new Command Prompt (CMD) or PowerShell window and verify the installation with the command:
ffmpeg -version
You should see some output with the version, copyright and build information.
Run the CLI Commands
Set up your local CLI environment with the following config command:
speechmatics config set --auth-token {YOUR_API_KEY} --generate-temp-token
List the audio devices available on your Windows machine:
ffmpeg -list_devices true -f dshow -i dummy
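Now run the Speechmatics CLI with FFMPEG microphone input piped into it. From the list of devices, pick an audio device and put its name in place of <audio device> in the command below. Run the command from Command Prompt (CMD) as a single line: CMD does not treat \ as a line continuation, and older versions of PowerShell re-encode piped output as text, which can corrupt the audio stream.
ffmpeg -loglevel quiet -f dshow -i audio="<audio device>" -f wav - | speechmatics transcribe --max-delay 2 -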
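If you run FFMPEG from a POSIX shell on Windows instead, such as Git Bash (an assumption; the steps above use Command Prompt), the sketch below (hypothetical helper name) extracts audio device names from the dshow listing. It assumes newer FFMPEG output, where each device appears as `[dshow @ ...] "Name" (audio)`; older builds list devices under separate video and audio headers without the (audio) suffix, which this filter does not handle.

```shell
# Hypothetical helper: reads the output of
#   ffmpeg -list_devices true -f dshow -i dummy
# on stdin and prints the name of each device tagged "(audio)".
list_dshow_audio_devices() {
  # Drop Windows carriage returns first so "(audio)$" can match line ends.
  tr -d '\r' |
    sed -n 's/^\[dshow @ [^]]*] "\(.*\)" (audio)$/\1/p'
}

# Example (Git Bash; requires ffmpeg):
# ffmpeg -hide_banner -list_devices true -f dshow -i dummy 2>&1 | list_dshow_audio_devices
# ffmpeg -loglevel quiet -f dshow -i audio="<audio device>" -f wav - | \
#   speechmatics transcribe --max-delay 2 -
```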