
Pipecat quickstart

Build a local voice bot with Speechmatics STT and TTS using Pipecat.

Pipecat is a framework for building real-time voice bots using a pipeline architecture. In this quickstart, you’ll run a local WebRTC server and connect to your bot from your browser.
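The pipeline idea can be sketched outside Pipecat as a chain of processors that each transform a frame and pass it to the next. This is a toy illustration of the concept only, not Pipecat's actual API (the class and function names here are made up):

```python
# Toy sketch of the pipeline idea (illustrative only, not Pipecat's API):
# frames flow through an ordered chain of processors, each one transforming
# the frame before handing it to the next -- the same shape as
# transport input -> STT -> LLM -> TTS -> transport output.
class Uppercase:
    def process(self, frame: str) -> str:
        return frame.upper()

class Exclaim:
    def process(self, frame: str) -> str:
        return frame + "!"

def run_pipeline(processors, frame):
    """Pass a frame through each processor in order."""
    for processor in processors:
        frame = processor.process(frame)
    return frame
```

In the real bot below, Pipecat's `Pipeline` plays the role of `run_pipeline`, and the STT, LLM, and TTS services are the processors.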

Features

  • Real-time transcription — Low-latency speech-to-text as users speak
  • Natural text to speech — Give your bot a clear, natural voice
  • Local web client — Test your bot in a browser at http://localhost:7860/client
  • No infrastructure — No cloud deployment or room setup required

Prerequisites

  • Python 3.10 or later
  • uv (or another Python package manager)
  • A Speechmatics API key
  • An OpenAI API key

Setup

1. Create project

mkdir voice-agent && cd voice-agent

2. Install dependencies

Create a requirements.txt file:

requirements.txt
pipecat-ai[local-smart-turn-v3,silero,speechmatics,webrtc,openai,runner]
pipecat-ai-small-webrtc-prebuilt
python-dotenv
loguru

Install with uv:

uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

3. Configure environment

Create a .env file:

.env
SPEECHMATICS_API_KEY=your_speechmatics_key
OPENAI_API_KEY=your_openai_key
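Before running the bot, you can sanity-check that both keys are set. This is an optional sketch (the `missing_keys` helper is hypothetical, not part of Pipecat or this quickstart):

```python
import os

# Keys this quickstart expects in the environment
# (loaded from .env by load_dotenv in main.py).
REQUIRED_KEYS = ("SPEECHMATICS_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]

if __name__ == "__main__":
    missing = missing_keys()
    print("Environment OK" if not missing else f"Missing: {', '.join(missing)}")
```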

4. Create your bot

Create a main.py file:

main.py
import os

import aiohttp
from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.turns.user_stop.turn_analyzer_user_turn_stop_strategy import (
    TurnAnalyzerUserTurnStopStrategy,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies

load_dotenv(override=True)


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info("Starting bot")

    async with aiohttp.ClientSession() as session:
        # Speechmatics STT, with end-of-turn decisions delegated to the
        # external (smart-turn) analyzer configured below.
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
            params=SpeechmaticsSTTService.InputParams(
                turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.EXTERNAL,
            ),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o-mini",
        )

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
            voice_id="sarah",
            aiohttp_session=session,
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful voice assistant. Be concise and friendly.",
            },
        ]

        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
                user_turn_strategies=UserTurnStrategies(
                    stop=[
                        TurnAnalyzerUserTurnStopStrategy(
                            turn_analyzer=LocalSmartTurnAnalyzerV3()
                        )
                    ]
                ),
            ),
        )

        # Frames flow top to bottom: audio in -> STT -> LLM -> TTS -> audio out.
        pipeline = Pipeline(
            [
                transport.input(),
                stt,
                user_aggregator,
                llm,
                tts,
                transport.output(),
                assistant_aggregator,
            ]
        )

        task = PipelineTask(
            pipeline,
            params=PipelineParams(
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            logger.info("Client connected")
            # Kick off the conversation by running the LLM on the system prompt.
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
        async def on_client_disconnected(transport, client):
            logger.info("Client disconnected")
            await task.cancel()

        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
        await runner.run(task)


async def bot(runner_args: RunnerArguments):
    transport_params = {
        "webrtc": lambda: TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    }

    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()

5. Run your bot

python main.py

Open http://localhost:7860/client in your browser and allow microphone access.

The first run can take a little longer while dependencies and models load.

Next steps