
Pipecat quickstart

Build a local voice bot with Speechmatics STT and TTS using Pipecat.

Pipecat is a framework for building real-time voice bots using a pipeline architecture. In this quickstart, you’ll run a local WebRTC server and connect to your bot from your browser.
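The pipeline idea can be sketched outside Pipecat as a chain of processors that each transform a frame and pass it to the next. This is a toy illustration of the concept only, not Pipecat's actual API (the class and function names here are made up):

```python
# Toy sketch of the pipeline idea (illustrative only, not Pipecat's API):
# frames flow through an ordered chain of processors, each one transforming
# the frame before handing it to the next -- the same shape as
# transport input -> STT -> LLM -> TTS -> transport output.
class Uppercase:
    def process(self, frame: str) -> str:
        return frame.upper()

class Exclaim:
    def process(self, frame: str) -> str:
        return frame + "!"

def run_pipeline(processors, frame):
    """Pass a frame through each processor in order."""
    for processor in processors:
        frame = processor.process(frame)
    return frame
```

In the real bot below, Pipecat's `Pipeline` plays the role of `run_pipeline`, and the STT, LLM, and TTS services are the processors.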

Features

  • Real-time transcription — Low-latency speech-to-text as users speak
  • Natural text to speech — Give your bot a clear, natural voice
  • Local web client — Test your bot in a browser at http://localhost:7860/client
  • No infrastructure — No cloud deployment or room setup required

Prerequisites

  • Python 3.10 or later
  • uv (or another Python package manager)
  • A Speechmatics API key
  • An OpenAI API key

Setup

1. Create project

mkdir voice-agent && cd voice-agent

2. Install dependencies

Create a requirements.txt file:

requirements.txt
pipecat-ai[local-smart-turn-v3,silero,speechmatics,webrtc,openai,runner]
pipecat-ai-small-webrtc-prebuilt
python-dotenv
loguru

Install with uv:

uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

3. Configure environment

Create a .env file:

.env
SPEECHMATICS_API_KEY=your_speechmatics_key
OPENAI_API_KEY=your_openai_key
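Before running the bot, you can sanity-check that both keys are set. This is an optional sketch (the `missing_keys` helper is hypothetical, not part of Pipecat or this quickstart):

```python
import os

# Keys this quickstart expects in the environment
# (loaded from .env by load_dotenv in main.py).
REQUIRED_KEYS = ("SPEECHMATICS_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]

if __name__ == "__main__":
    missing = missing_keys()
    print("Environment OK" if not missing else f"Missing: {', '.join(missing)}")
```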

4. Create your bot

Create a main.py file:

main.py
import os

import aiohttp
from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.turns.user_stop.turn_analyzer_user_turn_stop_strategy import (
    TurnAnalyzerUserTurnStopStrategy,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies

load_dotenv(override=True)


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info("Starting bot")

    async with aiohttp.ClientSession() as session:
        # Speechmatics STT, with end-of-turn decisions delegated to the
        # external (smart-turn) analyzer configured below.
        stt = SpeechmaticsSTTService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
            params=SpeechmaticsSTTService.InputParams(
                turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.EXTERNAL,
            ),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o-mini",
        )

        tts = SpeechmaticsTTSService(
            api_key=os.getenv("SPEECHMATICS_API_KEY"),
            voice_id="sarah",
            aiohttp_session=session,
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful voice assistant. Be concise and friendly.",
            },
        ]

        context = LLMContext(messages)
        user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
            context,
            user_params=LLMUserAggregatorParams(
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
                user_turn_strategies=UserTurnStrategies(
                    stop=[
                        TurnAnalyzerUserTurnStopStrategy(
                            turn_analyzer=LocalSmartTurnAnalyzerV3()
                        )
                    ]
                ),
            ),
        )

        # Frames flow top to bottom: audio in -> STT -> LLM -> TTS -> audio out.
        pipeline = Pipeline(
            [
                transport.input(),
                stt,
                user_aggregator,
                llm,
                tts,
                transport.output(),
                assistant_aggregator,
            ]
        )

        task = PipelineTask(
            pipeline,
            params=PipelineParams(
                enable_metrics=True,
                enable_usage_metrics=True,
            ),
        )

        @transport.event_handler("on_client_connected")
        async def on_client_connected(transport, client):
            logger.info("Client connected")
            # Kick off the conversation by running the LLM on the system prompt.
            await task.queue_frames([LLMRunFrame()])

        @transport.event_handler("on_client_disconnected")
        async def on_client_disconnected(transport, client):
            logger.info("Client disconnected")
            await task.cancel()

        runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
        await runner.run(task)


async def bot(runner_args: RunnerArguments):
    transport_params = {
        "webrtc": lambda: TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    }

    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()

5. Run your bot

python main.py

Open http://localhost:7860/client in your browser and allow microphone access.

The first run can take a little longer while dependencies and models load.

Next steps