Build a conversational AI web app with Next.js and Flow

In this guide, we'll walk you through building a conversational AI web application using Next.js and Flow. You'll learn how to set up your development environment, create a Next.js project, integrate Flow, and implement a simple conversational AI feature.

A more comprehensive example can be found here.

Prerequisites

Before getting started, ensure you have:

  1. A Speechmatics account and API key (available from the Speechmatics user portal).
  2. Node.js and npm installed.

Project Setup

Start by creating a fresh Next.js app:

npx create-next-app@latest

You'll see the following prompts; you can answer them as follows:

What is your project named? … nextjs-flow-guide
Would you like to use TypeScript? … Yes
Would you like to use ESLint? … Yes
Would you like to use Tailwind CSS? … Yes
Would you like your code inside a `src/` directory? … No
Would you like to use App Router? (recommended) … Yes
Would you like to use Turbopack for `next dev`? … Yes
Would you like to customize the import alias (`@/*` by default)? … No

After the prompts, create-next-app will create a folder with your project name and install the required dependencies.

Let's install our main dependencies:

# Speechmatics Flow client for React-based apps
npm i @speechmatics/flow-client-react

# Speechmatics browser audio input (provides the PCM audio worklet)
npm i @speechmatics/browser-audio-input

# Speechmatics browser audio input for React-based apps
npm i @speechmatics/browser-audio-input-react

# Package for playing PCM audio in the browser
npm i @speechmatics/web-pcm-player-react

# Speechmatics auth package
npm i @speechmatics/auth

We are going to install some development dependencies as well:

# UI component library (Tailwind CSS plugin)
npm i daisyui -D

# Plugin that lets us configure Next.js to copy the PCM audio worklet file
# from @speechmatics/browser-audio-input into the public directory
npm i copy-webpack-plugin -D

Now let's configure Next.js so that it serves the PCM audio worklet from the public directory. Edit next.config.ts so that it looks like this:

import path from "node:path";
import CopyWebpackPlugin from "copy-webpack-plugin";
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  webpack: (config, { isServer }) => {
    // Use CopyWebpackPlugin to copy the file to the public directory
    if (!isServer) {
      config.plugins.push(
        new CopyWebpackPlugin({
          patterns: [
            {
              from: path.resolve(
                __dirname,
                "node_modules/@speechmatics/browser-audio-input/dist/pcm-audio-worklet.min.js"
              ),
              to: path.resolve(__dirname, "public/js/[name][ext]"),
            },
          ],
        })
      );
    }

    return config;
  },
};

export default nextConfig;

Initial app structure

Now we are going to build a minimal app structure and configure the Flow client and browser audio input.

Edit /app/page.tsx

import { fetchPersonas, FlowProvider } from "@speechmatics/flow-client-react";
import { PCMAudioRecorderProvider } from "@speechmatics/browser-audio-input-react";

export default async function Home() {
  const personas = await fetchPersonas();

  return (
    // Two context providers:
    // 1. For the audio recorder (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/browser-audio-input-react/README.md)
    // 2. For the Flow API client (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/flow-client-react/README.md)
    <PCMAudioRecorderProvider workletScriptURL="/js/pcm-audio-worklet.min.js">
      <FlowProvider
        appId="nextjs-example"
        audioBufferingMs={500} // How many milliseconds of agent audio to buffer before playing back
        websocketBinaryType="arraybuffer" // This is optional, but does lead to better audio performance, particularly on Firefox
      >
        <div className="container p-4 mx-auto max-xl:container">
          <h1 className="text-2xl font-bold">
            Speechmatics ❤️ NextJS Flow Example
          </h1>
          {/* Our custom components here will have access to flow client and browser audio functionality */}
        </div>
      </FlowProvider>
    </PCMAudioRecorderProvider>
  );
}

We now have the basic skeleton in place. All of our custom components will be rendered inside the Flow and PCM audio recorder providers, so any component can access the functionality they provide.
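
To illustrate, any client component rendered inside these providers can call the SDK hooks directly. The following is a minimal sketch (the ConnectionStatus component is our own illustration, not part of the final app) that reads the connection state from useFlow:

"use client";
import { useFlow } from "@speechmatics/flow-client-react";

// Illustrative component: because it renders inside FlowProvider,
// it can read the Flow client's connection state straight from the hook.
export function ConnectionStatus() {
  const { socketState, sessionId } = useFlow();
  return (
    <p>
      Socket: {socketState ?? "idle"}
      {sessionId ? ` (session ${sessionId})` : ""}
    </p>
  );
}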

Flow integration

Let's complete our app by adding the missing functionality: we need to establish a connection to the Flow backend, send microphone audio to it, and play back the audio responses.

To keep things organised, let's create a couple of folders in the root directory of the project:

  1. /components: we'll put the files implementing our custom components here.
  2. /hooks: custom hooks implementations will live here.

Custom hooks

It's time to add a custom hook to the /hooks folder.

/hooks/useFlowWithBrowserAudio.ts

1"use client";
2import { useCallback, useState } from "react";
3import {
4  type AgentAudioEvent,
5  useFlow,
6  useFlowEventListener,
7} from "@speechmatics/flow-client-react";
8import { getJWT } from "./actions";
import {
  usePCMAudioListener,
  usePCMAudioRecorder,
} from "@speechmatics/browser-audio-input-react";
import { usePCMAudioPlayer } from "@speechmatics/web-pcm-player-react";

const RECORDING_SAMPLE_RATE = 16_000;

// Hook to set up two-way audio between the browser and Flow
export function useFlowWithBrowserAudio() {
  const { startConversation, endConversation, sendAudio } = useFlow();
  const { startRecording, stopRecording } = usePCMAudioRecorder();
  const [audioContext, setAudioContext] = useState<AudioContext>();

  // Normally we would be able to use the same audio context for playback and recording,
  // but there is a bug in Firefox which prevents capturing microphone audio at 16,000 Hz.
  // So in Firefox, we need to use a separate audio context for playback.
  const [playbackAudioContext, setPlaybackAudioContext] =
    useState<AudioContext>();

  const { playAudio } = usePCMAudioPlayer(playbackAudioContext);

  // Send audio to Flow when we receive it from the active input device
  usePCMAudioListener((audio: Float32Array) => {
    sendAudio(audio.buffer);
  });

  // Play back audio when we receive it from Flow
  useFlowEventListener(
    "agentAudio",
    useCallback(
      ({ data }: AgentAudioEvent) => {
        playAudio(data);
      },
      [playAudio]
    )
  );

  const startSession = useCallback(
    async ({
      personaId,
      deviceId,
    }: {
      personaId: string;
      deviceId: string;
    }) => {
      const jwt = await getJWT("flow");

      const isFirefox = navigator.userAgent.includes("Firefox");
      const audioContext = new AudioContext({
        sampleRate: isFirefox ? undefined : RECORDING_SAMPLE_RATE,
      });
      setAudioContext(audioContext);

      const playbackAudioContext = isFirefox
        ? new AudioContext({ sampleRate: 16_000 })
        : audioContext;
      setPlaybackAudioContext(playbackAudioContext);

      await startConversation(jwt, {
        config: {
          template_id: personaId,
          template_variables: {
            // We can set up any template variables here
          },
        },
        audioFormat: {
          type: "raw",
          encoding: "pcm_f32le",
          sample_rate: audioContext.sampleRate,
        },
      });

      await startRecording({
        deviceId,
        audioContext,
      });
    },
    [startConversation, startRecording]
  );

  const closeAudioContext = useCallback(() => {
    if (audioContext?.state !== "closed") {
      audioContext?.close();
    }
    setAudioContext(undefined);
    if (playbackAudioContext?.state !== "closed") {
      playbackAudioContext?.close();
    }
    setPlaybackAudioContext(undefined);
  }, [audioContext, playbackAudioContext]);

  const stopSession = useCallback(async () => {
    endConversation();
    stopRecording();
    closeAudioContext();
  }, [endConversation, stopRecording, closeAudioContext]);

  return { startSession, stopSession };
}

The hook defined above uses a function to retrieve a JWT. A JWT is required to talk to the Flow API, and this temporary token should be obtained from a server rather than from client-side code: generating JWTs requires an API key, and we don't want to expose API keys in client-side code.

We can keep this functionality on the server side with Next.js. Create a file named actions.ts in the root directory of the project with the following contents (note the 'use server' directive). More information about calling server actions from client components can be found here.

"use server";

import { createSpeechmaticsJWT } from "@speechmatics/auth";

export async function getJWT(type: "flow" | "rt") {
  const apiKey = process.env.API_KEY;
  if (!apiKey) {
    throw new Error("Please set the API_KEY environment variable");
  }

  return createSpeechmaticsJWT({ type, apiKey, ttl: 60 });
}

As mentioned above, this code needs an API key, which we pass to our Next.js app through the API_KEY environment variable. Let's create a .env file at the root of the project with the following content:

API_KEY='YOUR-API-KEY-GOES-HERE'

API keys can be retrieved from the Speechmatics user portal.

Custom components

We'll also add some components that will be rendered within our existing skeleton app.

/components/MicrophoneSelect.tsx

"use client";
import { useAudioDevices } from "@speechmatics/browser-audio-input-react";

export function MicrophoneSelect({ disabled }: { disabled?: boolean }) {
  const devices = useAudioDevices();

  switch (devices.permissionState) {
    case "prompt":
      return (
        <Select
          label="Enable mic permissions"
          onClick={devices.promptPermissions}
          onKeyDown={devices.promptPermissions}
        />
      );
    case "prompting":
      return <Select label="Enable mic permissions" aria-busy="true" />;
    case "granted": {
      return (
        <Select label="Select audio device" name="deviceId" disabled={disabled}>
          {devices.deviceList.map((d) => (
            <option key={d.deviceId} value={d.deviceId}>
              {d.label}
            </option>
          ))}
        </Select>
      );
    }
    case "denied":
      return <Select label="Enable mic permissions" disabled />;
    default:
      devices satisfies never;
      return null;
  }
}

interface SelectProps extends React.SelectHTMLAttributes<HTMLSelectElement> {
  label: string;
  children?: React.ReactNode;
}

export const Select = ({
  label,
  children,
  className,
  ...props
}: SelectProps) => (
  <label className="form-control w-full max-w-xs">
    <div className="label">
      <span className="font-semibold">{label}</span>
    </div>
    <select className={`select select-bordered ${className || ""}`} {...props}>
      {children}
    </select>
  </label>
);

/components/Controls.tsx

"use client";
import { type FormEventHandler, useCallback, useMemo } from "react";
import { MicrophoneSelect, Select } from "./MicrophoneSelect";
import { useFlow } from "@speechmatics/flow-client-react";
import { useFlowWithBrowserAudio } from "../hooks/useFlowWithBrowserAudio";

export function Controls({
  personas,
}: {
  personas: Record<string, { name: string }>;
}) {
  const { socketState, sessionId } = useFlow();
  const { startSession, stopSession } = useFlowWithBrowserAudio();
  const handleSubmit = useCallback<FormEventHandler<HTMLFormElement>>(
    (e) => {
      e.preventDefault();
      const formData = new FormData(e.target as HTMLFormElement);
      const personaId = formData.get("personaId")?.toString();
      if (!personaId) throw new Error("No persona selected!");
      const deviceId = formData.get("deviceId")?.toString();
      if (!deviceId) throw new Error("No device selected!");

      startSession({ personaId, deviceId });
    },
    [startSession]
  );

  const conversationButton = useMemo(() => {
    if (socketState === "open" && sessionId) {
      return (
        <button
          type="button"
          className="flex-1 btn btn-primary text-md"
          onClick={stopSession}
        >
          End conversation
        </button>
      );
    }
    if (
      socketState === "connecting" ||
      socketState === "closing" ||
      (socketState === "open" && !sessionId)
    ) {
      return (
        <button
          type="button"
          className="flex-1 btn btn-primary text-md"
          disabled
        >
          <span className="loading loading-spinner" />
        </button>
      );
    }
    return (
      <button type="submit" className="flex-1 btn btn-primary text-md">
        Start conversation
      </button>
    );
  }, [socketState, sessionId, stopSession]);

  return (
    <form onSubmit={handleSubmit}>
      <MicrophoneSelect />
      <Select label="Select a persona" name="personaId">
        {Object.entries(personas).map(([id, persona]) => (
          <option key={id} value={id} label={persona.name} />
        ))}
      </Select>
      <div>{conversationButton}</div>
    </form>
  );
}

The Controls component renders a form that lets us choose a persona and an input device and start a conversation. It uses the useFlowWithBrowserAudio hook we created earlier to start and stop sessions, and useFlow to track the connection state.

We just need to include the <Controls/> component in the skeleton we created earlier on the main page.

app/page.tsx

import { FlowProvider, fetchPersonas } from "@speechmatics/flow-client-react";
import { PCMAudioRecorderProvider } from "@speechmatics/browser-audio-input-react";
import { Controls } from "../components/Controls";

export default async function Home() {
  const personas = await fetchPersonas();
  return (
    // Two context providers:
    // 1. For the audio recorder (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/browser-audio-input-react/README.md)
    // 2. For the Flow API client (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/flow-client-react/README.md)
    <PCMAudioRecorderProvider workletScriptURL="/js/pcm-audio-worklet.min.js">
      <FlowProvider
        appId="nextjs-example"
        audioBufferingMs={500}
        websocketBinaryType="arraybuffer" // This is optional, but does lead to better audio performance, particularly on Firefox
      >
        <div className="container p-4 mx-auto max-xl:container">
          <h1 className="text-2xl font-bold mb-4">
            Speechmatics ❤️ NextJS Flow Example
          </h1>
          <Controls personas={personas} />
        </div>
      </FlowProvider>
    </PCMAudioRecorderProvider>
  );
}

Running the app

First, let's build the app so that the copy-webpack-plugin configuration we created earlier can run and copy pcm-audio-worklet.min.js into the public directory.

npm run build

Now we can run the app by starting it with:

npm run start

or by running it in dev mode:

npm run dev

Additional resources

This guide covers the minimum steps to get up and running with the Flow client library. A full example showcasing additional features of the Flow API, such as displaying the transcript of the conversation, can be found here.
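
As a taste of what that involves, here is an illustrative sketch of a transcript view, not a definitive implementation: it assumes the Flow client also emits a "message" event for each JSON message received from the server, and that user speech arrives in "AddTranscript" messages carrying a metadata.transcript field. Check the full example and the Flow API reference for the exact event names and message shapes.

"use client";
import { useState } from "react";
import { useFlowEventListener } from "@speechmatics/flow-client-react";

// Illustrative sketch: collect transcript lines as they arrive from the server.
// The "message" event name and the "AddTranscript" message shape are assumptions
// here; verify them against the Flow API reference before relying on them.
export function TranscriptView() {
  const [lines, setLines] = useState<string[]>([]);

  useFlowEventListener("message", ({ data }) => {
    if (data.message === "AddTranscript") {
      setLines((prev) => [...prev, data.metadata.transcript]);
    }
  });

  return (
    <ul>
      {lines.map((line, i) => (
        // Index-based keys are acceptable for this append-only sketch
        <li key={`${i}-${line}`}>{line}</li>
      ))}
    </ul>
  );
}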

Dive deeper into the tools used in this guide:

Speechmatics JS SDK