Build a conversational AI web app with Next.js and Flow
In this guide, we will walk you through building a conversational AI web application with Next.js and Flow. You will learn how to set up your development environment, create a Next.js project, integrate Flow, and implement a simple conversational AI feature.
A more comprehensive, complete example can be found here.
Prerequisites
Before getting started, ensure you have:
- Node.js 18.18 or later
Project Setup
Start by creating a fresh Next.js app:
npx create-next-app@latest
You'll see the following prompts; answer them as shown:
What is your project named? … nextjs-flow-guide
Would you like to use TypeScript? … Yes
Would you like to use ESLint? … Yes
Would you like to use Tailwind CSS? … Yes
Would you like your code inside a `src/` directory? … No
Would you like to use App Router? (recommended) … Yes
Would you like to use Turbopack for `next dev`? … Yes
Would you like to customize the import alias (`@/*` by default)? … No
After the prompts, create-next-app will create a folder with your project name and install the required dependencies. Change into that folder (cd nextjs-flow-guide), then install our main dependencies:
# Speechmatics Flow client for React-based apps
npm i @speechmatics/flow-client-react
# Speechmatics browser audio input, which contains an audio worklet
npm i @speechmatics/browser-audio-input
# Speechmatics browser audio input for React-based apps
npm i @speechmatics/browser-audio-input-react
# Package for playing PCM audio in the browser
npm i @speechmatics/web-pcm-player-react
# Speechmatics auth package
npm i @speechmatics/auth
We are going to install some development dependencies as well:
# UI library
npm i daisyui -D
# Webpack plugin that lets Next.js copy the PCM audio worklet file
# from @speechmatics/browser-audio-input into the public directory
npm i copy-webpack-plugin -D
Now let's configure Next.js so that it serves the PCM audio worklet from the public directory. Edit next.config.ts so that it looks like this:
import path from "node:path";
import CopyWebpackPlugin from "copy-webpack-plugin";
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  webpack: (config, { isServer }) => {
    // Use CopyWebpackPlugin to copy the file to the public directory
    if (!isServer) {
      config.plugins.push(
        new CopyWebpackPlugin({
          patterns: [
            {
              from: path.resolve(
                __dirname,
                "node_modules/@speechmatics/browser-audio-input/dist/pcm-audio-worklet.min.js"
              ),
              to: path.resolve(__dirname, "public/js/[name][ext]"),
            },
          ],
        })
      );
    }
    return config;
  },
};

export default nextConfig;
Initial app structure
Now we are going to build a minimal app structure and configure the Flow client and browser audio input.
Edit /app/page.tsx:
import { fetchPersonas, FlowProvider } from "@speechmatics/flow-client-react";
import { PCMAudioRecorderProvider } from "@speechmatics/browser-audio-input-react";

export default async function Home() {
  const personas = await fetchPersonas();

  return (
    // Two context providers:
    // 1. For the audio recorder (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/browser-audio-input-react/README.md)
    // 2. For the Flow API client (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/flow-client-react/README.md)
    <PCMAudioRecorderProvider workletScriptURL="/js/pcm-audio-worklet.min.js">
      <FlowProvider
        appId="nextjs-example"
        audioBufferingMs={500} // How many milliseconds of agent audio to buffer before playing back
        websocketBinaryType="arraybuffer" // This is optional, but does lead to better audio performance, particularly on Firefox
      >
        <div className="container p-4 mx-auto max-xl:container">
          <h1 className="text-2xl font-bold">
            Speechmatics ❤️ NextJS Flow Example
          </h1>
          {/* Our custom components here will have access to flow client and browser audio functionality */}
        </div>
      </FlowProvider>
    </PCMAudioRecorderProvider>
  );
}
We now have the basic skeleton in place. All of our custom components will be wrapped by the Flow and PCM audio recorder providers, so any of them can access the functionality these providers expose.
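For example, a small client component placed anywhere inside these providers could read the connection state through the useFlow hook. This ConnectionStatus component is purely illustrative and not part of the final app:
"use client";
import { useFlow } from "@speechmatics/flow-client-react";

// Illustrative only: any client component rendered below FlowProvider can
// use the Flow hooks; here we simply display the current websocket state.
export function ConnectionStatus() {
  const { socketState } = useFlow();
  return <p>Connection: {socketState ?? "idle"}</p>;
}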
Flow integration
Let's complete our app by adding the missing functionality: we need to establish a connection with the Flow backend, send microphone audio to it, and play back the audio responses.
To get things organised, let's create a couple of folders in the root directory of the project:
- /components: the files implementing our custom components will go here.
- /hooks: our custom hook implementations will live here.
Custom hooks
It's time to add a custom hook to the /hooks folder.
/hooks/useFlowWithBrowserAudio.ts
1"use client";
2import { useCallback, useState } from "react";
3import {
4 type AgentAudioEvent,
5 useFlow,
6 useFlowEventListener,
7} from "@speechmatics/flow-client-react";
8import { getJWT } from "./actions";
9import {
10 usePCMAudioListener,
11 usePCMAudioRecorder,
12} from "@speechmatics/browser-audio-input-react";
13import { usePCMAudioPlayer } from "@speechmatics/web-pcm-player-react";
14
15const RECORDING_SAMPLE_RATE = 16_000;
16
17// Hook to set up two way audio between the browser and Flow
18export function useFlowWithBrowserAudio() {
19 const { startConversation, endConversation, sendAudio } = useFlow();
20 const { startRecording, stopRecording } = usePCMAudioRecorder();
21 const [audioContext, setAudioContext] = useState<AudioContext>();
22
23 // Normally we would be able to use the same audio context for playback and recording,
24 // but there is a bug in Firefox which prevents capturing microphone audio at 16,000 Hz.
25 // So in Firefox, we need to use a separate audio context for playback.
26 const [playbackAudioContext, setPlaybackAudioContext] =
27 useState<AudioContext>();
28
29 const { playAudio } = usePCMAudioPlayer(playbackAudioContext);
30
31 // Send audio to Flow when we receive it from the active input device
32 usePCMAudioListener((audio: Float32Array) => {
33 sendAudio(audio.buffer);
34 });
35
36 // Play back audio when we receive it from flow
37 useFlowEventListener(
38 "agentAudio",
39 useCallback(
40 ({ data }: AgentAudioEvent) => {
41 playAudio(data);
42 },
43 [playAudio]
44 )
45 );
46
47 const startSession = useCallback(
48 async ({
49 personaId,
50 deviceId,
51 }: {
52 personaId: string;
53 deviceId: string;
54 }) => {
55 const jwt = await getJWT("flow");
56
57 const isFirefox = navigator.userAgent.includes("Firefox");
58 const audioContext = new AudioContext({
59 sampleRate: isFirefox ? undefined : RECORDING_SAMPLE_RATE,
60 });
61 setAudioContext(audioContext);
62
63 const playbackAudioContext = isFirefox
64 ? new AudioContext({ sampleRate: 16_000 })
65 : audioContext;
66 setPlaybackAudioContext(playbackAudioContext);
67
68 await startConversation(jwt, {
69 config: {
70 template_id: personaId,
71 template_variables: {
72 // We can set up any template variables here
73 },
74 },
75 audioFormat: {
76 type: "raw",
77 encoding: "pcm_f32le",
78 sample_rate: audioContext.sampleRate,
79 },
80 });
81
82 await startRecording({
83 deviceId,
84 audioContext,
85 });
86 },
87 [startConversation, startRecording]
88 );
89
90 const closeAudioContext = useCallback(() => {
91 if (audioContext?.state !== "closed") {
92 audioContext?.close();
93 }
94 setAudioContext(undefined);
95 if (playbackAudioContext?.state !== "closed") {
96 playbackAudioContext?.close();
97 }
98 setPlaybackAudioContext(undefined);
99 }, [audioContext, playbackAudioContext]);
100
101 const stopSession = useCallback(async () => {
102 endConversation();
103 stopRecording();
104 closeAudioContext();
105 }, [endConversation, stopRecording, closeAudioContext]);
106
107 return { startSession, stopSession };
108}
109
The hook defined above uses a function to retrieve a JWT, which is required to talk to the Flow API. This temporary token should be obtained from a server rather than from client-side code: retrieving a JWT requires an API key, and we don't want to expose API keys in client-side code.
We can keep this functionality on the server side with Next.js server actions. Create a file named actions.ts inside the /hooks folder (the hook above imports it from ./actions) with the following contents (note that we are adding 'use server').
More information about calling server actions from client components can be found here.
"use server";
import { createSpeechmaticsJWT } from "@speechmatics/auth";
export async function getJWT(type: "flow" | "rt") {
const apiKey = process.env.API_KEY;
if (!apiKey) {
throw new Error("Please set the API_KEY environment variable");
}
return createSpeechmaticsJWT({ type, apiKey, ttl: 60 });
}
As mentioned above, this code needs an API key, which we can pass to our Next.js app through the API_KEY environment variable. Let's create a .env file at the root of the project with the following content:
API_KEY='YOUR-API-KEY-GOES-HERE'
API keys can be retrieved from the Speechmatics user portal.
Custom components
We'll also add some components that will be rendered within our existing skeleton app.
/components/MicrophoneSelect.tsx
"use client";
import { useAudioDevices } from "@speechmatics/browser-audio-input-react";
export function MicrophoneSelect({ disabled }: { disabled?: boolean }) {
const devices = useAudioDevices();
switch (devices.permissionState) {
case "prompt":
return (
<Select
label="Enable mic permissions"
onClick={devices.promptPermissions}
onKeyDown={devices.promptPermissions}
/>
);
case "prompting":
return <Select label="Enable mic permissions" aria-busy="true" />;
case "granted": {
return (
<Select label="Select audio device" name="deviceId" disabled={disabled}>
{devices.deviceList.map((d) => (
<option key={d.deviceId} value={d.deviceId}>
{d.label}
</option>
))}
</Select>
);
}
case "denied":
return <Select label="Enable mic permissions" disabled />;
default:
devices satisfies never;
return null;
}
}
interface SelectProps extends React.SelectHTMLAttributes<HTMLSelectElement> {
label: string;
children?: React.ReactNode;
}
export const Select = ({
label,
children,
className,
...props
}: SelectProps) => (
<label className="form-control w-full max-w-xs">
<div className="label">
<span className="font-semibold">{label}</span>
</div>
<select className={`select select-bordered ${className || ""}`} {...props}>
{children}
</select>
</label>
);
/components/Controls.tsx
"use client";
import { type FormEventHandler, useCallback, useMemo } from "react";
import { MicrophoneSelect, Select } from "./MicrophoneSelect";
import { useFlow } from "@speechmatics/flow-client-react";
import { useFlowWithBrowserAudio } from "../hooks/useFlowWithBrowserAudio";
export function Controls({
personas,
}: {
personas: Record<string, { name: string }>;
}) {
const { socketState, sessionId } = useFlow();
const { startSession, stopSession } = useFlowWithBrowserAudio();
const handleSubmit = useCallback<FormEventHandler<HTMLFormElement>>(
(e) => {
e.preventDefault();
const formData = new FormData(e.target as HTMLFormElement);
const personaId = formData.get("personaId")?.toString();
if (!personaId) throw new Error("No persona selected!");
const deviceId = formData.get("deviceId")?.toString();
if (!deviceId) throw new Error("No device selected!");
startSession({ personaId, deviceId });
},
[startSession]
);
const conversationButton = useMemo(() => {
if (socketState === "open" && sessionId) {
return (
<button
type="button"
className="flex-1 btn btn-primary text-md"
onClick={stopSession}
>
End conversation
</button>
);
}
if (
socketState === "connecting" ||
socketState === "closing" ||
(socketState === "open" && !sessionId)
) {
return (
<button
type="button"
className="flex-1 btn btn-primary text-md"
disabled
>
<span className="loading loading-spinner" />
</button>
);
}
return (
<button type="submit" className="flex-1 btn btn-primary text-md">
Start conversation
</button>
);
}, [socketState, sessionId, stopSession]);
return (
<form onSubmit={handleSubmit}>
<MicrophoneSelect />
<Select label="Select a persona" name="personaId">
{Object.entries(personas).map(([id, persona]) => (
<option key={id} value={id} label={persona.name} />
))}
</Select>
<div>{conversationButton}</div>
</form>
);
}
The Controls component renders a form that lets us choose a persona and an input device and then start a conversation.
It also uses the useFlowWithBrowserAudio hook we created earlier, which captures audio from the selected input device and plays back the PCM audio responses.
We just need to include the <Controls/> component in the skeleton we created earlier in the main page.
app/page.tsx
import { FlowProvider, fetchPersonas } from "@speechmatics/flow-client-react";
import { PCMAudioRecorderProvider } from "@speechmatics/browser-audio-input-react";
import { Controls } from "../components/Controls";

export default async function Home() {
  const personas = await fetchPersonas();

  return (
    // Two context providers:
    // 1. For the audio recorder (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/browser-audio-input-react/README.md)
    // 2. For the Flow API client (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/flow-client-react/README.md)
    <PCMAudioRecorderProvider workletScriptURL="/js/pcm-audio-worklet.min.js">
      <FlowProvider
        appId="nextjs-example"
        audioBufferingMs={500}
        websocketBinaryType="arraybuffer" // This is optional, but does lead to better audio performance, particularly on Firefox
      >
        <div className="container p-4 mx-auto max-xl:container">
          <h1 className="text-2xl font-bold mb-4">
            Speechmatics ❤️ NextJS Flow Example
          </h1>
          <Controls personas={personas} />
        </div>
      </FlowProvider>
    </PCMAudioRecorderProvider>
  );
}
Running the app
First of all, let's build the app so that the copy-webpack-plugin configuration we created earlier runs and copies the pcm-audio-worklet.min.js file into the public directory (the webpack configuration is not applied when next dev runs with Turbopack):
npm run build
Now we can run the production build of the app with:
npm run start
or by running it in dev mode:
npm run dev
Additional resources
This guide covers the minimum steps to get up and running with the Flow client library. A full example showcasing additional features of the Flow API, such as displaying the transcript of the conversation, can be found here.
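If you want to experiment with transcripts yourself, a minimal sketch is to listen for the JSON messages the Flow client receives. This assumes the client also exposes a generic "message" event alongside the "agentAudio" event used above; the exact message shapes are described in the Flow API documentation, and the TranscriptLogger component below is purely illustrative:
"use client";
import { useFlowEventListener } from "@speechmatics/flow-client-react";

// Illustrative sketch: log every JSON message received from the Flow service.
// Transcript-related messages can be filtered out of this stream and rendered.
export function TranscriptLogger() {
  useFlowEventListener("message", ({ data }) => {
    console.log("Flow message:", data);
  });
  return null;
}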
Dive deeper into the tools used in this guide: