Build a conversational AI web app with Next.js and Flow
In this guide, we will walk you through building a conversational AI web application with Next.js and Flow. You will learn how to set up your development environment, create a Next.js project, integrate Flow, and implement a simple conversational AI feature.
A more comprehensive example can be found here.
Prerequisites
Before getting started, ensure you have:
- Node.js 18.18 or later
- A Speechmatics API key (available from the Speechmatics user portal)
Project Setup
Start by creating a fresh Next.js app:
npx create-next-app@latest nextjs-flow-guide
It will ask you some questions about how to set up your project; for this tutorial we'll accept the default suggestions.
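The exact prompts vary between create-next-app versions, but with the defaults they look roughly like this (the choices that matter for this guide are TypeScript, Tailwind CSS and the App Router):
✔ Would you like to use TypeScript? … Yes
✔ Would you like to use ESLint? … Yes
✔ Would you like to use Tailwind CSS? … Yes
✔ Would you like your code inside a `src/` directory? … No
✔ Would you like to use App Router? (recommended) … Yes
✔ Would you like to customize the import alias (`@/*` by default)? … No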
Once the previous command finishes setting up the project, we can install our main dependencies:
cd nextjs-flow-guide
# Speechmatics Flow client for React-based apps
npm i @speechmatics/flow-client-react
# Speechmatics browser audio input, which contains the PCM audio worklet
npm i @speechmatics/browser-audio-input
# React bindings for the Speechmatics browser audio input
npm i @speechmatics/browser-audio-input-react
# Package for playing PCM audio in the browser
npm i @speechmatics/web-pcm-player-react
# Speechmatics auth package
npm i @speechmatics/auth
We'll also install some development dependencies:
# UI library
npm i daisyui -D
# Plugin used to copy the PCM audio worklet file from
# @speechmatics/browser-audio-input into the public directory
npm i copy-webpack-plugin -D
Now let's configure Next.js so that it serves the PCM audio worklet from the public directory. Edit next.config.ts so that it looks like this:
import path from "node:path";
import CopyWebpackPlugin from "copy-webpack-plugin";
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  webpack: (config, { isServer }) => {
    // Use CopyWebpackPlugin to copy the file to the public directory
    if (!isServer) {
      config.plugins.push(
        new CopyWebpackPlugin({
          patterns: [
            {
              from: path.resolve(
                __dirname,
                "node_modules/@speechmatics/browser-audio-input/dist/pcm-audio-worklet.min.js"
              ),
              to: path.resolve(__dirname, "public/js/[name][ext]"),
            },
          ],
        })
      );
    }
    return config;
  },
};

export default nextConfig;
Initial app structure
Now we are going to build a minimal app structure and configure the Flow client and browser audio input.
Replace the contents of /app/page.tsx
with:
import { fetchPersonas, FlowProvider } from "@speechmatics/flow-client-react";
import { PCMAudioRecorderProvider } from "@speechmatics/browser-audio-input-react";

export default async function Home() {
  const personas = await fetchPersonas();

  return (
    // Two context providers:
    // 1. For the audio recorder (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/browser-audio-input-react/README.md)
    // 2. For the Flow API client (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/flow-client-react/README.md)
    <PCMAudioRecorderProvider workletScriptURL="/js/pcm-audio-worklet.min.js">
      <FlowProvider
        appId="nextjs-example"
        audioBufferingMs={500} // How many milliseconds of agent audio to buffer before playing back
        websocketBinaryType="arraybuffer" // This is optional, but does lead to better audio performance, particularly on Firefox
      >
        <div className="container p-4 mx-auto max-xl:container">
          <h1 className="text-2xl font-bold">
            Speechmatics ❤️ NextJS Flow Example
          </h1>
          {/* Our custom components here will have access to flow client and browser audio functionality */}
        </div>
      </FlowProvider>
    </PCMAudioRecorderProvider>
  );
}
We now have the basic skeleton in place. All of our custom components will be wrapped by the Flow and PCM audio recorder providers, so any of them can access the functionality those providers expose.
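As a quick illustration, any client component rendered inside these providers can call the SDK hooks directly. The ConnectionStatus component below is just a hypothetical sketch to show the idea, not part of the app we are building:
"use client";
import { useFlow } from "@speechmatics/flow-client-react";

// Hypothetical example component: reads the Flow socket state provided by <FlowProvider>
export function ConnectionStatus() {
  const { socketState } = useFlow();
  return <p>Socket state: {socketState ?? "disconnected"}</p>;
}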
Flow integration
Let's complete our app by adding the missing functionality: we need to establish a connection with the Flow backend, send microphone audio to it, and play back the audio responses.
To get things organised, let's create a couple of folders in the root directory of the project:
- /components: we'll put the files implementing our custom components here.
- /hooks: our custom hook implementations will live here.
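You can create both folders from the project root, for example:
mkdir components hooks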
Custom hooks
It's time to add a custom hook to the /hooks folder we just created.
/hooks/useFlowWithBrowserAudio.ts
1"use client";
2import { useCallback, useState } from "react";
3import {
4 type AgentAudioEvent,
5 useFlow,
6 useFlowEventListener,
7} from "@speechmatics/flow-client-react";
8import { getJWT } from "../actions";
9import {
10 usePCMAudioListener,
11 usePCMAudioRecorder,
12} from "@speechmatics/browser-audio-input-react";
13import { usePCMAudioPlayer } from "@speechmatics/web-pcm-player-react";
14
15const RECORDING_SAMPLE_RATE = 16_000;
16
17// Hook to set up two way audio between the browser and Flow
18export function useFlowWithBrowserAudio() {
19 const { startConversation, endConversation, sendAudio } = useFlow();
20 const { startRecording, stopRecording } = usePCMAudioRecorder();
21 const [audioContext, setAudioContext] = useState<AudioContext>();
22
23 // Normally we would be able to use the same audio context for playback and recording,
24 // but there is a bug in Firefox which prevents capturing microphone audio at 16,000 Hz.
25 // So in Firefox, we need to use a separate audio context for playback.
26 const [playbackAudioContext, setPlaybackAudioContext] =
27 useState<AudioContext>();
28
29 const { playAudio } = usePCMAudioPlayer(playbackAudioContext);
30
31 // Send audio to Flow when we receive it from the active input device
32 usePCMAudioListener((audio: Float32Array) => {
33 sendAudio(audio.buffer);
34 });
35
36 // Play back audio when we receive it from flow
37 useFlowEventListener(
38 "agentAudio",
39 useCallback(
40 ({ data }: AgentAudioEvent) => {
41 playAudio(data);
42 },
43 [playAudio]
44 )
45 );
46
47 const startSession = useCallback(
48 async ({
49 personaId,
50 deviceId,
51 }: {
52 personaId: string;
53 deviceId: string;
54 }) => {
55 const jwt = await getJWT("flow");
56
57 const isFirefox = navigator.userAgent.includes("Firefox");
58 const audioContext = new AudioContext({
59 sampleRate: isFirefox ? undefined : RECORDING_SAMPLE_RATE,
60 });
61 setAudioContext(audioContext);
62
63 const playbackAudioContext = isFirefox
64 ? new AudioContext({ sampleRate: 16_000 })
65 : audioContext;
66 setPlaybackAudioContext(playbackAudioContext);
67
68 await startConversation(jwt, {
69 config: {
70 template_id: personaId,
71 template_variables: {
72 // We can set up any template variables here
73 },
74 },
75 audioFormat: {
76 type: "raw",
77 encoding: "pcm_f32le",
78 sample_rate: audioContext.sampleRate,
79 },
80 });
81
82 await startRecording({
83 deviceId,
84 audioContext,
85 });
86 },
87 [startConversation, startRecording]
88 );
89
90 const closeAudioContext = useCallback(() => {
91 if (audioContext?.state !== "closed") {
92 audioContext?.close();
93 }
94 setAudioContext(undefined);
95 if (playbackAudioContext?.state !== "closed") {
96 playbackAudioContext?.close();
97 }
98 setPlaybackAudioContext(undefined);
99 }, [audioContext, playbackAudioContext]);
100
101 const stopSession = useCallback(async () => {
102 endConversation();
103 stopRecording();
104 closeAudioContext();
105 }, [endConversation, stopRecording, closeAudioContext]);
106
107 return { startSession, stopSession };
108}
109
The hook defined above uses a function to retrieve a JWT, which is required to talk to the Flow API. This temporary token should be obtained on the server rather than in client-side code, because generating a JWT requires an API key, and we don't want to expose API keys in client-side code.
Next.js lets us keep this functionality on the server side. Create a file named actions.ts in the root directory of the project with the following contents (note the 'use server' directive).
More information about calling server actions from client components can be found here.
1"use server";
2
3import { createSpeechmaticsJWT } from "@speechmatics/auth";
4
5export async function getJWT(type: "flow" | "rt") {
6 const apiKey = process.env.API_KEY;
7 if (!apiKey) {
8 throw new Error("Please set the API_KEY environment variable");
9 }
10
11 return createSpeechmaticsJWT({ type, apiKey, ttl: 60 });
12}
13
As mentioned above, this code needs an API key, which we pass to our Next.js app through the API_KEY environment variable. Let's create a .env file at the root of the project with the following content (make sure this file is not committed to version control):
API_KEY='YOUR-API-KEY-GOES-HERE'
API keys can be retrieved from the Speechmatics user portal.
Custom components
We'll also add some components that will be rendered within our existing skeleton app.
/components/MicrophoneSelect.tsx
1"use client";
2import { useAudioDevices } from "@speechmatics/browser-audio-input-react";
3
4export function MicrophoneSelect({ disabled }: { disabled?: boolean }) {
5 const devices = useAudioDevices();
6
7 switch (devices.permissionState) {
8 case "prompt":
9 return (
10 <Select
11 label="Enable mic permissions"
12 onClick={devices.promptPermissions}
13 onKeyDown={devices.promptPermissions}
14 />
15 );
16 case "prompting":
17 return <Select label="Enable mic permissions" aria-busy="true" />;
18 case "granted": {
19 return (
20 <Select label="Select audio device" name="deviceId" disabled={disabled}>
21 {devices.deviceList.map((d) => (
22 <option key={d.deviceId} value={d.deviceId}>
23 {d.label}
24 </option>
25 ))}
26 </Select>
27 );
28 }
29 case "denied":
30 return <Select label="Enable mic permissions" disabled />;
31 default:
32 devices satisfies never;
33 return null;
34 }
35}
36
37interface SelectProps extends React.SelectHTMLAttributes<HTMLSelectElement> {
38 label: string;
39 children?: React.ReactNode;
40}
41
42export const Select = ({
43 label,
44 children,
45 className,
46 ...props
47}: SelectProps) => (
48 <label className="form-control w-full max-w-xs">
49 <div className="label">
50 <span className="font-semibold">{label}</span>
51 </div>
52 <select className={`select select-bordered ${className || ""}`} {...props}>
53 {children}
54 </select>
55 </label>
56);
57
/components/Controls.tsx
1"use client";
2import { type FormEventHandler, useCallback, useMemo } from "react";
3import { MicrophoneSelect, Select } from "./MicrophoneSelect";
4import { useFlow } from "@speechmatics/flow-client-react";
5import { useFlowWithBrowserAudio } from "../hooks/useFlowWithBrowserAudio";
6
7export function Controls({
8 personas,
9}: {
10 personas: Record<string, { name: string }>;
11}) {
12 const { socketState, sessionId } = useFlow();
13 const { startSession, stopSession } = useFlowWithBrowserAudio();
14 const handleSubmit = useCallback<FormEventHandler<HTMLFormElement>>(
15 (e) => {
16 e.preventDefault();
17 const formData = new FormData(e.target as HTMLFormElement);
18 const personaId = formData.get("personaId")?.toString();
19 if (!personaId) throw new Error("No persona selected!");
20 const deviceId = formData.get("deviceId")?.toString();
21 if (!deviceId) throw new Error("No device selected!");
22
23 startSession({ personaId, deviceId });
24 },
25 [startSession]
26 );
27
28 const conversationButton = useMemo(() => {
29 if (socketState === "open" && sessionId) {
30 return (
31 <button
32 type="button"
33 className="flex-1 btn btn-primary text-md"
34 onClick={stopSession}
35 >
36 End conversation
37 </button>
38 );
39 }
40 if (
41 socketState === "connecting" ||
42 socketState === "closing" ||
43 (socketState === "open" && !sessionId)
44 ) {
45 return (
46 <button
47 type="button"
48 className="flex-1 btn btn-primary text-md"
49 disabled
50 >
51 <span className="loading loading-spinner" />
52 </button>
53 );
54 }
55 return (
56 <button type="submit" className="flex-1 btn btn-primary text-md">
57 Start conversation
58 </button>
59 );
60 }, [socketState, sessionId, stopSession]);
61
62 return (
63 <form onSubmit={handleSubmit}>
64 <MicrophoneSelect />
65 <Select label="Select a persona" name="personaId">
66 {Object.entries(personas).map(([id, persona]) => (
67 <option key={id} value={id} label={persona.name} />
68 ))}
69 </Select>
70 <div>{conversationButton}</div>
71 </form>
72 );
73}
74
The Controls component renders a form that lets us choose a persona and an input device, and start a conversation.
It also uses the custom hook we created earlier to stream microphone audio to Flow and play back the agent's PCM audio.
Now we just need to include the <Controls/> component in the skeleton we created earlier in the main page.
/app/page.tsx
import { FlowProvider, fetchPersonas } from "@speechmatics/flow-client-react";
import { PCMAudioRecorderProvider } from "@speechmatics/browser-audio-input-react";
import { Controls } from "../components/Controls";

export default async function Home() {
  const personas = await fetchPersonas();

  // Filter out 'Welcome Voice' persona since it's not suitable for the example
  const filteredPersonas = Object.fromEntries(
    Object.entries(personas).filter(
      ([_, agent]) => !agent.name.toLowerCase().includes("welcome voice")
    )
  );

  return (
    // Two context providers:
    // 1. For the audio recorder (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/browser-audio-input-react/README.md)
    // 2. For the Flow API client (see https://github.com/speechmatics/speechmatics-js-sdk/blob/main/packages/flow-client-react/README.md)
    <PCMAudioRecorderProvider workletScriptURL="/js/pcm-audio-worklet.min.js">
      <FlowProvider
        appId="nextjs-example"
        audioBufferingMs={500}
        websocketBinaryType="arraybuffer" // This is optional, but does lead to better audio performance, particularly on Firefox
      >
        <div className="container p-4 mx-auto max-xl:container">
          <h1 className="text-2xl font-bold mb-4">
            Speechmatics ❤️ NextJS Flow Example
          </h1>
          <Controls personas={filteredPersonas} />
        </div>
      </FlowProvider>
    </PCMAudioRecorderProvider>
  );
}
Running the app
First, let's build the app so that the copy-webpack-plugin configuration we created earlier runs and copies the pcm-audio-worklet.min.js file into the public directory.
npm run build
Now we can run the app by starting it with:
npm run start
or by running it in dev mode:
npm run dev
Additional resources
This guide covers the minimum steps to get up and running with the Flow client library. A full example showcasing additional features of the Flow API, such as displaying the transcript of the conversation, can be found here.
Dive deeper into the tools used in this guide: