Voice Agents — Flow

Supported Formats and Limits

Learn about the audio formats the Flow API accepts and returns, its usage limits, and its data retention policy.

Input audio

All input audio (i.e. the user's voice) sent to the Flow API must be raw PCM audio in one of the following formats:

PCM F32 LE raw audio stream (32-bit float)
PCM S16 LE raw audio stream (16-bit signed int)

Other audio encodings are not supported. The input sample rate is not restricted, but we recommend using 16 kHz.
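
As a minimal sketch of what the two accepted encodings look like in bytes, the Python snippet below packs float samples in the range [-1.0, 1.0] into each format using only the standard library. The function names are illustrative and not part of the Flow API, and scaling by 32767 is one common convention for float-to-integer conversion rather than a documented requirement.

```python
import struct


def floats_to_s16le(samples):
    """Pack floats in [-1.0, 1.0] as PCM S16 LE (16-bit signed, little-endian)."""
    # Clip first so out-of-range samples saturate instead of overflowing.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def floats_to_f32le(samples):
    """Pack floats as PCM F32 LE (32-bit float, little-endian)."""
    return struct.pack("<%df" % len(samples), *samples)
```

Each S16 sample occupies 2 bytes and each F32 sample 4 bytes, so the same audio is twice as large in the float format.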

Output audio

The Flow API always returns raw audio in PCM S16 LE format (16-bit signed int), regardless of the input format.

The output audio sample rate is always 16 kHz.
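
Because the output is headerless PCM, most audio players cannot open it directly. The sketch below wraps received bytes in a WAV container with Python's standard `wave` module and computes their duration. The sample rate and sample width come from this section; the channel count (mono) is an assumption, and the function names are illustrative.

```python
import io
import wave

OUTPUT_SAMPLE_RATE = 16000  # 16 kHz, as stated above
SAMPLE_WIDTH_BYTES = 2      # PCM S16 LE: 2 bytes per sample
CHANNELS = 1                # assumption: mono output


def pcm_s16le_to_wav(pcm_bytes, channels=CHANNELS):
    """Return the PCM bytes wrapped in an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(OUTPUT_SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def duration_seconds(pcm_bytes, channels=CHANNELS):
    """Length of the audio in seconds, derived from the byte count."""
    return len(pcm_bytes) / (SAMPLE_WIDTH_BYTES * channels * OUTPUT_SAMPLE_RATE)
```

Under the mono assumption, one second of output audio is 32,000 bytes.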

Usage Limits

To help manage load on our servers, the Flow API limits how many hours of audio each account can process per month and how many sessions it can run concurrently. The current limits by account type are listed in the table below:

Tier	Max. hours per month	Concurrent sessions
Free Tier	50	1
Paid Tier	1000	1
Enterprise	Custom	Custom

Please reach out to Support if you need to increase the above limits.
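
For client-side budgeting, a rough sketch like the one below can estimate how much of the monthly allowance remains. The hour limits are taken from the table above; how the Flow API actually meters usage is not specified here, so treating usage as the sum of session durations is an assumption, as are the names used.

```python
# Monthly limits in hours, from the table above (Enterprise is custom).
MONTHLY_HOUR_LIMITS = {"free": 50, "paid": 1000}


def remaining_hours(tier, session_seconds):
    """Estimate hours left this month, given session lengths in seconds.

    Assumption: usage is the sum of session durations.
    """
    used_hours = sum(session_seconds) / 3600
    return max(0.0, MONTHLY_HOUR_LIMITS[tier] - used_hours)
```

For example, a Free Tier account with one 60-minute and one 30-minute session would have an estimated 48.5 hours left.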

Guidance for users

Clients can disconnect a session before it is automatically terminated and immediately connect a new one. New sessions typically start in less than a second. If a seamless transition is required, connect the new session a few seconds before disconnecting the old one.
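
The overlapping handover described above might be sketched as follows. This is not Flow client code: `connect` stands in for whatever coroutine your client uses to open a session, and the session object is assumed to expose an async `close()` method. Both are hypothetical.

```python
import asyncio


async def hand_over(old_session, connect, overlap_seconds=3.0):
    """Open a new session, keep both alive briefly, then close the old one.

    `connect` is a hypothetical coroutine function returning a new session;
    `old_session.close()` is assumed to be an async method.
    """
    new_session = await connect()         # start the replacement first
    await asyncio.sleep(overlap_seconds)  # let the two sessions overlap
    await old_session.close()             # then retire the old session
    return new_session
```

During the overlap two sessions are open at once, so this approach only works if the account's concurrent session limit allows it.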

Since unpredictable network issues can cause WebSocket connections to be dropped, we recommend graceful handling of session termination for long-running sessions.
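
One common way to handle dropped connections gracefully is to retry with exponential backoff. The sketch below is a generic pattern rather than anything prescribed by the Flow API: `connect` is a hypothetical coroutine function that opens a session and raises `OSError` (which includes `ConnectionError`) on failure, and the retry parameters are illustrative defaults.

```python
import asyncio
import random


async def connect_with_backoff(connect, max_attempts=5,
                               base_delay=0.5, max_delay=8.0):
    """Call `connect` until it succeeds, backing off between attempts.

    `connect` is a hypothetical coroutine function; the last failure is
    re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return await connect()
        except OSError:
            if attempt == max_attempts - 1:
                raise
            # Double the delay each attempt, capped at max_delay, plus jitter
            # so many clients do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

If your WebSocket library signals failures with its own exception types, add them to the `except` clause.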

Data Retention

The Flow API does not store conversation audio or transcriptions.
