Flow API Reference

GET wss://flow.api.speechmatics.com/

Protocol overview

A basic Flow session consists of the message exchanges below. Messages documented under Sent messages are sent by the client; messages documented under Received messages are sent by the service.

Session Start

Once-only at conversation start:

StartConversation
ConversationStarted

Audio Input/Output and Transcripts

Repeated throughout the conversation, covering the audio streamed from the client, the corresponding transcripts, and the audio returned by the server:

AddAudio (client sending audio)
AudioAdded (server received audio)
AddTranscript / AddPartialTranscript (server sent transcript)
AddAudio (server sending audio)
AudioReceived (client received audio)

TTS Response Management

ResponseStarted (when TTS begins)
ResponseCompleted (when TTS finishes normally)
ResponseInterrupted (when TTS is interrupted)

Function Calling

Exchanged during function calling over the websocket:

ToolInvoke (when function call is triggered)
ToolResult (client response to function call)

Session Termination

Once-only at conversation end:

AudioEnded (client ending session)
ConversationEnding (agent ending session)
ConversationEnded (final message before connection close)

Info, Warning and Error messages will be sent as appropriate.
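The exchange above mixes two kinds of frames: AddAudio is described as a binary chunk, while every other message carries a message field. A minimal sketch of a receive-side classifier follows, assuming that the JSON messages arrive as text frames. The helper name classify_frame is hypothetical and not part of the API.

```python
import json
from typing import Optional, Tuple, Union


def classify_frame(frame: Union[bytes, bytearray, str]) -> Tuple[str, Optional[dict]]:
    """Return (message name, parsed JSON body or None) for one websocket frame."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frames carry audio (AddAudio) and have no JSON body.
        return "AddAudio", None
    # Assumption: every non-binary frame is a JSON object with a "message" field.
    body = json.loads(frame)
    return body["message"], body
```

A receive loop could call this on every incoming frame and route on the returned name.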

Sent messages

StartConversation

Initiates a new conversation session.

message required

Constant value: StartConversation

audio_format object

type required

Constant value: raw

encoding string

Possible values: [pcm_f32le, pcm_s16le, mulaw]

Default value: pcm_s16le

sample_rate integer

Default value: 16000

conversation_config object required

template_id string required

Required in the StartConversation message. Generated in the Speechmatics Portal, it maps to the supported language, the agent's prompt, the LLM, the TTS voice, and the custom dictionary. These can be customised by creating or modifying agents in the Portal.

template_variables object

[property name: string] string

tools object[]

A list of tools that the LLM can use during the conversation.

Array [

type string required

The type of tool to use. At the moment, only function is supported.

Possible values: [function]

function object required

The function that the tool will call.

name string required

The name of the function to call. This name is passed as a field in the ToolInvoke message.

description string

A natural language string that tells the LLM under which conditions the function must be called.

parameters object

An object containing the properties of the function call which should be collected from the conversation. Each parameter is defined by:

type string

Possible values: [object]

required string[]

(optional) The names of the function's input parameters that are required.

properties object

Properties of the function parameter object.

[property name: string] object

type string required

Possible values: [integer, number, string, boolean]

description string

A description of the parameter.

enum undefined[]

example string

An example value for the parameter.

]

debug object

llm boolean

[property name: string] any
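As an illustration of the fields above, here is a minimal sketch that assembles a StartConversation message. The flattened listing does not show nesting, so two placements are assumptions: template_variables inside conversation_config, and tools as a top-level field. The builder name, the get_weather tool and the YOUR_TEMPLATE_ID placeholder are hypothetical.

```python
import json


def build_start_conversation(template_id, encoding="pcm_s16le", sample_rate=16000,
                             template_variables=None, tools=None):
    """Serialise a StartConversation message as a JSON string."""
    if encoding not in ("pcm_f32le", "pcm_s16le", "mulaw"):
        raise ValueError(f"unsupported encoding: {encoding}")
    message = {
        "message": "StartConversation",
        "audio_format": {"type": "raw", "encoding": encoding, "sample_rate": sample_rate},
        "conversation_config": {"template_id": template_id},
    }
    if template_variables:
        # Assumption: template_variables is nested under conversation_config.
        message["conversation_config"]["template_variables"] = dict(template_variables)
    if tools:
        # Assumption: tools is a top-level field of StartConversation.
        message["tools"] = list(tools)
    return json.dumps(message)


# Hypothetical tool definition that follows the schema listed above.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Call when the user asks about the weather in a city.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string", "description": "City name", "example": "London"},
            },
        },
    },
}

start_message = build_start_conversation("YOUR_TEMPLATE_ID", tools=[weather_tool])
```

The resulting string would be sent once, as the first message after the websocket opens.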

AddAudio

A binary chunk of audio. The server confirms receipt by sending an AudioAdded message.

string binary
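Because AddAudio is a raw binary chunk, the client has to decide how much audio goes into each frame. The sketch below splits a PCM buffer into fixed-duration chunks. The 20 ms chunk length is an arbitrary choice for illustration, not a documented requirement, and chunk_pcm is a hypothetical helper.

```python
from typing import Iterator


def chunk_pcm(audio: bytes, sample_rate: int = 16000, bytes_per_sample: int = 2,
              chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each chunk_ms long (the last may be shorter)."""
    # pcm_s16le uses 2 bytes per sample; pcm_f32le would use 4 and mulaw 1.
    chunk_bytes = sample_rate * bytes_per_sample * chunk_ms // 1000
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]
```

Each yielded chunk would be sent as one binary websocket frame.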

AudioReceived

Client response to AddAudio, indicating that the client has successfully received the audio sent by the server.

message required

Constant value: AudioReceived

seq_no integer required

AudioEnded

Declares that the client has no more audio to send.

message required

Constant value: AudioEnded

last_seq_no integer required
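The reference lists seq_no and last_seq_no without describing how they are counted. The sketch below assumes that both count audio chunks starting at 1, and that last_seq_no is the number of AddAudio chunks the client has sent. That assumption should be checked against the service's behaviour. AudioCounter is a hypothetical helper.

```python
import json


class AudioCounter:
    """Tracks audio chunk counts in both directions (counting scheme is an assumption)."""

    def __init__(self):
        self.sent = 0       # AddAudio chunks sent by the client
        self.received = 0   # AddAudio chunks received from the server

    def note_sent(self) -> None:
        self.sent += 1

    def audio_received(self) -> str:
        """Acknowledge one chunk of server audio."""
        self.received += 1
        return json.dumps({"message": "AudioReceived", "seq_no": self.received})

    def audio_ended(self) -> str:
        """Declare that no more client audio will follow."""
        return json.dumps({"message": "AudioEnded", "last_seq_no": self.sent})
```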

AddInput

Message used by the application client to send input to the LLM in order to influence the conversation.

message required

Constant value: AddInput

input string required

The information that the LLM must incorporate in the response.

interrupt_response boolean

If true, the current response is interrupted by the new input.
If false, the response continues until it is complete.

Default value: false

immediate boolean

If true, the input is treated as urgent and sent to the LLM immediately.
If false, the new input is added to the current prompt and sent to the LLM as part of the next request.

Default value: false
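A minimal sketch of building an AddInput message from the fields above. The builder name and the example text are hypothetical; both flags default to false, as in the reference.

```python
import json


def build_add_input(text: str, interrupt_response: bool = False,
                    immediate: bool = False) -> str:
    """Serialise an AddInput message as a JSON string."""
    return json.dumps({
        "message": "AddInput",
        "input": text,
        "interrupt_response": interrupt_response,
        "immediate": immediate,
    })


# Example: urgent context that should cut off the current response.
urgent = build_add_input("The user's order has just shipped.",
                         interrupt_response=True, immediate=True)
```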

ToolResult

Contains the result of a tool invocation.

message required

Constant value: ToolResult

id string required

The id of the ToolInvoke message that this result answers.

status string required

Possible values: [ok, rejected, failed]

content string

The content of the tool result.
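A minimal sketch of a ToolResult builder that checks the status against the three values listed above before serialising. The function name and the example id are hypothetical.

```python
import json

VALID_STATUSES = ("ok", "rejected", "failed")


def build_tool_result(invoke_id: str, status: str, content: str = "") -> str:
    """Serialise a ToolResult message, rejecting statuses the reference does not list."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    return json.dumps({
        "message": "ToolResult",
        "id": invoke_id,
        "status": status,
        "content": content,
    })
```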

Received messages

ConversationStarted

Server response to StartConversation, acknowledging that a conversation session has started.

message required

Constant value: ConversationStarted

id string

asr_session_id string

language_pack_info object

Properties of the language pack.

language_description string

Full descriptive name of the language, e.g. 'Japanese'.

word_delimiter string required

The character to use to separate words.

writing_direction string

The direction that words in the language should be written and read in.

Possible values: [left-to-right, right-to-left]

itn boolean

Whether or not ITN (inverse text normalization) is available for the language pack.

adapted boolean

Whether or not language model adaptation has been applied to the language pack.

AddAudio

A binary chunk of audio sent by the server. The client confirms receipt by sending an AudioReceived message.

string binary

AudioAdded

Server response to AddAudio, indicating that audio has been added successfully.

message required

Constant value: AudioAdded

seq_no integer required

AddPartialTranscript

Contains a work-in-progress transcript of a part of the audio that the client has sent.

message required

Constant value: AddPartialTranscript

format string

Speechmatics JSON output format version number.

Example: 2.1

metadata object required

start_time float required

end_time float required

transcript string required

results object[] required

Array [

type string required

Possible values: [word, punctuation]

start_time float required

end_time float required

channel string

attaches_to string

Possible values: [next, previous, none, both]

is_eos boolean

alternatives object[]

Array [

content string required

confidence float required

language string

display object

direction string required

Possible values: [ltr, rtl]

speaker string

]

score float

Possible values: >= 0 and <= 1

volume float

Possible values: >= 0 and <= 100

]
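Each entry in results can carry several alternatives, each with its own confidence. The sketch below picks the highest-confidence alternative for every word result and drops words below a threshold. The helper name, the threshold and the sample message are hypothetical.

```python
def best_words(message: dict, min_confidence: float = 0.0) -> list:
    """Return the highest-confidence content of each word result in a transcript message."""
    words = []
    for result in message.get("results", []):
        alternatives = result.get("alternatives") or []
        if result.get("type") != "word" or not alternatives:
            continue  # skip punctuation and results without alternatives
        best = max(alternatives, key=lambda alt: alt["confidence"])
        if best["confidence"] >= min_confidence:
            words.append(best["content"])
    return words


# Hypothetical partial transcript, trimmed to the fields used above.
sample = {
    "message": "AddPartialTranscript",
    "results": [
        {"type": "word", "alternatives": [{"content": "hello", "confidence": 0.9},
                                          {"content": "hallo", "confidence": 0.4}]},
        {"type": "word", "alternatives": [{"content": "word", "confidence": 0.3}]},
        {"type": "punctuation", "alternatives": [{"content": ".", "confidence": 1.0}]},
    ],
}
```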

AddTranscript

Contains the final transcript of a part of the audio that the client has sent.

message required

Constant value: AddTranscript

format string

Speechmatics JSON output format version number.

Example: 2.1

metadata object required

start_time float required

end_time float required

transcript string required

results object[] required

Array [

type string required

Possible values: [word, punctuation]

start_time float required

end_time float required

channel string

attaches_to string

Possible values: [next, previous, none, both]

is_eos boolean

alternatives object[]

Array [

content string required

confidence float required

language string

display object

direction string required

Possible values: [ltr, rtl]

speaker string

]

score float

Possible values: >= 0 and <= 1

volume float

Possible values: >= 0 and <= 100

]
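Since AddPartialTranscript is work-in-progress and AddTranscript is final, a client typically keeps the two apart. The sketch below rests on two assumptions that the reference does not state: a partial is superseded by the next partial or final, and metadata.transcript already contains any spacing needed for concatenation. TranscriptBuffer is a hypothetical helper.

```python
class TranscriptBuffer:
    """Accumulates final transcript text and keeps only the latest partial."""

    def __init__(self):
        self.final = ""
        self.partial = ""

    def handle(self, message: dict) -> None:
        text = message["metadata"]["transcript"]
        if message["message"] == "AddTranscript":
            # Final text is appended; the pending partial is assumed to be superseded.
            self.final += text
            self.partial = ""
        elif message["message"] == "AddPartialTranscript":
            self.partial = text

    @property
    def text(self) -> str:
        """Best current view of what the user has said."""
        return self.final + self.partial
```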

ResponseStarted

Indicates the start of a response from the agent.

message required

Constant value: ResponseStarted

content string required

The content that is spoken by the agent in the response.

start_time float required

The start time of the spoken response, relative to the start of the session.

ResponseCompleted

Indicates the completion of a response from the agent.

message required

Constant value: ResponseCompleted

content string required

The content that is spoken by the agent in the response.

start_time float required

The start time of the spoken response, relative to the start of the session.

end_time float required

The end time of the spoken response, relative to the start of the session.

ResponseInterrupted

Indicates that a response from the agent was interrupted.

message required

Constant value: ResponseInterrupted

content string required

The content that is spoken by the agent in the response.

start_time float required

The start time of the spoken response, relative to the start of the session.

end_time float required

The end time of the spoken response, relative to the start of the session.
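The three response messages can be folded into a small state tracker, for example to know whether the agent is currently speaking. This is a sketch, not a prescribed client design. ResponseTracker is a hypothetical helper, and the duration is simply end_time minus start_time in whatever unit the service uses.

```python
class ResponseTracker:
    """Tracks whether the agent is speaking and records finished responses."""

    def __init__(self):
        self.speaking = False
        self.history = []  # tuples of (content, duration, interrupted)

    def handle(self, message: dict) -> None:
        kind = message["message"]
        if kind == "ResponseStarted":
            self.speaking = True
        elif kind in ("ResponseCompleted", "ResponseInterrupted"):
            self.speaking = False
            duration = message["end_time"] - message["start_time"]
            self.history.append(
                (message["content"], duration, kind == "ResponseInterrupted")
            )
```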

ToolInvoke

Invokes a tool with the specified parameters.

message required

Constant value: ToolInvoke

id string required

The id of the tool invoke.

type required

Constant value: function

function object required

name string required

The name of the tool to invoke.

arguments object required

[property name: string] object

oneOf

MOD1
MOD2
MOD3

string
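A ToolInvoke is answered with a ToolResult carrying the same id. The sketch below dispatches to locally registered Python functions. It assumes that name and arguments are nested inside the function object, and it maps an unknown function to rejected and a raised exception to failed, which is one reasonable reading of the status values rather than documented behaviour. The registry and handler names are hypothetical.

```python
import json


def handle_tool_invoke(message: dict, registry: dict) -> str:
    """Run the requested function and return the ToolResult JSON to send back."""
    name = message["function"]["name"]
    arguments = message["function"].get("arguments", {})
    handler = registry.get(name)
    if handler is None:
        status, content = "rejected", f"unknown function: {name}"
    else:
        try:
            status, content = "ok", str(handler(**arguments))
        except Exception as exc:  # report any handler failure to the agent
            status, content = "failed", str(exc)
    return json.dumps({
        "message": "ToolResult",
        "id": message["id"],
        "status": status,
        "content": content,
    })
```

The registry is a plain dict from function name to callable, so it can mirror the tools declared in StartConversation.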

Error

Error messages sent from the server to the client.

message required

Constant value: Error

type string required

Possible values: [asr_error, protocol_error, config_error, idle_timeout, session_timeout, not_allowed, not_authorised, quota_exceeded, timelimit_exceeded, job_error, internal_error, unknown_error]

reason string required
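One way to surface Error messages in client code is to convert them into exceptions that keep the type and reason fields. FlowError and raise_for_error are hypothetical names for this sketch.

```python
class FlowError(Exception):
    """Raised when the server sends an Error message."""

    def __init__(self, error_type: str, reason: str):
        super().__init__(f"{error_type}: {reason}")
        self.error_type = error_type
        self.reason = reason


def raise_for_error(message: dict) -> None:
    """Raise FlowError for Error messages; do nothing for any other message."""
    if message.get("message") == "Error":
        raise FlowError(message["type"], message["reason"])
```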

Warning

Warning messages sent from the server to the client.

oneOf: DefaultWarning, ConversationTermination

DefaultWarning

message required

Constant value: Warning

reason string required

type string required

Possible values: [high_asr_latency, llm_error, high_llm_latency, llm_request_content_filter, tts_error, high_tts_latency, protocol_error, idle_timeout, session_timeout]

ConversationTermination

message required

Constant value: Warning

reason string required

type required

Constant value: conversation_termination

conversation_termination integer required

Info

Additional information sent from the server to the client.

oneOf: StatusUpdate, ConversationDurationLimit, ConcurrentSessionUsage

StatusUpdate

message required

Constant value: Info

reason string required

type required

Constant value: status_update

event object required

prev_status string required

status string required

ConversationDurationLimit

message required

Constant value: Info

reason string required

type required

Constant value: conversation_duration_limit

conversation_duration_limit integer required

ConcurrentSessionUsage

message required

Constant value: Info

reason string required

type required

Constant value: concurrent_session_usage

usage integer required

quota integer required

last_udpated date-time
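Info messages share the message constant and differ by type, so a client can branch on that field. The sketch below produces a one-line summary per variant. It assumes that prev_status and status are nested inside the event object, and the summary wording is purely illustrative.

```python
def summarise_info(message: dict) -> str:
    """Return a short human-readable summary of an Info message."""
    kind = message["type"]
    if kind == "status_update":
        event = message["event"]  # assumption: prev_status/status live inside event
        return f"status {event['prev_status']} -> {event['status']}"
    if kind == "conversation_duration_limit":
        return f"duration limit: {message['conversation_duration_limit']}"
    if kind == "concurrent_session_usage":
        return f"sessions in use: {message['usage']} of {message['quota']}"
    # Unknown types fall back to the generic reason field.
    return f"{kind}: {message.get('reason', '')}"
```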

ConversationEnding

Indicates the start of the session transfer procedure.

message required

Constant value: ConversationEnding

ConversationEnded

Sent by the server to end the conversation, once it has finished sending all other messages.

message required

Constant value: ConversationEnded
