Flow API Reference
GET wss://flow.api.speechmatics.com/
Protocol overview
A basic Flow session has the following message exchanges. Each message is sent either by the client or by the service.
Session Start
Once-only at conversation start:
- StartConversation (sent by the client)
- ConversationStarted (sent by the service)
Audio Input/Output and Transcripts
Repeating during the conversation to cover the audio stream from the client and corresponding transcripts:
- AddAudio (client sending audio)
- AudioAdded (server received audio)
- AddTranscript / AddPartialTranscript (server sent transcript)
- AddAudio (server sending audio)
- AudioReceived (client received audio)
TTS Response Management
- ResponseStarted (when TTS begins)
- ResponseCompleted (when TTS finishes normally)
- ResponseInterrupted (when TTS is interrupted)
Function Calling
Exchanged during function calling over the websocket:
- ToolInvoke (when function call is triggered)
- ToolResult (client response to function call)
Session Termination
Once-only at conversation end:
- AudioEnded (client ending session)
- ConversationEnding (agent ending session)
- ConversationEnded (final message before connection close)
Info, Warning and Error messages will be sent as appropriate.
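The exchange above can be read as a small state machine. The sketch below is one hypothetical way for a client to check that session-level messages arrive in the documented order. The message names come from the overview; the `Phase` values and the `SessionTracker` class are illustrative and not part of the Flow API, and only the happy path described above is modelled.

```python
from enum import Enum


class Phase(Enum):
    NOT_STARTED = "not_started"
    STARTING = "starting"
    ACTIVE = "active"
    ENDING = "ending"
    ENDED = "ended"


# Session-level messages and the phase change each one implies.
# Phases are an illustrative assumption, not defined by the API.
TRANSITIONS = {
    ("StartConversation", Phase.NOT_STARTED): Phase.STARTING,
    ("ConversationStarted", Phase.STARTING): Phase.ACTIVE,
    ("AudioEnded", Phase.ACTIVE): Phase.ENDING,
    ("ConversationEnding", Phase.ACTIVE): Phase.ENDING,
    ("ConversationEnded", Phase.ENDING): Phase.ENDED,
}

SESSION_MESSAGES = {name for name, _ in TRANSITIONS}


class SessionTracker:
    """Tracks the session phase from the names of observed messages."""

    def __init__(self):
        self.phase = Phase.NOT_STARTED

    def observe(self, message_name):
        # Audio, transcript, TTS and tool messages do not change the phase.
        if message_name not in SESSION_MESSAGES:
            return self.phase
        key = (message_name, self.phase)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{message_name} is unexpected in phase {self.phase.value}"
            )
        self.phase = TRANSITIONS[key]
        return self.phase
```

Calling `observe` with each message name, sent or received, keeps the tracker in step with the session. A real client would probably need extra transitions, for example for a session that ends after an Error.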
Sent messages
StartConversation
StartConversation
audio_format object required
- AudioFormatRaw
- AudioFormatFile
raw
Possible values: [pcm_f32le, pcm_s16le, mulaw]
file
conversation_config object required
Required in the StartConversation message in the Flow API. Generated from the Speechmatics Portal. This maps to the supported language, the agent's prompt, LLM, TTS voice, and custom dictionary. These can be customised by creating or modifying agents in the Portal.
template_variables object
tools object[]
A list of tools that the LLM can use during the conversation.
The type of tool to use. At the moment, only function is supported.
Possible values: [function]
function object required
The function that the tool will call.
The name of the function that should be called. This name is passed as a field in the ToolInvoke message.
A natural language string that tells the LLM under which conditions the function must be called.
parameters object
An object containing the properties of the function call which should be collected from the conversation. Each parameter is defined by:
Possible values: [object]
(optional) The list of input parameters for the function which are required.
properties object
Properties of the function parameter object
[property name: string] object
Possible values: [integer, number, string, boolean]
A description of the parameter.
An example value for the parameter.
debug object
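As an illustration of how the fields above might fit together, the following sketch builds a StartConversation message with a raw audio format and one function tool. Only `audio_format`, `conversation_config`, `tools`, `function`, `parameters` and `properties` appear by name in this reference. Every other key (`message`, `type`, `encoding`, `sample_rate`, `template_id`, `name`, `description`, `required`, `example`), the nesting, and the `get_weather` tool are assumptions made for the example and should be checked against the live schema.

```python
import json

# Encodings listed for the raw audio format in this reference.
RAW_ENCODINGS = {"pcm_f32le", "pcm_s16le", "mulaw"}


def build_start_conversation(agent_id, encoding="pcm_s16le",
                             sample_rate=16000, tools=None):
    """Return a StartConversation message as a JSON string (sketch only)."""
    if encoding not in RAW_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    message = {
        "message": "StartConversation",  # assumed envelope key
        "audio_format": {
            "type": "raw",               # assumed key; "raw" is listed above
            "encoding": encoding,        # assumed key
            "sample_rate": sample_rate,  # assumed key and default value
        },
        "conversation_config": {"template_id": agent_id},  # assumed key
    }
    if tools:
        message["tools"] = tools
    return json.dumps(message)


# Hypothetical tool definition following the tools description above.
weather_tool = {
    "type": "function",  # the only tool type listed as supported
    "function": {
        "name": "get_weather",  # passed back in the ToolInvoke message
        "description": "Call this when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city.",
                    "example": "London",
                },
            },
        },
    },
}
```

A client would send the returned string as the first text frame after the websocket connection opens, for example `build_start_conversation("my-agent-id", tools=[weather_tool])`, where `my-agent-id` stands in for an agent created in the Portal.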
AddAudio
AudioReceived
AudioReceived
AudioEnded
AudioEnded
AddInput
AddInput
The information that the LLM must incorporate in the response
If true, the response will be interrupted by the new input.
If false, the response will continue until it is complete. Defaults to false.
false
If true, the input will be treated as urgent and will be sent to LLM immediately.
If false, new input will be added to current prompt and sent to LLM as a part of the next request.
false
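A minimal sketch of building an AddInput message from the three fields described above. The field names are missing from this reference, so the keys `message`, `input`, `interrupt_response` and `immediate` are assumptions, as is the `build_add_input` helper itself.

```python
import json


def build_add_input(text, interrupt_response=False, immediate=False):
    """Return an AddInput message as a JSON string (sketch only)."""
    if not text:
        raise ValueError("input text must not be empty")
    return json.dumps({
        "message": "AddInput",  # assumed envelope key
        "input": text,  # assumed key: information for the LLM to use
        "interrupt_response": interrupt_response,  # assumed key
        "immediate": immediate,  # assumed key
    })
```

With both flags left at false, the input should be added to the current prompt and sent with the next LLM request without cutting off the agent's current response, per the field descriptions above.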
ToolResult
ToolResult
The id of the tool invoke.
Possible values: [ok, rejected, failed]
The content of the tool result.
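The sketch below shows one way a client might answer a ToolInvoke message (described under Received messages) with a ToolResult. The keys `message`, `id`, `name`, `status` and `content` are assumed, as is the placement of `arguments` inside `function`. The mapping of outcomes to `ok`, `rejected` and `failed` is an interpretation of the listed values, not documented behaviour.

```python
import json


def build_tool_result(invoke, handlers):
    """Run the handler for a ToolInvoke dict and return a ToolResult string."""
    function = invoke["function"]
    name = function["name"]                 # assumed key
    arguments = function.get("arguments", {})
    handler = handlers.get(name)
    if handler is None:
        # Interpretation: the client declines a function it does not know.
        status, content = "rejected", f"unknown function: {name}"
    else:
        try:
            status, content = "ok", str(handler(**arguments))
        except Exception as exc:
            # Interpretation: the function was attempted but raised an error.
            status, content = "failed", str(exc)
    return json.dumps({
        "message": "ToolResult",  # assumed envelope key
        "id": invoke["id"],       # echoes the id of the tool invoke
        "status": status,         # assumed key
        "content": content,       # assumed key
    })
```

Here `handlers` is a plain dict from function name to a Python callable, so a tool can be registered with `{"get_weather": my_weather_function}`, both names being hypothetical.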
Received messages
ConversationStarted
ConversationStarted
AddAudio
AudioAdded
AudioAdded
AddPartialTranscript
AddPartialTranscript
Speechmatics JSON output format version number.
2.1
metadata object required
results object[] required
Possible values: [word, punctuation]
Possible values: [next, previous, none, both]
alternatives object[]
display object
Possible values: [ltr, rtl]
Possible values: >= 0 and <= 1
Possible values: >= 0 and <= 100
AddTranscript
AddTranscript
Speechmatics JSON output format version number.
2.1
metadata object required
results object[] required
Possible values: [word, punctuation]
Possible values: [next, previous, none, both]
alternatives object[]
display object
Possible values: [ltr, rtl]
Possible values: >= 0 and <= 1
Possible values: >= 0 and <= 100
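To show how the `results` array of a transcript message could be turned into display text, here is a hedged sketch. It assumes each result carries a `type` (word or punctuation), an `attaches_to` value (next, previous, none or both) and an `alternatives` list whose first entry has a `content` string. Those key names are not given in this reference and are assumptions.

```python
def render_transcript(results):
    """Join transcript results into text, honouring punctuation attachment."""
    text = ""
    glue_next = False  # True when the previous item attaches to the next one
    for item in results:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        content = alternatives[0]["content"]  # assumed key
        attaches = "none"
        if item.get("type") == "punctuation":
            attaches = item.get("attaches_to", "none")  # assumed key
        joins_previous = attaches in ("previous", "both")
        # Add a space unless this item or the previous one asks to be joined.
        if text and not joins_previous and not glue_next:
            text += " "
        text += content
        glue_next = attaches in ("next", "both")
    return text
```

The function takes the first alternative of each result and ignores confidence values, which a real client may want to inspect.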
ResponseStarted
ResponseStarted
The content that is spoken by the agent in the response.
The start time of the spoken response, relative to the start of the session.
ResponseCompleted
ResponseCompleted
The content that is spoken by the agent in the response.
The start time of the spoken response, relative to the start of the session.
The end time of the spoken response, relative to the start of the session.
ResponseInterrupted
ResponseInterrupted
The content that is spoken by the agent in the response.
The start time of the spoken response, relative to the start of the session.
The end time of the spoken response, relative to the start of the session.
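The three response messages can be reduced to a single summary on the client. This sketch assumes the keys `message`, `content`, `start_time` and `end_time`, none of which are named in this reference. The duration is simply the end time minus the start time, in whatever unit the service reports.

```python
RESPONSE_MESSAGES = ("ResponseStarted", "ResponseCompleted", "ResponseInterrupted")


def summarize_response(message):
    """Return the state, content and duration of a TTS response message."""
    kind = message["message"]  # assumed envelope key
    if kind not in RESPONSE_MESSAGES:
        raise ValueError(f"not a response message: {kind}")
    summary = {
        "state": "speaking",
        "content": message.get("content", ""),  # assumed key
        "duration": None,  # unknown until the response ends
    }
    if kind != "ResponseStarted":
        summary["state"] = (
            "completed" if kind == "ResponseCompleted" else "interrupted"
        )
        # Both times are relative to the start of the session.
        summary["duration"] = message["end_time"] - message["start_time"]
    return summary
```

For an interrupted response, comparing `content` with what the agent intended to say would require the client to keep its own record, since this reference only documents the content that was spoken.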
ToolInvoke
ToolInvoke
The id of the tool invoke.
function object required
The name of the tool to invoke.
arguments object required
[property name: string] object
Error
Error
Possible values: [invalid_message, invalid_model, invalid_config, invalid_audio_type, not_authorised, insufficient_funds, not_allowed, job_error, data_error, buffer_error, protocol_error, timelimit_exceeded, quota_exceeded, unknown_error]
Warning
Warning
Possible values: [duration_limit_exceeded]
Info
Info
Possible values: [recognition_quality, model_redirect, deprecated]
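Error, Warning and Info messages can be handled by one small function. The sketch below assumes the keys `message`, `type` and `reason`, which are not named in this reference. Treating every Error as fatal is a design choice made for the example, not documented behaviour, and `FlowError` is a hypothetical exception class.

```python
class FlowError(Exception):
    """Raised by this sketch when the service sends an Error message."""


def handle_status_message(message, log):
    """Handle Error, Warning and Info messages; return True if handled."""
    kind = message.get("message")  # assumed envelope key
    if kind not in ("Error", "Warning", "Info"):
        return False  # some other message; leave it to the caller
    # "type" holds one of the values listed above; "reason" is assumed.
    detail = f"{message.get('type', 'unknown')}: {message.get('reason', '')}"
    if kind == "Error":
        raise FlowError(detail)
    log.append((kind.lower(), detail))
    return True
```

Passing a list as `log` keeps the example free of side effects. A real client would more likely use the standard `logging` module.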
ConversationEnding
ConversationEnding
ConversationEnded
ConversationEnded