Flow — Voice AI

Flow API Reference

GET

wss://flow.api.speechmatics.com/

Protocol overview

A basic Flow session will have the following message exchanges:

Each message is sent either by the client or by the service.

Session Start

Once-only at conversation start:

  • StartConversation
  • ConversationStarted

Audio Input/Output and Transcripts

Repeating during the conversation to cover the audio stream from the client and corresponding transcripts:

  • AddAudio (client sending audio)
  • AudioAdded (server received audio)
  • AddTranscript / AddPartialTranscript (server sent transcript)
  • AddAudio (server sending audio)
  • AudioReceived (client received audio)

TTS Response Management

  • ResponseStarted (when TTS begins)
  • ResponseCompleted (when TTS finishes normally)
  • ResponseInterrupted (when TTS is interrupted)

Function Calling

Exchanged during function calling over the websocket:

  • ToolInvoke (when function call is triggered)
  • ToolResult (client response to function call)

Session Termination

Once-only at conversation end:

  • AudioEnded (client ending session)
  • ConversationEnding (agent ending session)
  • ConversationEnded (final message before connection close)

Info, Warning and Error messages will be sent as appropriate.
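
The exchange above can be summarised as a lookup from message name to sender. This is only an illustrative sketch: `CLIENT_MESSAGES`, `SERVER_MESSAGES` and `sender_of` are hypothetical names, and the grouping simply follows the Sent messages and Received messages sections of this reference.

```python
# Which side sends each Flow message, following the "Sent messages" and
# "Received messages" sections. AddAudio appears in both directions.
CLIENT_MESSAGES = {
    "StartConversation", "AddAudio", "AudioReceived",
    "AudioEnded", "AddInput", "ToolResult",
}
SERVER_MESSAGES = {
    "ConversationStarted", "AddAudio", "AudioAdded",
    "AddPartialTranscript", "AddTranscript",
    "ResponseStarted", "ResponseCompleted", "ResponseInterrupted",
    "ToolInvoke", "Error", "Warning", "Info",
    "ConversationEnding", "ConversationEnded",
}


def sender_of(name: str) -> str:
    """Return 'client', 'server' or 'both' for a known message name."""
    in_client = name in CLIENT_MESSAGES
    in_server = name in SERVER_MESSAGES
    if in_client and in_server:
        return "both"
    if in_client:
        return "client"
    if in_server:
        return "server"
    raise KeyError(f"unknown Flow message: {name}")
```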

Sent messages

StartConversation

Initiates a new conversation session.

message (required)
Constant value: StartConversation

audio_format (object, required)
oneOf

    type (required)
    Constant value: raw

    encoding (string, required)
    Possible values: [pcm_f32le, pcm_s16le, mulaw]

    sample_rate (integer, required)

conversation_config (object, required)

    template_id (string, required)
    Required in the StartConversation message. Generated in the Speechmatics Portal. It maps to the supported language, the agent's prompt, the LLM, the TTS voice and the custom dictionary, all of which can be customised by creating or modifying agents in the Portal.

    template_variables (object)
        [property name: string] (string)

tools (object[])
A list of tools that the LLM can use during the conversation. Each array item has:

    type (string, required)
    The type of tool to use. At the moment, only function is supported.
    Possible values: [function]

    function (object, required)
    The function that the tool will call.

        name (string, required)
        The name of the function that should be called. This name is passed as a field in the ToolInvoke message.

        description (string)
        A natural-language string that tells the LLM under which conditions the function must be called.

        parameters (object)
        An object containing the properties of the function call which should be collected from the conversation. Each parameter is defined by:

            type (string)
            Possible values: [object]

            required (string[])
            (optional) The list of input parameters for the function which are required.

            properties (object)
            Properties of the function parameter object.

                [property name: string] (object)

                    type (string, required)
                    Possible values: [integer, number, string, boolean]

                    description (string)
                    A description of the parameter.

                    enum (array)

                    example (string)
                    An example value for the parameter.

debug (object)
    llm (boolean)
    [property name: string] (any)
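
As a rough illustration of the schema above, the following sketch builds a StartConversation payload. It rests on several assumptions: the message is serialised as JSON, `template_variables` sits inside `conversation_config`, `tools` sits at the top level of the message, and 16000 Hz is just a placeholder sample rate. The helper name `build_start_conversation`, the template id and the `get_weather` tool are hypothetical.

```python
import json

ALLOWED_ENCODINGS = {"pcm_f32le", "pcm_s16le", "mulaw"}


def build_start_conversation(template_id, encoding="pcm_s16le",
                             sample_rate=16000, template_variables=None,
                             tools=None):
    """Return a StartConversation message as a JSON string (sketch only)."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    conversation_config = {"template_id": template_id}
    if template_variables:
        # Assumption: template_variables is nested in conversation_config.
        conversation_config["template_variables"] = dict(template_variables)
    message = {
        "message": "StartConversation",
        "audio_format": {
            "type": "raw",
            "encoding": encoding,
            "sample_rate": sample_rate,
        },
        "conversation_config": conversation_config,
    }
    if tools:
        # Assumption: tools is a top-level field of the message.
        message["tools"] = list(tools)
    return json.dumps(message)


# Hypothetical tool definition following the tools schema above.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Call when the user asks about the weather.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string", "description": "City name."},
            },
        },
    },
}

payload = build_start_conversation("my-template-id", tools=[get_weather_tool])
```

The resulting string would be sent as the first message on the WebSocket connection, before any audio.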

AddAudio

A binary chunk of audio. The server confirms receipt by sending an AudioAdded message.

string (binary)
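
One way to produce such chunks is to slice a raw audio buffer on sample boundaries. The sketch below is an assumption-laden illustration: `iter_audio_chunks` is a hypothetical helper, the chunk size of 1024 samples is arbitrary, and each yielded chunk is intended to be sent as one binary WebSocket message.

```python
# Size in bytes of one sample for each encoding listed under audio_format.
BYTES_PER_SAMPLE = {"pcm_f32le": 4, "pcm_s16le": 2, "mulaw": 1}


def iter_audio_chunks(audio: bytes, encoding: str, samples_per_chunk: int = 1024):
    """Yield consecutive slices of audio, each starting on a sample boundary."""
    step = BYTES_PER_SAMPLE[encoding] * samples_per_chunk
    for offset in range(0, len(audio), step):
        yield audio[offset:offset + step]


# 5000 bytes of 16-bit silence split into chunks of up to 2048 bytes.
chunks = list(iter_audio_chunks(bytes(5000), "pcm_s16le"))
```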

AudioReceived

Client response to AddAudio, indicating that the client has successfully received the audio sent by the server.

message (required)
Constant value: AudioReceived

seq_no (integer, required)
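
A client might acknowledge each server audio chunk along these lines. This sketch assumes that `seq_no` is a running count of the chunks received, starting at 1, which this reference does not state explicitly; `ServerAudioAcker` is a hypothetical name.

```python
import json


class ServerAudioAcker:
    """Counts binary AddAudio chunks received from the server."""

    def __init__(self):
        self.seq_no = 0

    def ack(self, chunk: bytes) -> str:
        """Return the AudioReceived message for one received chunk."""
        # Assumption: seq_no increments by one per chunk, starting at 1.
        self.seq_no += 1
        return json.dumps({"message": "AudioReceived", "seq_no": self.seq_no})
```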

AudioEnded

Declares that the client has no more audio to send.

message (required)
Constant value: AudioEnded

last_seq_no (integer, required)
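
A minimal sketch of building this message follows. It assumes that `last_seq_no` equals the number of AddAudio chunks the client has sent, which is not spelled out in this reference; `build_audio_ended` is a hypothetical helper.

```python
import json


def build_audio_ended(chunks_sent: int) -> str:
    """Return an AudioEnded message as a JSON string."""
    if chunks_sent < 0:
        raise ValueError("chunks_sent must not be negative")
    # Assumption: last_seq_no is the count of AddAudio chunks sent so far.
    return json.dumps({"message": "AudioEnded", "last_seq_no": chunks_sent})
```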

AddInput

Sent by the application client to pass input to the LLM in order to influence the conversation.

message (required)
Constant value: AddInput

input (string, required)
The information that the LLM must incorporate in the response.

interrupt_response (boolean)
If true, the agent's current response will be interrupted by the new input.
If false, the response will continue until it is complete.
Default value: false

immediate (boolean)
If true, the input will be treated as urgent and sent to the LLM immediately.
If false, the new input will be added to the current prompt and sent to the LLM as part of the next request.
Default value: false
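
The two flags can be combined as in the sketch below, which assumes the message is serialised as JSON. `build_add_input` is a hypothetical helper and the input text is invented for illustration.

```python
import json


def build_add_input(text: str, interrupt_response: bool = False,
                    immediate: bool = False) -> str:
    """Return an AddInput message as a JSON string."""
    return json.dumps({
        "message": "AddInput",
        "input": text,
        # Both flags default to false, matching the defaults listed above.
        "interrupt_response": interrupt_response,
        "immediate": immediate,
    })


# Urgent input that also cuts off whatever the agent is currently saying.
urgent = build_add_input("The user's order has shipped.",
                         interrupt_response=True, immediate=True)
```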

ToolResult

Contains the result of a tool invocation.

message (required)
Constant value: ToolResult

id (string, required)
The id of the ToolInvoke message that this result responds to.

status (string, required)
Possible values: [ok, rejected, failed]

content (string)
The content of the tool result.

Received messages

ConversationStarted

Server response to StartConversation, acknowledging that a conversation session has started.

message (required)
Constant value: ConversationStarted

orchestrator_version (string)

id (string)

AddAudio

A binary chunk of audio sent by the server. The client confirms receipt by sending an AudioReceived message.

string (binary)

AudioAdded

Server response to AddAudio, indicating that audio has been added successfully.

message (required)
Constant value: AudioAdded

seq_no (integer, required)

AddPartialTranscript

Contains a work-in-progress transcript of a part of the audio that the client has sent.

message (required)
Constant value: AddPartialTranscript

format (string)
Speechmatics JSON output format version number.
Example: 2.1

metadata (object, required)
    start_time (float, required)
    end_time (float, required)
    transcript (string, required)

results (object[], required)
Each array item has:

    type (string, required)
    Possible values: [word, punctuation]

    start_time (float, required)

    end_time (float, required)

    channel (string)

    attaches_to (string)
    Possible values: [next, previous, none, both]

    is_eos (boolean)

    alternatives (object[])
    Each array item has:

        content (string, required)
        confidence (float, required)
        language (string)
        display (object)
            direction (string, required)
            Possible values: [ltr, rtl]
        speaker (string)

    score (float)
    Possible values: >= 0 and <= 1

    volume (float)
    Possible values: >= 0 and <= 100

AddTranscript

Contains the final transcript of a part of the audio that the client has sent.

message (required)
Constant value: AddTranscript

format (string)
Speechmatics JSON output format version number.
Example: 2.1

metadata (object, required)
    start_time (float, required)
    end_time (float, required)
    transcript (string, required)

results (object[], required)
Each array item has:

    type (string, required)
    Possible values: [word, punctuation]

    start_time (float, required)

    end_time (float, required)

    channel (string)

    attaches_to (string)
    Possible values: [next, previous, none, both]

    is_eos (boolean)

    alternatives (object[])
    Each array item has:

        content (string, required)
        confidence (float, required)
        language (string)
        display (object)
            direction (string, required)
            Possible values: [ltr, rtl]
        speaker (string)

    score (float)
    Possible values: >= 0 and <= 1

    volume (float)
    Possible values: >= 0 and <= 100
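
To show how the results array might be consumed, the sketch below rebuilds display text from the first alternative of each item. It assumes that `attaches_to` tells the client which neighbouring token an item joins without a space, and that items lacking the field behave as `none`; `render_results` is a hypothetical helper and the sample items are invented.

```python
def render_results(results: list) -> str:
    """Join the first alternative of each result into display text."""
    text = ""
    glue_next = False  # True when the previous item attaches to the next one
    for item in results:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        token = alternatives[0]["content"]
        attaches_to = item.get("attaches_to", "none")
        attach_previous = attaches_to in ("previous", "both")
        # Insert a space unless this token is glued to its neighbour.
        if text and not attach_previous and not glue_next:
            text += " "
        text += token
        glue_next = attaches_to in ("next", "both")
    return text


sample_results = [
    {"type": "word", "start_time": 0.0, "end_time": 0.4,
     "alternatives": [{"content": "Hello", "confidence": 0.99}]},
    {"type": "punctuation", "start_time": 0.4, "end_time": 0.4,
     "attaches_to": "previous",
     "alternatives": [{"content": ",", "confidence": 1.0}]},
    {"type": "word", "start_time": 0.5, "end_time": 0.9,
     "alternatives": [{"content": "world", "confidence": 0.98}]},
    {"type": "punctuation", "start_time": 0.9, "end_time": 0.9,
     "attaches_to": "previous", "is_eos": True,
     "alternatives": [{"content": ".", "confidence": 1.0}]},
]

rendered = render_results(sample_results)  # "Hello, world."
```

In many cases `metadata.transcript` already carries the assembled text; walking `results` is mainly useful when per-word timing or confidence is needed.
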

ResponseStarted

Indicates the start of a response from the agent.

message (required)
Constant value: ResponseStarted

content (string, required)
The content that is spoken by the agent in the response.

start_time (float, required)
The start time of the spoken response, relative to the start of the session.

ResponseCompleted

Indicates the completion of a response from the agent.

message (required)
Constant value: ResponseCompleted

content (string, required)
The content that is spoken by the agent in the response.

start_time (float, required)
The start time of the spoken response, relative to the start of the session.

end_time (float, required)
The end time of the spoken response, relative to the start of the session.

ResponseInterrupted

Indicates that a response from the agent was interrupted.

message (required)
Constant value: ResponseInterrupted

content (string, required)
The content that is spoken by the agent in the response.

start_time (float, required)
The start time of the spoken response, relative to the start of the session.

end_time (float, required)
The end time of the spoken response, relative to the start of the session.
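
The three response messages share `content` and `start_time`, and the two terminal ones add `end_time`. A sketch of summarising them is shown below; `summarise_response` is a hypothetical helper, and treating the times as seconds is an assumption, since the unit is not stated here.

```python
def summarise_response(message: dict) -> dict:
    """Summarise a ResponseStarted, ResponseCompleted or ResponseInterrupted message."""
    kind = message["message"]
    if kind not in ("ResponseStarted", "ResponseCompleted", "ResponseInterrupted"):
        raise ValueError(f"not a response message: {kind}")
    summary = {
        "content": message["content"],
        "finished": kind != "ResponseStarted",
        "interrupted": kind == "ResponseInterrupted",
        "duration": None,  # unknown until the response has ended
    }
    if summary["finished"]:
        # Assumption: both times use the same unit (presumably seconds).
        summary["duration"] = message["end_time"] - message["start_time"]
    return summary
```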

ToolInvoke

Invokes a tool with the specified parameters.

message (required)
Constant value: ToolInvoke

id (string, required)
The id of the tool invocation.

function (object, required)

    name (string, required)
    The name of the tool to invoke.

    arguments (object, required)
        [property name: string] (object)
        oneOf: string
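
A client could answer a ToolInvoke with a ToolResult roughly as follows. The sketch assumes that `arguments` is nested inside `function` and that messages are JSON. Mapping an unknown tool name to `rejected` and a raised exception to `failed` is an illustrative choice, not documented behaviour. `handle_tool_invoke` and the `get_weather` handler are hypothetical.

```python
import json


def handle_tool_invoke(message: dict, handlers: dict) -> str:
    """Run the requested tool and return a ToolResult JSON string."""
    function = message["function"]
    handler = handlers.get(function["name"])
    if handler is None:
        status, content = "rejected", f"unknown tool: {function['name']}"
    else:
        try:
            # Assumption: arguments is a flat mapping of parameter names.
            content = str(handler(**function["arguments"]))
            status = "ok"
        except Exception as exc:  # report any handler failure to the agent
            status, content = "failed", str(exc)
    return json.dumps({
        "message": "ToolResult",
        "id": message["id"],  # echo the id of the ToolInvoke being answered
        "status": status,
        "content": content,
    })


# Hypothetical registry with a single tool.
handlers = {"get_weather": lambda city: f"Sunny in {city}"}
invoke = {
    "message": "ToolInvoke",
    "id": "call-1",
    "function": {"name": "get_weather", "arguments": {"city": "Paris"}},
}
reply = handle_tool_invoke(invoke, handlers)
```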

Error

Error messages sent from the server to the client.

message (required)
Constant value: Error

type (string, required)
Possible values: [invalid_message, invalid_model, invalid_config, invalid_audio_type, not_authorised, insufficient_funds, not_allowed, job_error, data_error, buffer_error, protocol_error, timelimit_exceeded, quota_exceeded, unknown_error]

reason (string, required)

code (integer)

seq_no (integer)

Warning

Warning messages sent from the server to the client.

message (required)
Constant value: Warning

type (string, required)
Possible values: [duration_limit_exceeded]

reason (string, required)

code (integer)

seq_no (integer)

duration_limit (number)

Info

Additional information sent from the server to the client.

message (required)
Constant value: Info

type (string, required)
Possible values: [recognition_quality, model_redirect, deprecated]

reason (string, required)

code (integer)

seq_no (integer)

quality (string)
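
Error, Warning and Info share the required `type` and `reason` fields, so a client can treat them uniformly. In the sketch below, raising on Error and returning a log line for the other two is one possible policy rather than documented behaviour; `FlowError` and `handle_diagnostic` are hypothetical names.

```python
class FlowError(Exception):
    """Raised when the server sends an Error message."""


def handle_diagnostic(message: dict) -> str:
    """Return a log line for Warning/Info; raise FlowError for Error."""
    kind = message["message"]
    if kind not in ("Error", "Warning", "Info"):
        raise ValueError(f"not a diagnostic message: {kind}")
    line = f"{kind}[{message['type']}]: {message['reason']}"
    if kind == "Error":
        raise FlowError(line)
    return line
```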

ConversationEnding

Indicates that the session transfer procedure is starting.

message (required)
Constant value: ConversationEnding

reason (string)

ConversationEnded

Server ends the conversation, after the server has finished sending all other messages.

message (required)
Constant value: ConversationEnded
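
Putting the received messages together, a client loop might separate binary audio from control messages and stop at ConversationEnded. The sketch below works on any iterable of frames and assumes that audio arrives as `bytes` while every other message arrives as a JSON string, as many WebSocket libraries present binary and text frames. `consume_frames` is a hypothetical helper.

```python
import json


def consume_frames(frames):
    """Collect server audio and JSON messages until ConversationEnded."""
    audio = bytearray()
    messages = []
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            # Binary frame: an AddAudio chunk sent by the server.
            audio.extend(frame)
            continue
        message = json.loads(frame)
        messages.append(message)
        if message["message"] == "ConversationEnded":
            break  # final message before the connection closes
    return bytes(audio), messages


frames = [
    '{"message": "ConversationStarted"}',
    b"\x00\x01",
    '{"message": "ConversationEnded"}',
    b"ignored",
]
audio, messages = consume_frames(frames)
```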