Text to speech

Quickstart

Learn how to convert text to speech using our API.

Transform text into natural-sounding speech using our low-latency text to speech API.

Text to speech is currently in preview and free to use. Features and pricing may change as we continue to improve the service. We welcome your feedback to help shape the final product.

Use cases

Our text to speech API is perfect for:

Voice AI: Voice assistants, chatbots, and IVR systems
Translation: Real-time translation of live events or media
Accessibility applications: Screen readers and assistive technologies
Content creation: Podcasts, dubbing, audiobooks, and voice-overs
Media production: News broadcasts and automated announcements

Getting started

1. Get your API key

Create an API key in the portal to access the text to speech API. Store your key securely as a managed secret.
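One way to keep the key out of source code is to read it from an environment variable at runtime. The sketch below assumes a variable named `SPEECHMATICS_API_KEY`. That name is our choice for illustration, not something the API requires.

```python
import os


def load_api_key(var_name: str = "SPEECHMATICS_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key


def auth_headers(api_key: str) -> dict:
    # Bearer authentication, matching the curl and requests examples below
    return {"Authorization": f"Bearer {api_key}"}
```

For example, run `export SPEECHMATICS_API_KEY=...` in your shell, then pass `auth_headers(load_api_key())` as the request headers instead of hard-coding YOUR_API_KEY.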

2. Install dependencies

pip install requests

3. Make your first request

Replace YOUR_API_KEY with your actual API key from the portal:

curl -X POST "https://preview.tts.speechmatics.com/generate/sarah" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Welcome to the future of speech technology!"}' \
  --output output.wav

import requests

url = "https://preview.tts.speechmatics.com/generate/sarah"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "text": "Welcome to the future of speech technology!"
}

SAMPLE_RATE = 16000  # wav_16000 output: 16 kHz, 16-bit, mono

# A timeout stops the script from hanging if the connection stalls
response = requests.post(url, headers=headers, json=data, timeout=30)

if response.status_code == 200:
    wav_data = response.content

    with open("output.wav", "wb") as wav_file:
        wav_file.write(wav_data)

    # Calculate actual audio duration from the byte count
    audio_bytes = len(wav_data) - 44  # Assumes a standard 44-byte WAV header
    samples = audio_bytes // 2  # 16-bit = 2 bytes per sample
    duration = samples / SAMPLE_RATE

    print(f"Speech synthesis completed! Saved {len(wav_data)} bytes as output.wav")
    print(f"Audio duration: {duration:.2f} seconds ({samples:,} samples)")
else:
    print(f"Error: {response.status_code}")
    print(f"Response: {response.text}")

# pip install speechmatics-tts

import asyncio

from speechmatics.tts import AsyncClient, Voice, OutputFormat

# Generate speech data from text and save to WAV file
async def main():
    async with AsyncClient() as client:
        async with await client.generate(
            text="Welcome to the future of audio generation from text!",
            voice=Voice.SARAH,
            output_format=OutputFormat.WAV_16000
        ) as response:
            audio = b''.join([chunk async for chunk in response.content.iter_chunked(1024)])
            with open("output.wav", "wb") as wav:
                wav.write(audio)


# Run the async main function
if __name__ == "__main__":
    asyncio.run(main())

Voices

We currently offer four voices you can choose from:

Voice ID	Description
`sarah`	English Female (UK)
`theo`	English Male (UK)
`megan`	English Female (US)
`jack`	English Male (US)

To use a specific voice, include the voice ID in your API endpoint:

https://preview.tts.speechmatics.com/generate/<voice_id>
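Because the voice is selected by the URL path, a small helper can build the endpoint and catch a mistyped voice ID before any request is sent. This is a sketch: the `VOICES` set mirrors the table above and would need updating as voices are added, and the optional `output_format` argument maps to the query parameter described under Output formats.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://preview.tts.speechmatics.com/generate"
# Voice IDs from the table above (update this set as new voices are released)
VOICES = {"sarah", "theo", "megan", "jack"}


def generate_url(voice_id: str, output_format: Optional[str] = None) -> str:
    """Build the generate endpoint URL for a voice, optionally adding output_format."""
    if voice_id not in VOICES:
        raise ValueError(f"Unknown voice ID: {voice_id!r}")
    url = f"{BASE_URL}/{quote(voice_id)}"
    if output_format is not None:
        # e.g. ?output_format=pcm_16000
        url += "?" + urlencode({"output_format": output_format})
    return url
```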

We'll soon be adding more voices to our API. We'd love your feedback on what kind of voices you'd like us to add!

Output formats

The text to speech API supports two output formats that you can specify using the output_format query parameter:

`wav_16000` (default)

Format: Complete WAV file with headers
Sample Rate: 16 kHz
Bit Depth: 16-bit signed integer
Channels: Mono (single channel)
Usage: ?output_format=wav_16000 (or omit parameter for default)

`pcm_16000`

Format: Raw PCM (Pulse Code Modulation) data
Sample Rate: 16 kHz
Bit Depth: 16-bit signed integer
Channels: Mono (single channel)
Byte Order: Little-endian
Usage: ?output_format=pcm_16000
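Raw `pcm_16000` output has no header, so the receiver has to know the sample format. At 16,000 samples per second, 2 bytes per sample and one channel, one second of audio is 16,000 × 2 = 32,000 bytes. The sketch below uses Python's standard `wave` module to wrap PCM bytes in a WAV container and to compute duration. It assumes exactly the PCM parameters listed above.

```python
import io
import wave

SAMPLE_RATE = 16_000  # Hz, per the pcm_16000 format
SAMPLE_WIDTH = 2      # bytes: 16-bit signed integer
CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of raw 16 kHz, 16-bit mono PCM: 32,000 bytes per second."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap little-endian 16-bit mono PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        # WAV stores PCM samples little-endian, matching pcm_16000
        wav.writeframes(pcm)
    return buffer.getvalue()
```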

Audio is streamed as it is generated, so you can start consuming it as soon as the first chunks arrive. There's no need to wait for the whole response to complete!
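The sketch below shows one way to consume audio incrementally using only the Python standard library (`urllib.request`). It requests `pcm_16000` so each chunk can be passed straight to a player or file. The `play` callback is a hypothetical placeholder for your own audio sink. Because a network read can end in the middle of a 16-bit sample, the helper carries any odd trailing byte over to the next chunk.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator

URL = "https://preview.tts.speechmatics.com/generate/sarah?output_format=pcm_16000"


def aligned_chunks(chunks: Iterable[bytes], frame_size: int = 2) -> Iterator[bytes]:
    """Re-chunk a byte stream so every yielded chunk holds whole 16-bit samples."""
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        cut = len(data) - (len(data) % frame_size)
        leftover = data[cut:]
        if cut:
            yield data[:cut]
    # Any leftover byte here would be a truncated sample, so it is dropped


def stream_speech(text: str, api_key: str, play: Callable[[bytes], None],
                  chunk_size: int = 4096) -> None:
    """POST text and pass PCM audio to `play` as it arrives (play is hypothetical)."""
    request = urllib.request.Request(
        URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # iter(callable, sentinel) keeps reading until read() returns b"" (end of stream)
        raw = iter(lambda: response.read(chunk_size), b"")
        for pcm in aligned_chunks(raw):
            play(pcm)
```

A chunk size of 4,096 bytes corresponds to 0.128 seconds of 16 kHz, 16-bit mono audio, which keeps playback responsive without very many small writes.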

Best practices

Streaming audio: You can start playing the audio before the response completes
Use proper punctuation: Periods, commas, and question marks improve natural speech flow
Spell out abbreviations: Write "Doctor" instead of "Dr." for clearer pronunciation
Spell out numbers: Write "Twenty-five dollars" instead of "$25"
Batch requests: Combine multiple short texts into longer requests when possible
Cache audio: Store generated audio files to avoid repeated API calls for identical text
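Caching can be as simple as keying stored audio by a hash of everything that affects the output. The sketch below assumes that the voice ID, output format and text fully determine the result. It takes a `synthesize` function (hypothetical, for example a wrapper around one of the requests above) so the caching logic stays independent of how audio is fetched.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice: str, output_format: str = "wav_16000") -> str:
    """Stable key derived from every input that changes the generated audio."""
    payload = f"{voice}\n{output_format}\n{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_speech(text: str, voice: str, cache_dir: Path,
                  synthesize: Callable[[str, str, str], bytes],
                  output_format: str = "wav_16000") -> bytes:
    """Return audio from the cache, calling `synthesize` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, voice, output_format)}.bin"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice, output_format)
    path.write_bytes(audio)
    return audio
```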

Deployment

Our text to speech API can be consumed via our managed service or deployed in your own environment.

To find out more about deploying our API in your environment, speak to sales.

FAQ

Core functionality

What languages do you support?

We support English. We plan to launch additional languages in 2025. Need a specific language? Raise it as a request on our GitHub.

What voices are available?

We currently offer four English voices, listed under Voices. We are expanding the list of voices, with the goal of providing speech synthesis in multiple languages, accents, and dialects. More voice options will be added during the preview period.

Can I control voice speed, pitch, or emphasis?

Not yet. The API outputs natural speech with prosody driven by the text content. Fine-grained voice control features may be added in future releases.

Technical

How much latency should I expect?

The initial audio chunk typically returns in less than 200ms, with subsequent audio chunks returning faster than real time.
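To check latency in your own environment, you can time how long the first audio chunk takes to arrive. The helper below is a sketch that works on any iterable of byte chunks, such as the reads from a streaming response. Passing a `start` timestamp taken just before the request is sent includes connection and server time in the measurement.

```python
import time
from typing import Iterable, Optional, Tuple


def time_to_first_chunk(chunks: Iterable[bytes],
                        start: Optional[float] = None) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    `start` should be a time.perf_counter() value; if omitted, timing starts
    when this function is called.
    """
    if start is None:
        start = time.perf_counter()
    first = None
    total = 0
    for chunk in chunks:
        if chunk and first is None:
            first = time.perf_counter() - start
        total += len(chunk)
    return first, total
```

For `pcm_16000` output, dividing the total bytes by 32,000 gives the audio duration in seconds, which you can compare with the wall-clock time to confirm that generation runs faster than real time.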

Is there a streaming API for real-time generation?

The API supports streaming audio output (you can play audio as it arrives), but not full bidirectional streaming. We plan to add support for this in the future.

What concurrency is supported and are there rate limits?

We expect to support high concurrency when released. If you encounter rate limit errors, retry the request with exponential backoff.
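A minimal retry-with-backoff sketch is shown below. It assumes rate limiting surfaces as HTTP 429 (and treats 503 as retryable too); which status codes the API actually returns under load is an assumption. It reads the status from an exception's `code` attribute, as on `urllib.error.HTTPError`, so adapt that check for other HTTP clients.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes treated as retryable here (an assumption); 429 = Too Many Requests
RETRYABLE_STATUS = {429, 503}


def with_backoff(call: Callable[[], T], max_retries: int = 5, base_delay: float = 0.5,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying rate-limited failures with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "code", None)  # urllib's HTTPError exposes .code
            if status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # The upper bound doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random time below it ("full jitter") spreads out retries
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
    # Only reached if max_retries is negative
    raise ValueError("max_retries must be >= 0")
```

For example, `with_backoff(lambda: stream_request())`, where `stream_request` stands for whichever request function you use.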

How large can a text input be?

Our service handles text splitting server-side and can process long paragraphs of text. However, for the lowest latency, sending smaller amounts of text will provide the best performance.
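If you want to send smaller requests for lower latency, one approach is to split on sentence boundaries and group sentences up to a size limit. This is a client-side sketch: the 300-character default is an arbitrary illustration, not a documented limit, and the simple regex also splits after abbreviations such as "Dr." (another reason to spell them out, as suggested under Best practices).

```python
import re
from typing import List

# Split after ., ! or ? followed by whitespace (a deliberately simple heuristic)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_text(text: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into pieces of at most max_chars where possible."""
    pieces: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        # A single sentence longer than max_chars is kept whole rather than cut
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```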

Do you support webhooks or callbacks for long-running requests?

Webhook and callback support is not currently available. This feature may be added in future releases for handling long-running synthesis requests.

Are there official SDKs available?

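Apart from the Python `speechmatics-tts` package shown in the quickstart, SDKs for other programming languages are not available yet. The API can be accessed from any language using standard HTTP requests, as shown in the curl and requests quickstart examples. We will release more official SDKs later.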

Commercial

What is the price?

The service is free during the preview period. We will announce pricing in September 2025, at least two weeks before billing for usage starts on the 1st of October.

There will be a generous free tier.

Can I use this for commercial applications?

Text to speech is currently in preview and free to use. Features and pricing may change as the service evolves. For commercial use and enterprise pricing, speak to sales to discuss your specific needs.

What happens to my data after synthesis?

During the preview, we will be storing input text and generated audio to help improve our service. In production, Speechmatics will offer both non-retentive and data-retentive options. Customers who allow retention may be eligible for discounted pricing.

Retained data helps improve system quality over time.

Implementation and deployment

Can I deploy this in my own environment?

Yes! The text to speech API can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, speak to sales.

Support

Where can I provide feedback or get help?

You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.

Next steps

Ready to build something amazing with text to speech? We'd love to hear about your project and help you succeed.

Get in touch with us:

Share your feedback and feature requests
Ask questions about implementation
Discuss enterprise pricing and custom voices
Report any issues or bugs you encounter

Contact our team or join our developer community (https://www.reddit.com/r/Speechmatics) to connect with other builders using text to speech.