Quickstart
Learn how to convert text to speech using our API. Transform text into natural-sounding speech with our low-latency text to speech API.
Text to speech is currently in preview and free to use. Features and pricing may change as we continue to improve the service. We welcome your feedback to help shape the final product.
Use cases
Our text to speech API is perfect for:
- Voice AI: Voice assistants, chatbots, and IVR systems
- Translation: Real-time translation of live events or media
- Accessibility applications: Screen readers and assistive technologies
- Content creation: Podcasts, dubbing, audiobooks, and voice-overs
- Media production: News broadcasts and automated announcements
Getting started
1. Get your API key
Create an API key in the portal to access the text to speech API. Store your key securely as a managed secret.
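Instead of hard-coding the key in source files, you can have your secret manager expose it as an environment variable and read it at startup. The sketch below assumes a variable named SPEECHMATICS_API_KEY. That name and the helper functions are hypothetical, so use whatever your deployment defines.

```python
import os


def load_api_key(var_name: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is a hypothetical convention, not one the API requires.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key


def auth_headers(api_key: str) -> dict:
    """Build the Bearer authorization header used in the requests in this guide."""
    return {"Authorization": f"Bearer {api_key}"}
```

Keeping the key out of code also keeps it out of version control and logs.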
2. Install dependencies
```bash
pip install requests
```
3. Make your first request
Replace YOUR_API_KEY with your actual API key from the portal.
Using cURL:
```bash
curl -X POST "https://preview.tts.speechmatics.com/generate/sarah" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Welcome to the future of speech technology!"}' \
  --output output.wav
```
Using Python:
```python
import requests

url = "https://preview.tts.speechmatics.com/generate/sarah"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
}
data = {
    "text": "Welcome to the future of speech technology!",
}

SAMPLE_RATE = 16000   # default wav_16000 output: 16 kHz, 16-bit, mono
BYTES_PER_SAMPLE = 2  # 16-bit samples
WAV_HEADER_SIZE = 44  # assumes a standard 44-byte PCM WAV header

# Passing json= makes requests send Content-Type: application/json
response = requests.post(url, headers=headers, json=data, timeout=30)

if response.status_code == 200:
    wav_data = response.content
    with open("output.wav", "wb") as wav_file:
        wav_file.write(wav_data)

    # Estimate the audio duration from the payload size
    audio_bytes = len(wav_data) - WAV_HEADER_SIZE
    samples = audio_bytes // BYTES_PER_SAMPLE
    duration = samples / SAMPLE_RATE

    print(f"Speech synthesis completed! Saved {len(wav_data)} bytes as output.wav")
    print(f"Audio duration: {duration:.2f} seconds ({samples:,} samples)")
else:
    print(f"Error: {response.status_code}")
    print(f"Response: {response.text}")
```
Voices
We offer three voices you can choose from:
To use a specific voice, include the voice ID in your API endpoint:
https://preview.tts.speechmatics.com/generate/<voice_id>
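A small helper can keep voice selection (and the optional output_format query parameter described under Output formats) in one place. This is a sketch: only sarah appears as a voice ID in this guide, and the helper name tts_url is illustrative.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://preview.tts.speechmatics.com/generate"


def tts_url(voice_id: str, output_format: Optional[str] = None) -> str:
    """Build the endpoint URL for a voice, optionally selecting an output format."""
    # Percent-encode the voice ID so it is always a single, safe path segment
    url = f"{BASE_URL}/{quote(voice_id, safe='')}"
    if output_format:
        url += "?" + urlencode({"output_format": output_format})
    return url
```

For example, tts_url("sarah") gives the same URL used in the quickstart.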
We'll soon be adding more voices to our API. We'd love your feedback on what kind of voices you'd like us to add!
Output formats
The text to speech API supports two output formats, which you can select with the output_format query parameter:
wav_16000 (default)
- Format: Complete WAV file with headers
- Sample Rate: 16 kHz
- Bit Depth: 16-bit signed integer
- Channels: Mono (single channel)
- Usage: ?output_format=wav_16000 (or omit the parameter to use the default)
pcm_16000
- Format: Raw PCM (Pulse Code Modulation) data
- Sample Rate: 16 kHz
- Bit Depth: 16-bit signed integer
- Channels: Mono (single channel)
- Byte Order: Little-endian
- Usage: ?output_format=pcm_16000
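Because pcm_16000 output carries no header, most audio players cannot open it directly. A minimal sketch using Python's standard wave module wraps the raw samples in a WAV container with the parameters listed for pcm_16000. The function names are hypothetical helpers, and the code assumes the bytes are exactly the 16 kHz, 16-bit, mono, little-endian PCM described above.

```python
import io
import wave

SAMPLE_RATE = 16000  # pcm_16000: 16 kHz
SAMPLE_WIDTH = 2     # 16-bit signed integer samples
CHANNELS = 1         # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw pcm_16000 bytes in a WAV container and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # wave leaves a caller-supplied file object open, so the buffer is still readable
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Audio length: bytes / (samples per second * bytes per sample * channels)."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

At these settings one second of audio is 32,000 bytes of PCM, so the resulting WAV file is that plus a 44-byte header.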
You can start consuming the generated audio as soon as it's ready. There's no need to wait for the request to complete!
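The sketch below writes audio to disk chunk by chunk as the response arrives, instead of buffering the whole body first. It uses only the standard library (urllib.request) so it runs without extra packages; with requests, the equivalent is requests.post(..., stream=True) followed by response.iter_content(). The function names, default voice, and 30-second timeout are illustrative assumptions.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://preview.tts.speechmatics.com/generate"


def copy_stream(src, dst, chunk_size: int = 4096) -> int:
    """Copy bytes from src to dst as they arrive and return the total copied.

    read1() performs at most one read on the underlying stream, so it returns
    whatever data is available instead of waiting for a full chunk_size.
    """
    total = 0
    while True:
        chunk = src.read1(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def stream_speech(text, api_key, path, voice_id="sarah", output_format="wav_16000"):
    """POST text to the API and write the audio to path while it downloads."""
    url = f"{BASE_URL}/{voice_id}?{urlencode({'output_format': output_format})}"
    request = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        return copy_stream(response, out)
```

To play audio while it downloads, pass an audio sink that has a write() method in place of the file.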
Best practices
- Streaming audio: You can start playing the audio before the response completes
- Use proper punctuation: Periods, commas, and question marks improve natural speech flow
- Spell out abbreviations: Write "Doctor" instead of "Dr." for clearer pronunciation
- Spell out numbers: Write "Twenty-five dollars" instead of "$25"
- Batch requests: Combine multiple short texts into longer requests when possible
- Cache audio: Store generated audio files to avoid repeated API calls for identical text
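The caching advice can be sketched as a small on-disk cache keyed by a hash of everything that affects the audio: voice, output format, and text. Here synthesize is a hypothetical callable you supply (for example, a wrapper around the quickstart request), and the cache directory name is arbitrary.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice_id: str, output_format: str) -> str:
    """Derive a stable file name from the inputs that determine the audio."""
    material = "\n".join([voice_id, output_format, text]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def cached_synthesis(
    text: str,
    voice_id: str,
    synthesize: Callable[[str, str, str], bytes],
    cache_dir: str = "tts_cache",
    output_format: str = "wav_16000",
) -> bytes:
    """Return cached audio if present; otherwise call synthesize and store the result."""
    suffix = ".wav" if output_format.startswith("wav") else ".pcm"
    path = Path(cache_dir) / f"{cache_key(text, voice_id, output_format)}{suffix}"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice_id, output_format)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Hashing the inputs keeps file names short and safe regardless of what the text contains.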
Deployment
Our text to speech API can be consumed via our managed service or deployed in your own environment.
To find out more about deploying our API in your environment, speak to sales.
FAQ
Core functionality
What languages do you support?
We support English. We plan to launch additional languages in 2025. Need a specific language? Raise it as a request on our GitHub.
What voices are available?
We are expanding the list of available voices, with plans to cover more languages, accents, and dialects. More voice options will be added during the preview period.
Can I control voice speed, pitch, or emphasis?
Not yet. The API outputs natural speech with prosody driven by the text content. Fine-grained voice control features may be added in future releases.
Technical
How much latency should I expect?
The first audio chunk typically arrives in under 200 ms, and subsequent chunks arrive faster than real time.
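To check these figures from your own environment, you can time a streamed request. This sketch assumes pcm_16000 output (32,000 bytes per second of audio) and takes a hypothetical start_stream callable that sends the request and returns an iterable of byte chunks. The clock starts before the request is sent, so connection setup counts toward first-chunk latency.

```python
import time
from typing import Callable, Iterable

# Assumption: pcm_16000 output, i.e. 16,000 samples/s * 2 bytes per sample (mono)
BYTES_PER_SECOND = 16000 * 2


def measure_stream(start_stream: Callable[[], Iterable[bytes]]) -> dict:
    """Measure time to first chunk, total time, and audio delivered."""
    t0 = time.perf_counter()
    first_chunk_s = None
    total_bytes = 0
    for chunk in start_stream():
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - t0
        total_bytes += len(chunk)
    total_s = time.perf_counter() - t0
    audio_s = total_bytes / BYTES_PER_SECOND
    return {
        "first_chunk_s": first_chunk_s,
        "total_s": total_s,
        "audio_s": audio_s,
        # A value above 1.0 means audio arrived faster than real time
        "realtime_factor": audio_s / total_s if total_s > 0 else float("inf"),
    }
```

With requests, start_stream could be lambda: requests.post(url, headers=headers, json=data, params={"output_format": "pcm_16000"}, stream=True, timeout=30).iter_content(chunk_size=None), which yields data as it is received.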
Is there a streaming API for real-time generation?
The API supports streaming audio output (you can play audio as it arrives), but not full bidirectional streaming. We plan to add support for this in the future.
What concurrency is supported and are there rate limits?
We expect to support high concurrency when released. If you encounter rate limit errors, retry with exponential backoff.
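A minimal retry helper with exponential backoff and "full jitter" (a random wait between zero and the current backoff ceiling) might look like the sketch below. Which status codes the API returns for rate limiting is not documented here, so the set of retryable codes (429 and common transient 5xx errors) is an assumption, as are the delay values and the helper name.

```python
import random
import time

# Assumption: these statuses indicate rate limiting or a transient server error
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def with_backoff(send, max_retries=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    send must return an object with a status_code attribute, such as a
    requests.Response. Before retry number n (starting at 0), wait a random
    time between 0 and min(max_delay, base_delay * 2**n) seconds.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_retries:
            return response
        sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

With requests, you could call with_backoff(lambda: requests.post(url, headers=headers, json=data, timeout=30)). The random jitter spreads out retries from many clients so they do not all hit the service at the same moment.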
How large can a text input be?
Our service handles text splitting server-side and can process long paragraphs of text. However, for the lowest latency, sending smaller amounts of text will provide the best performance.
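If you want the lower latency of smaller requests for long input, one client-side approach is to split the text at sentence boundaries and group sentences into modest chunks, sending each chunk as its own request. This is a sketch: splitting after ".", "!" and "?" is a simplification that also breaks after abbreviations such as "Dr." (which the best practices suggest spelling out anyway), and max_chars is an arbitrary illustrative value, not a documented API limit.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Grouping short sentences together also follows the advice to combine short texts into longer requests, so each chunk still gives the model enough context for natural prosody.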
Do you support webhooks or callbacks for long-running requests?
Webhook and callback support is not available. This feature may be added in future releases for handling long-running synthesis requests.
Are there official SDKs available?
Official SDKs for different programming languages are not available yet. The API can be accessed using standard HTTP requests as shown in the quickstart example. We will release official SDKs later.
Commercial
What is the price?
The service is free during the preview period. We will announce pricing in September 2025, at least two weeks before billing for usage starts on 1 October.
There will be a generous free tier.
Can I use this for commercial applications?
Text to speech is currently in preview and free to use. Features and pricing may change as the service evolves. For commercial use and enterprise pricing, speak to sales to discuss your specific needs.
What happens to my data after synthesis?
During the preview, we will be storing input text and generated audio to help improve our service. In production, Speechmatics will offer both non-retentive and data-retentive options. Customers who allow retention may be eligible for discounted pricing.
Retained data helps improve system quality over time.
Implementation and deployment
Can I deploy this in my own environment?
Yes! The text to speech API can be consumed via our managed service or deployed in your own environment. To learn more about on-premises deployment options, speak to sales.
Support
Where can I provide feedback or get help?
You can submit feedback, bug reports, or feature requests through the Speechmatics GitHub discussions.
Next steps
Ready to build something amazing with text to speech? We'd love to hear about your project and help you succeed.
Get in touch with us:
- Share your feedback and feature requests
- Ask questions about implementation
- Discuss enterprise pricing and custom voices
- Report any issues or bugs you encounter
Contact our team or join our developer community (https://www.reddit.com/r/Speechmatics) to connect with other builders using text to speech.