Quickstart
Convert text into natural-sounding speech with our low-latency text to speech API.
Text to speech is currently in preview and free to use. Features and pricing may change as we continue to improve the service. We welcome your feedback to help shape the final product.
Use cases
Our text to speech API is perfect for:
- Voice AI: Voice assistants, chatbots, and IVR systems
- Translation: Real-time translation of live events or media
- Accessibility applications: Screen readers and assistive technologies
- Content creation: Podcasts, dubbing, audiobooks, and voice-overs
- Media production: News broadcasts and automated announcements
Getting started
1. Get your API key
Create an API key in the portal to access the Text-to-Speech API. Store your key securely as a managed secret.
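One common way to keep the key out of source code is to read it from an environment variable at runtime. The sketch below assumes a variable named SPEECHMATICS_API_KEY; that name and both helper functions are illustrative choices for this example, not something the service requires.

```python
import os

# Hypothetical variable name chosen for this example; use whatever your
# secret manager or deployment environment provides.
API_KEY_ENV_VAR = "SPEECHMATICS_API_KEY"


def load_api_key(env=os.environ):
    """Return the API key from the environment, or raise if it is missing."""
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first")
    return key


def auth_headers(api_key):
    """Build the Bearer Authorization header for a request."""
    return {"Authorization": f"Bearer {api_key}"}
```

With the variable set in your shell, `auth_headers(load_api_key())` can stand in for a hard-coded headers dictionary.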
2. Install dependencies
pip install requests numpy
3. Make your first request
Replace YOUR_API_KEY with your actual API key from the portal:
import requests
import wave
import numpy as np
url = "https://preview.tts.speechmatics.com/generate"
headers = {
    "Authorization": "Bearer YOUR_API_KEY"
}
data = {
    "text": "Welcome to the future of speech technology!"
}
response = requests.post(url, headers=headers, json=data)
if response.status_code == 200:
    # Get raw PCM data (32-bit float, little-endian)
    raw_audio = response.content
    # Convert raw bytes to a numpy array of little-endian 32-bit floats ('<f4')
    float_samples = np.frombuffer(raw_audio, dtype='<f4')
    # Clip to [-1.0, 1.0] so out-of-range samples cannot wrap around,
    # then scale to the 16-bit integer range for WAV compatibility
    int16_samples = (np.clip(float_samples, -1.0, 1.0) * 32767).astype(np.int16)
    # WAV file parameters
    sample_rate = 16000  # 16 kHz
    channels = 1  # Mono
    sample_width = 2  # 16-bit = 2 bytes (for compatibility)
    # Create WAV file
    with wave.open("output.wav", "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(int16_samples.tobytes())
    duration = len(float_samples) / sample_rate
    print(f"Speech synthesis completed! Received {len(raw_audio)} bytes, saved as output.wav")
    print(f"Converted {len(float_samples)} float32 samples to 16-bit WAV")
    print(f"Duration: {duration:.2f} seconds")
else:
    print(f"Error: {response.status_code}")
    print(f"Response: {response.text}")
Output format
The Text-to-Speech API returns raw PCM audio data with the following specifications:
- Format: Raw PCM (Pulse Code Modulation)
- Sample Rate: 16 kHz
- Bit Depth: 32-bit float
- Channels: Mono (single channel)
- Byte Order: Little-endian
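Taken together, these figures imply a data rate of 16,000 samples × 4 bytes × 1 channel = 64,000 bytes per second of audio. A minimal sketch of that arithmetic follows; the constant and function names are illustrative.

```python
SAMPLE_RATE_HZ = 16_000  # 16 kHz
BYTES_PER_SAMPLE = 4     # 32-bit float
CHANNELS = 1             # mono


def pcm_duration_seconds(num_bytes):
    """Duration of a raw PCM payload in the format listed above."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return num_bytes / bytes_per_second
```

This can be handy for estimating playback length from a response's byte count without decoding the samples.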
You can start consuming the generated audio as soon as it's ready. There's no need to wait for the request to complete!
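When audio is consumed incrementally, chunks of bytes may not end on a 4-byte sample boundary. The sketch below shows one way to handle that, assuming the audio arrives as an iterable of byte chunks; the function name is hypothetical and the standard library's struct module is used in place of numpy to keep the example dependency-free.

```python
import struct

BYTES_PER_SAMPLE = 4  # 32-bit float


def iter_float_samples(chunks):
    """Yield tuples of float samples from an iterable of raw byte chunks.

    Chunk boundaries need not align with sample boundaries, so leftover
    bytes are carried over and prepended to the next chunk.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        usable = len(pending) - (len(pending) % BYTES_PER_SAMPLE)
        if usable:
            count = usable // BYTES_PER_SAMPLE
            # "<" = little-endian, "f" = 32-bit float
            yield struct.unpack(f"<{count}f", pending[:usable])
            pending = pending[usable:]
```

With the requests library, the chunks could come from calling requests.post(..., stream=True) and then iterating over response.iter_content(chunk_size=4096). How promptly the preview endpoint delivers partial audio is worth verifying in your own environment.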
Best practices
- Stream audio: Start playback as soon as audio arrives instead of waiting for the response to complete
- Use proper punctuation: Periods, commas, and question marks improve natural speech flow
- Spell out abbreviations: Write "Doctor" instead of "Dr." for clearer pronunciation
- Spell out numbers: Write "Twenty-five dollars" instead of "$25"
- Batch requests: Combine multiple short texts into longer requests when possible
- Cache audio: Store generated audio files to avoid repeated API calls for identical text
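The caching suggestion can be sketched as a small on-disk cache keyed by a hash of the input text. Everything here is an assumption made for illustration: the function name, the cache directory, the .pcm extension, and the synthesize callable, which stands for any function that takes text and returns raw audio bytes.

```python
import hashlib
from pathlib import Path


def cached_synthesize(text, synthesize, cache_dir="tts_cache"):
    """Return audio bytes for `text`, calling `synthesize` only on a cache miss.

    `synthesize` is a hypothetical callable that takes the text and returns
    raw audio bytes, for example a wrapper around an API request.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Identical text always hashes to the same file name.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = cache / f"{key}.pcm"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text)
    path.write_bytes(audio)
    return audio
```

If requests later carry options that change the audio, those options would need to be included in the hashed key so that different outputs do not collide.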
Deployment
Our text to speech API can be consumed via our managed service or deployed in your own environment.
To find out more about deploying our API in your environment, speak to sales.
Next steps
Ready to build something amazing with text to speech? We'd love to hear about your project and help you succeed.
Get in touch with us:
- Share your feedback and feature requests
- Ask questions about implementation
- Discuss enterprise pricing and custom voices
- Report any issues or bugs you encounter
Contact our team or join our developer community (https://www.reddit.com/r/Speechmatics) to connect with other builders using text to speech.