OpenAI TTS

OpenAI provides high-quality neural text-to-speech through its GPT-4o mini TTS model (gpt-4o-mini-tts) and other TTS models such as tts-1 and tts-1-hd. It offers natural-sounding voices, supports multiple languages, and lets you control voice characteristics through instructions.

Authentication

OpenAI requires an API key:

from tts_wrapper import OpenAIClient

# Create the client with your API key
tts = OpenAIClient(api_key='your_api_key')

# Verify credentials are valid
if tts.check_credentials():
    print("OpenAI credentials are valid")
else:
    print("Invalid OpenAI credentials")

Tip: Use environment variables for secure credential management:

import os

tts = OpenAIClient(api_key=os.getenv('OPENAI_API_KEY'))
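Because `os.getenv` returns `None` when the variable is unset, a slightly more defensive variant can fail fast with a clear message. This is a minimal sketch; `load_api_key` is a hypothetical helper written for illustration, not part of tts_wrapper.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

The result can then be passed to the client as `OpenAIClient(api_key=load_api_key())`.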

Features

Voice Selection

List and select from available voices:

# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"ID: {voice['id']}")
    print(f"Gender: {voice['gender']}")
    print("---")

# Set a specific voice
tts.set_voice("alloy")  # Other options: nova, echo, fable, onyx, shimmer, etc.
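To avoid passing a voice that is not in the list, the output of `get_voices()` can be searched first. The sketch below assumes each entry is a dict with the `name` and `id` keys shown above; `find_voice` and the sample data are hypothetical and only illustrate the lookup.

```python
from typing import Optional


def find_voice(voices: list, name: str) -> Optional[dict]:
    """Return the first voice whose name matches (case-insensitive), else None."""
    wanted = name.strip().lower()
    for voice in voices:
        if str(voice.get("name", "")).lower() == wanted:
            return voice
    return None


# Hypothetical sample data shaped like the get_voices() output shown above.
sample_voices = [
    {"id": "alloy", "name": "Alloy", "gender": "Unknown"},
    {"id": "nova", "name": "Nova", "gender": "Unknown"},
]

match = find_voice(sample_voices, "nova")
if match is not None:
    # With a real client this is where tts.set_voice(match["id"]) would go.
    print(f"Selected voice id: {match['id']}")
```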

Model Selection

Choose from different TTS models:

# Create client with specific model
tts = OpenAIClient(
    api_key='your_api_key',
    model='gpt-4o-mini-tts'  # Default model
)

# Other models include:
# - tts-1 (lower latency)
# - tts-1-hd (higher quality)
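If the model is chosen at runtime, the trade-off described above can be captured in a small lookup. This is only a sketch: `choose_model` and its priority names are hypothetical, and the mapping simply restates the three models listed in this section.

```python
def choose_model(priority: str = "default") -> str:
    """Map a priority ('default', 'latency', 'quality') to a model name."""
    models = {
        "default": "gpt-4o-mini-tts",  # default model
        "latency": "tts-1",            # lower latency
        "quality": "tts-1-hd",         # higher quality
    }
    if priority not in models:
        raise ValueError(f"Unknown priority: {priority!r}")
    return models[priority]
```

The returned name can be passed as the `model` argument when creating the client.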

Streaming

The client supports real-time audio streaming:

# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")
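For long passages, one option is to stream the text sentence by sentence so that each request stays short. The sketch below assumes that `speak_streamed` returns only after its text has finished playing, which is not confirmed here; `split_sentences` and `stream_by_sentence` are hypothetical helpers, and the regex split is a simple heuristic rather than a full sentence tokenizer.

```python
import re


def split_sentences(text: str) -> list:
    """Split on whitespace that follows '.', '!' or '?'; drop empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def stream_by_sentence(tts, text: str) -> int:
    """Call tts.speak_streamed once per sentence; return the number of calls."""
    sentences = split_sentences(text)
    for sentence in sentences:
        tts.speak_streamed(sentence)
    return len(sentences)
```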

Voice Properties via Instructions

Control voice characteristics through properties:

# Set rate (speed)
tts.set_property("rate", 0.7)  # Slow
tts.set_property("rate", 1.3)  # Fast

# Set volume
tts.set_property("volume", 0.6)  # Quiet
tts.set_property("volume", 1.4)  # Loud

# Set pitch
tts.set_property("pitch", 0.7)  # Low pitch
tts.set_property("pitch", 1.3)  # High pitch

# You can also use string values
tts.set_property("rate", "moderate")
tts.set_property("volume", "medium")
tts.set_property("pitch", "normal")
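When several properties are set together, a small wrapper can reject misspelled names before any call is made. This sketch assumes only the three property names used above (rate, volume, pitch); `apply_properties` is a hypothetical helper, not a tts_wrapper function.

```python
ALLOWED_PROPERTIES = ("rate", "volume", "pitch")


def apply_properties(tts, **props) -> None:
    """Validate property names, then forward each one to tts.set_property."""
    unknown = sorted(set(props) - set(ALLOWED_PROPERTIES))
    if unknown:
        # Fail before touching the client so settings are all-or-nothing.
        raise ValueError(f"Unknown properties: {', '.join(unknown)}")
    for name, value in props.items():
        tts.set_property(name, value)
```

For example, `apply_properties(tts, rate=0.9, pitch="normal")` would make two `set_property` calls.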

Custom Instructions

Provide custom instructions for more control:

# Create client with specific instructions
tts = OpenAIClient(
    api_key='your_api_key',
    instructions="Speak in a cheerful and positive tone with slight British accent."
)

File Output

Save synthesized speech to a file:

# Save as MP3
tts.synth("Hello world", "output.mp3", "mp3")

# Save as WAV
tts.synth("Hello world", "output.wav", "wav")

Best Practices

Cost Management
- Monitor API usage
- Cache frequently used phrases
- Use appropriate models for your needs (tts-1 for lower latency, tts-1-hd for higher quality)

Performance
- Use WAV or PCM format when streaming for the lowest latency
- For file output, MP3 provides good compression
- Balance quality vs. performance with model selection
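
The advice to cache frequently used phrases can be sketched as a file cache keyed by a hash of the text and settings. This assumes the `synth(text, path, format)` call shown under File Output; `cached_synth` is a hypothetical helper, and its `voice` argument is only a label for the cache key that the caller must keep in sync with the voice set on the client.

```python
import hashlib
from pathlib import Path


def cached_synth(tts, text: str, cache_dir: str,
                 voice: str = "alloy", fmt: str = "mp3") -> Path:
    """Synthesize text to a file once; reuse the existing file on later calls."""
    key = hashlib.sha256(f"{voice}|{fmt}|{text}".encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.{fmt}"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Only this branch reaches the API, so repeated phrases cost nothing.
        tts.synth(text, str(path), fmt)
    return path
```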

Error Handling

import sys

# Validate credentials before attempting synthesis
if not tts.check_credentials():
    print("Invalid OpenAI API key. Please check your credentials.")
    sys.exit(1)

# Handle synthesis errors
try:
    tts.speak("Hello, world!")
except Exception as e:
    if "Unauthorized" in str(e):
        print("Check your OpenAI API key")
    elif "Rate limit" in str(e):
        print("API rate limit exceeded")
    else:
        print(f"Error: {e}")
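
Rate-limit errors are often transient, so one option is to retry with exponential backoff rather than only printing a message. The sketch below reuses the "Rate limit" substring check from the example above, which is itself an assumption about the error text; `speak_with_retry` is a hypothetical helper.

```python
import time


def speak_with_retry(tts, text, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call tts.speak, retrying with exponential backoff on rate-limit errors.

    Returns the number of retries that were needed.
    """
    for attempt in range(retries + 1):
        try:
            tts.speak(text)
            return attempt
        except Exception as exc:
            # Re-raise anything that is not a rate limit, or the final failure.
            if "Rate limit" not in str(exc) or attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.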

Limitations

- No SSML support (tags will be stripped)
- No word timing information (estimated timings only)
- No word event callbacks
- API rate limits apply
- Usage costs based on character count
- No custom voice creation
- Voices optimized for English, though multiple languages are supported

Language Support

OpenAI TTS generally follows the Whisper model in terms of language support, which includes:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

Voice Customization

While OpenAI doesn't support custom voice creation, you can customize the voice output using instructions:

# Create client with specific instructions
tts = OpenAIClient(
    api_key='your_api_key',
    instructions="Speak with a deep, resonant voice. Use a slow, deliberate pace with slight pauses between sentences."
)

You can also combine properties with instructions:

tts = OpenAIClient(
    api_key='your_api_key',
    instructions="Speak with a slight French accent."
)

# Add properties on top of instructions
tts.set_property("rate", 0.9)  # Slightly slow
tts.set_property("pitch", 1.1)  # Slightly high pitch

Output Formats

OpenAI supports multiple output formats:

- MP3 (default): Good for general use
- WAV: Uncompressed, good for processing
- PCM: Raw samples, lowest latency
- OPUS: Good for streaming
- AAC: Good for mobile
- FLAC: Lossless compression

# Save in different formats
tts.synth("Hello world", "output.mp3", "mp3")
tts.synth("Hello world", "output.wav", "wav")
tts.synth("Hello world", "output.flac", "flac")
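
Since the format argument usually mirrors the file extension, it can be derived from the path to keep the two from drifting apart. This sketch assumes the wrapper's `synth` accepts every format listed above, which is only shown here for mp3, wav and flac; `format_from_path` and `synth_to_path` are hypothetical helpers.

```python
from pathlib import Path

# Formats listed in this section, as lowercase file extensions.
SUPPORTED_FORMATS = {"mp3", "wav", "pcm", "opus", "aac", "flac"}


def format_from_path(path: str) -> str:
    """Return the lowercase extension of path if it is a listed format."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output format: {ext!r}")
    return ext


def synth_to_path(tts, text: str, path: str) -> str:
    """Synthesize text to path, inferring the format from the extension."""
    fmt = format_from_path(path)
    tts.synth(text, path, fmt)
    return fmt
```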

Additional Resources

- OpenAI TTS Documentation
- OpenAI API Reference
- OpenAI Pricing
- OpenAI.fm Demo: Interactive demo for trying voices

Next Steps

- Explore streaming capabilities
- Check out audio control features
- Learn about voice selection and languages
