IBM Watson TTS

IBM Watson Text-to-Speech provides high-quality voice synthesis with support for neural voices, SSML, and real-time streaming capabilities.

Authentication

IBM Watson TTS requires an API key, service URL, region, and instance ID, passed as a tuple in that order:

from tts_wrapper import WatsonClient, WatsonTTS

client = WatsonClient(credentials=(
    'api_key',      # Your IBM Watson API key
    'api_url',      # Service URL (e.g., 'https://api.eu-gb.text-to-speech.watson.cloud.ibm.com/')
    'region',       # Region (e.g., 'eu-gb')
    'instance_id'   # Instance ID
))

tts = WatsonTTS(client)

Tip: Use environment variables for secure credential management:

import os

client = WatsonClient(credentials=(
    os.getenv('WATSON_API_KEY'),
    os.getenv('WATSON_API_URL'),
    os.getenv('WATSON_REGION'),
    os.getenv('WATSON_INSTANCE_ID')
))
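
If one of these variables is unset, `os.getenv` returns `None` and the problem only shows up later, at request time. A minimal sketch of an up-front check, assuming the four variable names used above; `load_watson_credentials` is a hypothetical helper and not part of tts_wrapper:

```python
import os
from typing import Mapping, Optional, Tuple

# Names follow the example above; the order matches the credentials tuple.
REQUIRED_VARS = (
    "WATSON_API_KEY",
    "WATSON_API_URL",
    "WATSON_REGION",
    "WATSON_INSTANCE_ID",
)


def load_watson_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, ...]:
    """Return the credential values in order, or raise if any is missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(source[name] for name in REQUIRED_VARS)
```

The result can then be passed straight through, for example `WatsonClient(credentials=load_watson_credentials())`.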

Features

SSML Support

IBM Watson accepts SSML input, including tags such as <break> and <prosody>:

ssml_text = """
<speak>
    Hello <break time="300ms"/> World!
    <prosody rate="slow" pitch="+20Hz">
        This is a test of SSML features.
    </prosody>
</speak>
"""
tts.speak(ssml_text)

See the IBM Watson SSML reference for all supported tags.
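
When SSML is assembled from user-supplied text, characters such as `&` and `<` must be escaped or the markup becomes invalid. A minimal sketch using only the standard library; `build_ssml` is a hypothetical helper and not part of tts_wrapper, and it covers only the `<break>` and `<prosody>` tags shown above:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pause_ms: int = 0) -> str:
    """Wrap plain text in <speak>/<prosody>, escaping XML special characters."""
    body = escape(text)  # turns &, <, > into &amp;, &lt;, &gt;
    if pause_ms > 0:
        # Optional leading pause before the spoken text.
        body = f'<break time="{int(pause_ms)}ms"/>{body}'
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"
```

The returned string can be passed to `tts.speak(...)` in the same way as the hand-written SSML above.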

Streaming

IBM Watson TTS supports real-time audio streaming:

# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")

Word Timing

Register a callback to be notified as each word starts, which lets you track timing word by word:

def on_word(word: str):
    print(f"Speaking: {word}")

tts.connect("started-word", on_word)
tts.speak("This text will trigger word timing callbacks")

Voice Selection

List and select from available voices:

# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print(f"Gender: {voice['gender']}")
    print("---")

# Set a specific voice
tts.set_voice("en-US_AllisonV3Voice", "en-US")

Voice Transformation

Adjust speaking rate, pitch, and volume with set_property:

# Adjust speaking rate
tts.set_property("rate", "slow")  # Options: x-slow, slow, medium, fast, x-fast

# Adjust pitch
tts.set_property("pitch", "high")  # Options: x-low, low, medium, high, x-high

# Adjust volume
tts.set_property("volume", "loud")  # Options: soft, medium, loud
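
A typo in a property value is easy to miss. The sketch below validates a value against the options listed in the comments above before it is passed on; `check_property` and `ALLOWED_VALUES` are hypothetical names, and the value sets are copied from this page rather than queried from the service:

```python
# Allowed values as listed in the comments above (an assumption, not an API query).
ALLOWED_VALUES = {
    "rate": ("x-slow", "slow", "medium", "fast", "x-fast"),
    "pitch": ("x-low", "low", "medium", "high", "x-high"),
    "volume": ("soft", "medium", "loud"),
}


def check_property(name: str, value: str) -> str:
    """Return value unchanged if it is valid for the property, else raise ValueError."""
    if name not in ALLOWED_VALUES:
        raise ValueError(f"Unknown property: {name!r}")
    if value not in ALLOWED_VALUES[name]:
        options = ", ".join(ALLOWED_VALUES[name])
        raise ValueError(f"Invalid {name} value {value!r}; expected one of: {options}")
    return value
```

For example, `tts.set_property("rate", check_property("rate", "slow"))` fails early with a readable message if the value is misspelled.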

Best Practices

Cost Management

- Use appropriate audio formats
- Cache frequently used phrases
- Monitor API usage
- Consider using WebSocket connections for streaming

Performance

- Reuse client instances
- Choose appropriate audio formats
- Use streaming for long text
- Select the closest region for lower latency
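
The advice to cache frequently used phrases can be sketched as a small on-disk cache keyed by voice and text. This is an illustration under assumptions: `cached_audio` is a hypothetical helper, and `synthesize` stands for any callable that returns audio bytes for a given text; how you obtain bytes from the wrapper is left open here.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_audio(
    text: str,
    voice: str,
    synthesize: Callable[[str], bytes],
    cache_dir: Path,
) -> bytes:
    """Return audio for (voice, text), calling synthesize only on a cache miss."""
    # Hash voice and text together so the same phrase in another voice is stored separately.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.bin"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Repeated requests for the same phrase and voice then read from disk instead of calling the API again.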

Error Handling

try:
    tts.speak("Hello, world!")
except Exception as e:
    if "Unauthorized" in str(e):
        print("Check your IBM Watson credentials")
    elif "QuotaExceeded" in str(e):
        print("Usage quota exceeded")
    else:
        print(f"Error: {e}")

Limitations

- Maximum text length per request (5000 characters)
- API rate limits apply
- Some SSML features are voice-specific
- Neural voices not available in all regions
- Custom voice creation requires enterprise plan
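
One way to work within the per-request length limit is to split long plain text into chunks and synthesize them one after another. A minimal sketch, assuming the 5000-character figure stated above; `split_text` is a hypothetical helper, and it is meant for plain text only, since cutting SSML this way could separate opening and closing tags:

```python
import re
from typing import List

MAX_CHARS = 5000  # per-request limit as stated above


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `tts.speak(...)` in turn.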

Voice Types

Neural Voices

Higher quality, more natural-sounding voices:

# Use a neural voice
tts.set_voice("en-US_EmmaV3Voice")

Standard Voices

Traditional voices with consistent quality:

# Use a standard voice
tts.set_voice("en-US_MichaelVoice")

Language Support

IBM Watson TTS supports multiple languages and dialects:

# List voices for a specific language
voices = tts.get_voices()
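# Match on the language prefix across all of a voice's language codes
spanish_voices = [v for v in voices if any(code.lower().startswith("es") for code in v["language_codes"])]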
for voice in spanish_voices:
    print(f"Spanish voice: {voice['name']}")
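
To get an overview of every language rather than filter for one, the voice list can be grouped by language prefix. A minimal sketch, assuming each entry has the `name` and `language_codes` keys used above; `voices_by_language` is a hypothetical helper and not part of tts_wrapper:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def voices_by_language(voices: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each primary language subtag (e.g. 'es') to a sorted list of voice names."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for voice in voices:
        for code in voice["language_codes"]:
            primary = code.split("-")[0].lower()  # 'es-ES' -> 'es'
            if voice["name"] not in grouped[primary]:
                grouped[primary].append(voice["name"])
    return {lang: sorted(names) for lang, names in grouped.items()}
```

Calling `voices_by_language(tts.get_voices())` then gives one entry per language, which is convenient for building a voice picker.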
