Microsoft Azure TTS

Microsoft Azure Cognitive Services Text-to-Speech provides high-quality voice synthesis with support for neural voices, custom voice creation, and extensive SSML features.

Authentication

Azure TTS requires a subscription key and region:

from tts_wrapper import MicrosoftTTS

tts = MicrosoftTTS(credentials=(
    'subscription_key',  # Your Azure subscription key
    'region'            # e.g., 'eastus', 'westeurope'
))

Tip: Use environment variables for secure credential management:

import os

tts = MicrosoftTTS(credentials=(
    os.getenv('MICROSOFT_TOKEN'),
    os.getenv('MICROSOFT_REGION')
))
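If either variable is unset, `os.getenv` returns `None`, and the problem may only surface later as an authentication failure. The following is a minimal sketch of a loader that fails early instead. `load_azure_credentials` is a hypothetical helper, not part of tts_wrapper, and it assumes the same variable names used above.

```python
import os
from typing import Mapping, Optional, Tuple


def load_azure_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (subscription_key, region) or raise if either is missing.

    Hypothetical helper, not part of tts_wrapper.
    """
    # Default to the process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("MICROSOFT_TOKEN", "").strip()
    region = env.get("MICROSOFT_REGION", "").strip()

    # Collect every missing name so the error reports all of them at once.
    missing = [
        name
        for name, value in (("MICROSOFT_TOKEN", key), ("MICROSOFT_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region
```

The returned tuple has the same (key, region) shape as the `credentials` argument shown above.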

Note: MicrosoftTTS is an alias for MicrosoftClient that provides the same functionality with a more intuitive name. Both can be used interchangeably, but MicrosoftTTS is recommended for new code.

Features

SSML Support

Azure TTS supports standard SSML tags such as break and prosody, plus Microsoft-specific extensions in the mstts namespace such as express-as:

ssml_text = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts">
    Hello <break time="300ms"/> World!
    <prosody rate="-20%" pitch="+20%">
        This is a test of SSML features.
    </prosody>
    <mstts:express-as style="cheerful">
        This text will be spoken in a cheerful way!
    </mstts:express-as>
</speak>
"""
tts.speak(ssml_text)

See Microsoft TTS SSML Reference for all supported tags.
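When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping or the document will not parse. The sketch below shows one way to do that with the standard library. `build_express_as_ssml` is a hypothetical helper, not part of tts_wrapper, and it reuses the namespace declarations from the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_express_as_ssml(text: str, style: str = "cheerful") -> str:
    """Wrap plain text in a styled SSML document, escaping XML characters.

    Hypothetical helper, not part of tts_wrapper.
    """
    ssml = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"\n'
        '       xmlns:mstts="http://www.w3.org/2001/mstts">\n'
        # quoteattr adds the surrounding quotes and escapes the attribute value.
        f"    <mstts:express-as style={quoteattr(style)}>\n"
        # escape converts &, < and > in the spoken text to XML entities.
        f"        {escape(text)}\n"
        "    </mstts:express-as>\n"
        "</speak>"
    )
    # Parse once so malformed output raises here rather than at synthesis time.
    ET.fromstring(ssml)
    return ssml
```

The resulting string can be passed to `tts.speak` in the same way as the hand-written SSML above.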

Streaming

Azure TTS supports real-time audio streaming:

# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")

Word Timing

Register a callback that is invoked as each word is spoken:

def on_word(word: str):
    print(f"Speaking: {word}")

tts.connect("started-word", on_word)
tts.speak("This text will trigger word timing callbacks")

Voice Selection

List and select from available voices:

# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Locale: {voice['language_codes'][0]}")
    print(f"Gender: {voice['gender']}")
    print("---")

# Set a specific voice
tts.set_voice("en-US-JennyNeural", "en-US")
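The voice list can be long, so it often helps to narrow it before choosing. This is a minimal sketch of filtering by locale and gender, assuming each entry is a dict with the `name`, `language_codes`, and `gender` keys used in the loop above. `filter_voices` and the sample data are illustrative only.

```python
from typing import Any, Dict, List, Optional


def filter_voices(
    voices: List[Dict[str, Any]],
    locale: str,
    gender: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return voices whose language codes include `locale` (case-insensitive)."""
    wanted = locale.lower()
    matches = []
    for voice in voices:
        codes = [code.lower() for code in voice.get("language_codes", [])]
        if wanted not in codes:
            continue
        # Gender is optional; skip the check when no gender was requested.
        if gender is not None and str(voice.get("gender", "")).lower() != gender.lower():
            continue
        matches.append(voice)
    return matches


# Sample data in the assumed shape; real entries would come from tts.get_voices().
sample_voices = [
    {"name": "en-US-JennyNeural", "language_codes": ["en-US"], "gender": "Female"},
    {"name": "en-US-GuyNeural", "language_codes": ["en-US"], "gender": "Male"},
    {"name": "de-DE-KatjaNeural", "language_codes": ["de-DE"], "gender": "Female"},
]
print([v["name"] for v in filter_voices(sample_voices, "en-us", gender="female")])
```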

Voice Styles and Roles

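Supported neural voices can be given a speaking style with mstts:express-as; the styledegree attribute adjusts the intensity of the style (0.01 to 2, default 1):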

ssml_text = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts">
    <mstts:express-as style="excited" styledegree="2">
        This is very exciting news!
    </mstts:express-as>
</speak>
"""
tts.speak(ssml_text)

Best Practices

Cost Management
- Use streaming for long text
- Monitor usage through the Azure Portal
- Cache frequently used phrases

Performance
- Reuse client instances
- Choose a region close to your users to reduce latency
- Pick a voice that suits your needs
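
The caching advice above could be sketched as a small disk cache keyed by voice and text. `PhraseCache` is a hypothetical helper, not part of tts_wrapper. The `synthesize` argument is assumed to be any callable that returns audio bytes for a text string, so substitute whatever call your setup uses to obtain audio data.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class PhraseCache:
    """Disk cache for synthesized audio, keyed by voice and text.

    Hypothetical helper, not part of tts_wrapper.
    """

    def __init__(self, directory: str, synthesize: Callable[[str], bytes]) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize

    def _path_for(self, text: str, voice: str) -> Path:
        # Hash voice and text together so the same phrase in two voices
        # is stored under two different files.
        digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.bin"

    def get_audio(self, text: str, voice: str = "en-US-JennyNeural") -> bytes:
        path = self._path_for(text, voice)
        if path.exists():
            return path.read_bytes()  # Cache hit: no call to the service.
        audio = self.synthesize(text)
        path.write_bytes(audio)
        return audio


# Demonstration with a stand-in synthesizer that records each call.
calls = []


def fake_synthesize(text: str) -> bytes:
    calls.append(text)
    return text.encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = PhraseCache(tmp, fake_synthesize)
    cache.get_audio("Welcome back")
    cache.get_audio("Welcome back")  # Served from disk on the second request.
print(len(calls))  # 1
```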

Error Handling

try:
    tts.speak("Hello, world!")
except Exception as e:
    if "AuthenticationFailed" in str(e):
        print("Check your Azure credentials")
    elif "QuotaExceeded" in str(e):
        print("Usage quota exceeded")
    else:
        print(f"Error: {e}")

Limitations

- Maximum text length varies by endpoint
- API rate limits apply (check your Azure quotas)
- Some voice styles are only available with specific voices
- Neural voices may have higher latency
- Custom voice creation requires additional setup
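
One way to work within a text length limit is to split long input at sentence boundaries and synthesize each chunk in turn. The sketch below is an assumption-laden example: `split_text` is a hypothetical helper, and the default of 1000 characters is a placeholder rather than an Azure limit, so set `max_chars` from the limit that applies to your endpoint.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to `tts.speak` separately.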
