SAPI TTS

SAPI (Microsoft Speech API) provides native text-to-speech capabilities on Windows systems. It offers access to installed system voices with basic SSML support.

Platform Support

SAPI is only available on Windows systems:

from tts_wrapper import SAPIClient, SAPITTS

# Initialize client and TTS
client = SAPIClient()  # No credentials needed
tts = SAPITTS(client)

Features

Voice Selection

List and select from installed Windows voices:

# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print(f"Gender: {voice['gender']}")
    print("---")

# Set a specific voice
tts.set_voice("Microsoft David Desktop")

SSML Support

SAPI provides basic SSML support through Windows Speech API:

ssml_text = """
<speak>
    Hello <break time="500ms"/> World!
    <prosody rate="slow" pitch="high">
        This is a test of SSML features.
    </prosody>
</speak>
"""
tts.speak(ssml_text)

Voice Properties

Adjust synthesis properties:

# Set speech rate (-10 to 10)
tts.set_property("rate", "0")  # Default is 0

# Set volume (0-100)
tts.set_property("volume", "100")

# Set pitch (-10 to 10)
tts.set_property("pitch", "0")

Word Timing

Get timing information for each word:

def word_callback(word: str, start_time: float, end_time: float):
    duration = end_time - start_time
    print(f"Word: {word}")
    print(f"Start Time: {start_time:.2f}s")
    print(f"Duration: {duration:.2f}s")

# Connect the callback
tts.connect("started-word", word_callback)

# Speak with word timing
tts.speak("This will trigger word timing callbacks")

File Output

Save synthesized speech to file:

# Save as WAV
tts.synth_to_file("Hello world", "output.wav")

Best Practices

Performance
- Reuse client instances
- Use appropriate speech rate for your needs
- Consider caching frequently used phrases

Error Handling

try:
    tts.speak("Hello, world!")
except Exception as e:
    if "SAPI not available" in str(e):
        print("SAPI is not available on this system")
    else:
        print(f"Error: {e}")

Limitations

Windows only
Basic SSML support
Voice selection limited to installed Windows voices
No custom voice support
Performance may vary by system
Some features require specific Windows versions

Audio Settings

Sample Rate

SAPI uses system-defined sample rates:

# Check the current audio rate
print(f"Audio rate: {tts.audio_rate}")

Audio Format

Channels: Mono (1 channel)
Sample Width: 16-bit
Format: PCM

print(f"Channels: {tts.channels}")        # 1
print(f"Sample width: {tts.sample_width}") # 2 (16-bit)

Voice Types

System Voices

Windows includes several built-in voices:

# Common system voices
tts.set_voice("Microsoft David Desktop")  # Male voice
tts.set_voice("Microsoft Zira Desktop")   # Female voice

Third-Party Voices

SAPI can use additional installed voices:

# List all available voices including third-party
voices = tts.get_voices()
for voice in voices:
    print(f"Available voice: {voice['name']}")

Language Support

SAPI supports multiple languages through installed Windows language packs:

# List voices for a specific language
voices = tts.get_voices()
spanish_voices = [v for v in voices if "es" in v["language_codes"][0]]
for voice in spanish_voices:
    print(f"Spanish voice: {voice['name']}")

SAPI TTS

Platform Support

Features

Voice Selection

SSML Support

Voice Properties

Word Timing

File Output

Best Practices

Limitations

Audio Settings

Sample Rate

Audio Format

Voice Types

System Voices

Third-Party Voices

Language Support

Additional Resources

Next Steps

Platform Support​

Features​

Voice Selection​

SSML Support​

Voice Properties​

Word Timing​

File Output​

Best Practices​

Limitations​

Audio Settings​

Sample Rate​

Audio Format​

Voice Types​

System Voices​

Third-Party Voices​

Language Support​

Additional Resources​

Next Steps​

Platform Support

Features

Voice Selection

SSML Support

Voice Properties

Word Timing

File Output

Best Practices

Limitations

Audio Settings

Sample Rate

Audio Format

Voice Types

System Voices

Third-Party Voices

Language Support

Additional Resources

Next Steps