SAPI TTS
SAPI (Microsoft Speech API) provides native text-to-speech capabilities on Windows systems. It offers access to installed system voices with basic SSML support.
Platform Support
SAPI is available only on Windows. It uses the voices already installed on the system, so no credentials are needed:
from tts_wrapper import SAPIClient, SAPITTS
# Initialize client and TTS
client = SAPIClient() # No credentials needed
tts = SAPITTS(client)
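Since the engine exists only on Windows, cross-platform code may want to check the platform before importing it. A minimal sketch, assuming a lazy import is acceptable; `sapi_supported` and `create_sapi_tts` are hypothetical helpers, not part of tts_wrapper:

```python
import platform


def sapi_supported() -> bool:
    """Return True when running on Windows, where SAPI is available."""
    return platform.system() == "Windows"


def create_sapi_tts():
    """Return a SAPITTS instance on Windows, or None on other platforms."""
    if not sapi_supported():
        return None
    # Imported lazily so other platforms never load the SAPI bindings.
    from tts_wrapper import SAPIClient, SAPITTS

    return SAPITTS(SAPIClient())
```

Callers can then fall back to another engine when `create_sapi_tts()` returns `None`.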
Features
Voice Selection
List and select from installed Windows voices:
# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print(f"Gender: {voice['gender']}")
    print("---")
# Set a specific voice
tts.set_voice("Microsoft David Desktop")
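Hard-coding a voice name fails on machines where that voice is not installed, so it can be useful to pick a voice by attributes instead. The sketch below works on plain dictionaries shaped like the entries printed above; `find_voice` is a hypothetical helper, and the sample data (including the gender strings) is illustrative rather than actual `get_voices()` output:

```python
from typing import Optional


def find_voice(voices: list, gender: str, language_prefix: str) -> Optional[str]:
    """Return the name of the first voice matching gender and language, or None."""
    wanted_gender = gender.lower()
    wanted_lang = language_prefix.lower()
    for voice in voices:
        codes = [code.lower() for code in voice.get("language_codes", [])]
        gender_ok = str(voice.get("gender", "")).lower() == wanted_gender
        if gender_ok and any(code.startswith(wanted_lang) for code in codes):
            return voice["name"]
    return None


# Illustrative data in the same shape as the voice dictionaries above.
sample_voices = [
    {"name": "Microsoft David Desktop", "language_codes": ["en-US"], "gender": "Male"},
    {"name": "Microsoft Zira Desktop", "language_codes": ["en-US"], "gender": "Female"},
]
chosen = find_voice(sample_voices, "female", "en")
print(chosen)
```

With a real engine, the result could be passed to `tts.set_voice(chosen)` after checking that it is not `None`.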
SSML Support
SAPI provides basic SSML support through Windows Speech API:
ssml_text = """
<speak>
Hello <break time="500ms"/> World!
<prosody rate="slow" pitch="high">
This is a test of SSML features.
</prosody>
</speak>
"""
tts.speak(ssml_text)
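When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping or the markup becomes invalid. A minimal sketch using only the standard library; `build_ssml` is a hypothetical helper, and it assumes the `rate` value comes from trusted code rather than user input:

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate: str = "medium", pause_ms: int = 0) -> str:
    """Wrap text in <speak>/<prosody>, escaping XML special characters."""
    body = escape(text)  # Replaces &, < and > with XML entities.
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else ""
    return f'<speak>{pause}<prosody rate="{rate}">{body}</prosody></speak>'


ssml = build_ssml("Tom & Jerry < 3", rate="slow", pause_ms=500)
print(ssml)
```

The resulting string can be passed to `tts.speak()` in the same way as the literal SSML above.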
Voice Properties
Adjust synthesis properties:
# Set speech rate (-10 to 10)
tts.set_property("rate", "0") # Default is 0
# Set volume (0-100)
tts.set_property("volume", "100")
# Set pitch (-10 to 10)
tts.set_property("pitch", "0")
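The ranges in the comments above can be enforced before a value reaches the engine. This is a sketch only: `clamp_property` is a hypothetical helper, and the ranges are copied from the comments in this section rather than read from SAPI:

```python
# Ranges taken from the comments in this section (assumed, not queried from SAPI).
PROPERTY_RANGES = {"rate": (-10, 10), "volume": (0, 100), "pitch": (-10, 10)}


def clamp_property(name: str, value: int) -> str:
    """Clamp value into the documented range and return it as a string."""
    if name not in PROPERTY_RANGES:
        raise ValueError(f"Unknown property: {name}")
    low, high = PROPERTY_RANGES[name]
    return str(max(low, min(high, value)))


print(clamp_property("rate", 25))
print(clamp_property("volume", -5))
```

The helper returns a string because the examples above pass string values, for instance `tts.set_property("rate", clamp_property("rate", 25))`.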
Word Timing
Get timing information for each word:
def word_callback(word: str, start_time: float, end_time: float):
    duration = end_time - start_time
    print(f"Word: {word}")
    print(f"Start Time: {start_time:.2f}s")
    print(f"Duration: {duration:.2f}s")
# Connect the callback
tts.connect("started-word", word_callback)
# Speak with word timing
tts.speak("This will trigger word timing callbacks")
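For captions or highlighting, the timings usually need to be stored rather than printed. One possible approach is a small collector whose bound method has the same `(word, start_time, end_time)` signature as the callback above. `WordTimingLog` is a hypothetical class, and the calls at the bottom simulate the engine with made-up timings:

```python
from dataclasses import dataclass, field


@dataclass
class WordTimingLog:
    """Collects (word, start_time, end_time) tuples from word callbacks."""

    entries: list = field(default_factory=list)

    def on_word(self, word: str, start_time: float, end_time: float) -> None:
        self.entries.append((word, start_time, end_time))

    def total_duration(self) -> float:
        """Time from the start of the first word to the end of the last."""
        if not self.entries:
            return 0.0
        return self.entries[-1][2] - self.entries[0][1]


log = WordTimingLog()
# With a real engine: tts.connect("started-word", log.on_word)
log.on_word("Hello", 0.0, 0.4)  # Simulated callback invocations.
log.on_word("world", 0.5, 1.0)
print(log.total_duration())
```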
File Output
Save synthesized speech to file:
# Save as WAV
tts.synth_to_file("Hello world", "output.wav")
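After writing a file, its header can be checked with Python's standard `wave` module. In the sketch below, `describe_wav` is a hypothetical helper; so that the example runs without SAPI, it first writes one second of silence, and the 16000 Hz rate is an arbitrary demo value, not a SAPI default:

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read channel count, sample width, frame rate and duration from a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


# Demo input: one second of 16-bit mono silence (stands in for output.wav).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```

Pointing `describe_wav` at a file produced by `synth_to_file` would show whether it matches the format your playback code expects.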
Best Practices
Performance
- Reuse client instances
- Use appropriate speech rate for your needs
- Consider caching frequently used phrases
Error Handling
try:
    tts.speak("Hello, world!")
except Exception as e:
    if "SAPI not available" in str(e):
        print("SAPI is not available on this system")
    else:
        print(f"Error: {e}")
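The caching suggestion under Performance could be implemented by keying files on a hash of the voice and text. This is a sketch under assumptions: `cached_wav_path` and `synth_cached` are hypothetical helpers, and `fake_synth` stands in for `tts.synth_to_file` so the example runs without SAPI:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_wav_path(text: str, voice: str, cache_dir: Path) -> Path:
    """Derive a stable file name from the voice and text."""
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.wav"


def synth_cached(text: str, voice: str, cache_dir: Path,
                 synth: Callable[[str, str], None]) -> Path:
    """Call synth(text, path) only when no cached file exists yet."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cached_wav_path(text, voice, cache_dir)
    if not path.exists():
        synth(text, str(path))
    return path


# Stand-in for tts.synth_to_file; records calls and writes placeholder bytes.
synth_calls = []


def fake_synth(text: str, path: str) -> None:
    synth_calls.append(text)
    Path(path).write_bytes(b"placeholder audio")


cache_dir = Path(tempfile.mkdtemp())
first = synth_cached("Hello world", "Microsoft David Desktop", cache_dir, fake_synth)
second = synth_cached("Hello world", "Microsoft David Desktop", cache_dir, fake_synth)
print(len(synth_calls))
```

The second call finds the existing file and skips synthesis. If rate, volume, or pitch vary between calls, they would need to be part of the key as well.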
Limitations
- Windows only
- Basic SSML support
- Voice selection limited to installed Windows voices
- No custom voice support
- Performance may vary by system
- Some features require specific Windows versions
Audio Settings
Sample Rate
SAPI uses system-defined sample rates:
# Check the current audio rate
print(f"Audio rate: {tts.audio_rate}")
Audio Format
- Channels: Mono (1 channel)
- Sample Width: 16-bit
- Format: PCM
print(f"Channels: {tts.channels}") # 1
print(f"Sample width: {tts.sample_width}") # 2 (16-bit)
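With these three values, the playing time of a raw PCM buffer follows from its size: bytes divided by (sample rate × channels × sample width). A small sketch; `pcm_duration_seconds` is a hypothetical helper and the 22050 Hz rate in the example is an illustrative value, not a SAPI guarantee:

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Return the duration of raw PCM audio given its size and format."""
    bytes_per_second = sample_rate * channels * sample_width
    if bytes_per_second <= 0:
        raise ValueError("sample_rate, channels and sample_width must be positive")
    return num_bytes / bytes_per_second


# 44100 bytes of 16-bit mono audio at 22050 Hz.
print(pcm_duration_seconds(44100, 22050))
```

With a real engine, the arguments could come from `tts.audio_rate`, `tts.channels`, and `tts.sample_width`.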
Voice Types
System Voices
Windows includes several built-in voices:
# Common system voices
tts.set_voice("Microsoft David Desktop") # Male voice
tts.set_voice("Microsoft Zira Desktop") # Female voice
Third-Party Voices
SAPI can use additional installed voices:
# List all available voices including third-party
voices = tts.get_voices()
for voice in voices:
    print(f"Available voice: {voice['name']}")
Language Support
SAPI supports multiple languages through installed Windows language packs:
# List voices for a specific language
voices = tts.get_voices()
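# Match on the language prefix of every code, not a substring of the first one
spanish_voices = [
    v for v in voices
    if any(code.lower().startswith("es") for code in v["language_codes"])
]
for voice in spanish_voices:
    print(f"Spanish voice: {voice['name']}")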
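To see at a glance which languages are covered on a machine, the voice list can be grouped by language prefix. The sketch below assumes codes of the form `en-US`; `group_by_language` is a hypothetical helper and the sample list is illustrative rather than actual `get_voices()` output:

```python
from collections import defaultdict


def group_by_language(voices: list) -> dict:
    """Map each language prefix (e.g. 'en') to the names of voices that support it."""
    groups = defaultdict(list)
    for voice in voices:
        for code in voice.get("language_codes", []):
            prefix = code.split("-")[0].lower()
            if voice["name"] not in groups[prefix]:
                groups[prefix].append(voice["name"])
    return dict(groups)


# Illustrative data in the same shape as the voice dictionaries used above.
sample_voices = [
    {"name": "Microsoft David Desktop", "language_codes": ["en-US"]},
    {"name": "Microsoft Zira Desktop", "language_codes": ["en-US"]},
    {"name": "Microsoft Helena Desktop", "language_codes": ["es-ES"]},
]
by_language = group_by_language(sample_voices)
print(by_language)
```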
Additional Resources
Next Steps
- Learn about SSML support
- Explore streaming capabilities
- Check out callback functionality