SAPI TTS
SAPI (Microsoft Speech API) provides native text-to-speech capabilities on Windows systems. It offers access to installed system voices with basic SSML support.
Platform Support
SAPI is only available on Windows systems:
from tts_wrapper import SAPIClient
# Initialize TTS (no credentials needed)
tts = SAPIClient()
SAPIClient can be used directly as it implements the TTS interface. The legacy pattern with a separate SAPITTS class is still supported for backward compatibility.
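Because the engine only exists on Windows, cross-platform code may want to guard the import. The following is a minimal sketch, not part of the library: sapi_supported and make_client are hypothetical helper names, and the only library call assumed is the SAPIClient constructor shown above.

```python
import platform


def sapi_supported(system_name=None):
    """Return True when the reported OS name is Windows."""
    name = system_name if system_name is not None else platform.system()
    return name == "Windows"


def make_client():
    """Create a SAPIClient on Windows; return None elsewhere (hypothetical helper)."""
    if not sapi_supported():
        return None
    # Imported lazily so this module still loads on other platforms.
    from tts_wrapper import SAPIClient
    return SAPIClient()
```

Callers can then check for None and fall back to another engine instead of failing at import time.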
Features
Voice Selection
List and select from installed Windows voices:
# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print(f"Gender: {voice['gender']}")
    print("---")
# Set a specific voice
tts.set_voice("Microsoft David Desktop")
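Hard-coding a voice name fails on machines where that voice is not installed. As a sketch, a small filter over the list returned by get_voices() can pick a voice by attributes instead. find_voice is a hypothetical helper, and the sample data below (including the gender strings) is an assumption that mirrors the keys printed above rather than real output.

```python
def find_voice(voices, gender=None, name_contains=None):
    """Return the first voice dict matching every given filter, or None."""
    for voice in voices:
        voice_gender = str(voice.get("gender") or "").lower()
        voice_name = str(voice.get("name") or "").lower()
        if gender and voice_gender != gender.lower():
            continue
        if name_contains and name_contains.lower() not in voice_name:
            continue
        return voice
    return None


# Assumed sample data; real values come from tts.get_voices().
sample_voices = [
    {"name": "Microsoft David Desktop", "language_codes": ["en-US"], "gender": "Male"},
    {"name": "Microsoft Zira Desktop", "language_codes": ["en-US"], "gender": "Female"},
]

match = find_voice(sample_voices, gender="female")
if match is not None:
    print(match["name"])  # pass this name to tts.set_voice(...)
```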
SSML Support
SAPI provides basic SSML support through Windows Speech API:
ssml_text = """
<speak>
Hello <break time="500ms"/> World!
<prosody rate="slow" pitch="high">
This is a test of SSML features.
</prosody>
</speak>
"""
tts.speak(ssml_text)
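When the spoken text comes from user input, characters such as & or < would produce invalid SSML. The sketch below builds the markup with escaping from the standard library. build_ssml is a hypothetical helper, and it only uses the break and prosody elements already shown above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, rate="medium", pause_ms=0):
    """Wrap plain text in <speak>/<prosody>, escaping XML special characters."""
    parts = ["<speak>"]
    if pause_ms > 0:
        parts.append(f'<break time="{int(pause_ms)}ms"/>')
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    parts.append(f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>")
    parts.append("</speak>")
    return "".join(parts)


ssml = build_ssml("Tom & Jerry", rate="slow", pause_ms=500)
print(ssml)  # pass the result to tts.speak(...)
```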
Voice Properties
Adjust synthesis properties:
# Set speech rate (-10 to 10)
tts.set_property("rate", "0") # Default is 0
# Set volume (0-100)
tts.set_property("volume", "100")
# Set pitch (-10 to 10)
tts.set_property("pitch", "0")
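Values outside the ranges noted in the comments above may be rejected or ignored, depending on the engine. One way to guard against that is to clamp before calling set_property. This is a sketch: clamp_property and PROPERTY_RANGES are hypothetical names, and the ranges are copied from the comments above rather than read from the library.

```python
# Ranges taken from the comments above; an assumption, not queried from SAPI.
PROPERTY_RANGES = {"rate": (-10, 10), "volume": (0, 100), "pitch": (-10, 10)}


def clamp_property(name, value):
    """Clamp value into the documented range and return it as a string."""
    low, high = PROPERTY_RANGES[name]
    return str(max(low, min(high, int(value))))


print(clamp_property("rate", 25))     # prints 10
print(clamp_property("volume", -5))   # prints 0
```

The helper returns a string because the examples above pass property values as strings, for example tts.set_property("rate", clamp_property("rate", 25)).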
Word Timing
Get timing information for each word:
def word_callback(word: str, start_time: float, end_time: float):
    duration = end_time - start_time
    print(f"Word: {word}")
    print(f"Start Time: {start_time:.2f}s")
    print(f"Duration: {duration:.2f}s")
# Connect the callback
tts.connect("started-word", word_callback)
# Speak with word timing
tts.speak("This will trigger word timing callbacks")
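Instead of printing, a callback can collect the timings for later use, such as highlighting text or estimating speaking pace. The sketch below assumes the callback signature shown above and that connect accepts any callable. WordTimingLog is a hypothetical class, and the calls at the bottom simulate events so the example runs without a speech engine.

```python
class WordTimingLog:
    """Callable that records (word, start_time, end_time) tuples."""

    def __init__(self):
        self.entries = []

    def __call__(self, word, start_time, end_time):
        self.entries.append((word, start_time, end_time))

    def words_per_minute(self):
        """Average pace from the first word's start to the last word's end."""
        if not self.entries:
            return 0.0
        total = self.entries[-1][2] - self.entries[0][1]
        return len(self.entries) * 60.0 / total if total > 0 else 0.0


log = WordTimingLog()
# With a real client: tts.connect("started-word", log)
# Simulated events for illustration only:
log("Hello", 0.0, 0.5)
log("world", 0.5, 1.0)
print(log.words_per_minute())  # prints 120.0
```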
File Output
Save synthesized speech to file:
# Save as WAV
tts.synth_to_file("Hello world", "output.wav")
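After saving, it can be useful to confirm what was written. The sketch below reads a WAV header with Python's standard wave module. describe_wav is a hypothetical helper, and the demo file is generated silence standing in for output.wav so the example runs on any platform.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic format details for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
            "frame_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


# Stand-in for a synthesized file: one second of mono 16-bit silence at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```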
Best Practices
Performance
- Reuse client instances
- Use an appropriate speech rate for your needs
- Consider caching frequently used phrases
Error Handling
try:
    tts.speak("Hello, world!")
except Exception as e:
    if "SAPI not available" in str(e):
        print("SAPI is not available on this system")
    else:
        print(f"Error: {e}")
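The caching suggestion under Performance could be implemented in many ways. The following is one possible sketch, keyed on a hash of the text. PhraseCache is a hypothetical class, and the synthesize argument is assumed to be any callable taking (text, path), such as tts.synth_to_file. A fake synthesizer is used here so the example runs without SAPI.

```python
import hashlib
import os
import tempfile


class PhraseCache:
    """Synthesize each distinct phrase once and reuse the saved file."""

    def __init__(self, cache_dir, synthesize):
        self.cache_dir = cache_dir
        self.synthesize = synthesize  # callable(text, path)
        os.makedirs(cache_dir, exist_ok=True)

    def path_for(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        return os.path.join(self.cache_dir, f"{digest}.wav")

    def get(self, text):
        path = self.path_for(text)
        if not os.path.exists(path):
            self.synthesize(text, path)
        return path


calls = []


def fake_synth(text, path):
    """Stand-in for tts.synth_to_file: records the call and writes an empty file."""
    calls.append(text)
    with open(path, "wb") as handle:
        handle.write(b"")


cache = PhraseCache(tempfile.mkdtemp(), fake_synth)
first = cache.get("Hello world")
second = cache.get("Hello world")  # served from disk, no second synthesis
```

The key here covers only the text. If the voice or rate changes between calls, those settings would need to be part of the key as well.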
Limitations
- Windows only
- Basic SSML support
- Voice selection limited to installed Windows voices
- No custom voice support
- Performance may vary by system
- Some features require specific Windows versions
Audio Settings
Sample Rate
SAPI uses system-defined sample rates:
# Check the current audio rate
print(f"Audio rate: {tts.audio_rate}")
Audio Format
- Channels: Mono (1 channel)
- Sample Width: 16-bit
- Format: PCM
print(f"Channels: {tts.channels}") # 1
print(f"Sample width: {tts.sample_width}") # 2 (16-bit)
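These three values are enough to convert a raw PCM byte count into a duration: bytes divided by (sample rate × channels × sample width). A minimal sketch follows. pcm_duration_seconds is a hypothetical helper, and the 22050 Hz rate is only an example value, since the actual rate is system-defined.

```python
def pcm_duration_seconds(num_bytes, sample_rate, channels=1, sample_width=2):
    """Duration of raw PCM audio given its size and format."""
    bytes_per_second = sample_rate * channels * sample_width
    if bytes_per_second <= 0:
        raise ValueError("sample_rate, channels and sample_width must be positive")
    return num_bytes / bytes_per_second


# Example: 44100 bytes of mono 16-bit audio at an assumed 22050 Hz.
print(pcm_duration_seconds(44100, 22050))  # prints 1.0
```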
Voice Types
System Voices
Windows includes several built-in voices:
# Common system voices
tts.set_voice("Microsoft David Desktop") # Male voice
tts.set_voice("Microsoft Zira Desktop") # Female voice
Third-Party Voices
SAPI can use additional installed voices:
# List all available voices including third-party
voices = tts.get_voices()
for voice in voices:
    print(f"Available voice: {voice['name']}")
Language Support
SAPI supports multiple languages through installed Windows language packs:
# List voices for a specific language
voices = tts.get_voices()
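# Match on the start of any language code rather than a substring of the first one
spanish_voices = [
    v for v in voices
    if any(code.lower().startswith("es") for code in v["language_codes"])
]
for voice in spanish_voices:
    print(f"Spanish voice: {voice['name']}")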
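To see at a glance which languages are covered on a machine, the voice list can be grouped by primary language. This sketch assumes language codes look like "en-US" or "es-ES", which the page does not state explicitly. voices_by_language is a hypothetical helper and the sample list is illustrative data, not real output.

```python
from collections import defaultdict


def voices_by_language(voices):
    """Map primary language (e.g. 'es') to the names of voices that support it."""
    grouped = defaultdict(list)
    for voice in voices:
        for code in voice.get("language_codes", []):
            primary = code.split("-")[0].lower()
            if voice["name"] not in grouped[primary]:
                grouped[primary].append(voice["name"])
    return dict(grouped)


# Assumed sample data; real values come from tts.get_voices().
sample = [
    {"name": "Microsoft David Desktop", "language_codes": ["en-US"]},
    {"name": "Microsoft Helena Desktop", "language_codes": ["es-ES"]},
    {"name": "Microsoft Sabina Desktop", "language_codes": ["es-MX"]},
]

for language, names in voices_by_language(sample).items():
    print(language, names)
```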
Additional Resources
Next Steps
- Learn about SSML support
- Explore streaming capabilities
- Check out callback functionality