AVSynth TTS
AVSynth provides native text-to-speech on macOS through Apple's AVSpeechSynthesizer API, part of the AVFoundation framework. It offers high-quality system voices, real-time streaming, and word timing support.
Platform Support
AVSynth is available only on macOS; the engine cannot be used on other platforms.
from tts_wrapper import AVSynthClient, AVSynthTTS
# Initialize client and TTS
client = AVSynthClient() # No credentials needed
tts = AVSynthTTS(client)
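Because the engine exists only on macOS, cross-platform code may want to check the platform before importing it. The sketch below assumes you simply want to skip AVSynth elsewhere; `avsynth_supported` and `make_avsynth_tts` are hypothetical helper names, not part of tts_wrapper.

```python
import sys
from typing import Optional


def avsynth_supported(platform: str = sys.platform) -> bool:
    """Return True when running on macOS, the only platform AVSynth supports."""
    # Python reports macOS as "darwin" in sys.platform.
    return platform == "darwin"


def make_avsynth_tts() -> Optional[object]:
    """Hypothetical helper: build an AVSynthTTS instance, or return None off macOS."""
    if not avsynth_supported():
        return None  # the caller can fall back to another engine here
    # Import lazily so this module still loads on Linux and Windows.
    from tts_wrapper import AVSynthClient, AVSynthTTS
    return AVSynthTTS(AVSynthClient())
```

Importing inside the function keeps the `tts_wrapper` AVSynth import off non-macOS code paths entirely.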
Features
Voice Selection
List and select from available system voices:
# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print(f"Gender: {voice['gender']}")
    print("---")
# Set a specific voice
tts.set_voice("com.apple.voice.compact.en-US.Samantha")
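To select a voice by its display name instead of hard-coding an identifier, you can search the list returned by get_voices(). This is a minimal sketch: it assumes each voice dict also carries an "id" key holding the identifier accepted by set_voice(), and the sample data is hypothetical, shaped like the output printed above.

```python
from typing import Dict, List, Optional


def find_voice_id(voices: List[Dict], name: str, language: str = "") -> Optional[str]:
    """Return the id of the first voice whose name matches, optionally filtered by language prefix."""
    for voice in voices:
        if voice["name"].lower() != name.lower():
            continue
        codes = voice.get("language_codes", [])
        # Match on a prefix so "en" accepts "en-US", "en-GB", and so on.
        if language and not any(c.lower().startswith(language.lower()) for c in codes):
            continue
        return voice["id"]  # assumed key, see lead-in
    return None


# Hypothetical sample data; real entries come from tts.get_voices().
sample_voices = [
    {"id": "com.apple.voice.compact.en-US.Samantha", "name": "Samantha",
     "language_codes": ["en-US"], "gender": "Female"},
    {"id": "com.apple.voice.compact.fr-FR.Thomas", "name": "Thomas",
     "language_codes": ["fr-FR"], "gender": "Male"},
]

voice_id = find_voice_id(sample_voices, "samantha", "en")
# if voice_id is not None:
#     tts.set_voice(voice_id)
```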
Streaming
Supports real-time audio streaming with low latency:
# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")
Voice Properties
Adjust synthesis properties:
# Set speech rate (0-100, default is 50)
tts.set_property("rate", "50")
# Set volume (0-100)
tts.set_property("volume", "100")
# Set pitch (0.5-2.0)
tts.set_property("pitch", "1.0")
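If property values come from user input, it can help to keep them inside the ranges listed in the comments above before passing them to set_property(). The sketch below is a hypothetical helper; whether the engine clamps or rejects out-of-range values on its own is not stated here, so treat this as an assumption-driven safeguard.

```python
# Ranges taken from the comments above: (low, high, integer scale?)
PROPERTY_RANGES = {
    "rate": (0.0, 100.0, True),
    "volume": (0.0, 100.0, True),
    "pitch": (0.5, 2.0, False),
}


def clamp_property(name: str, value: float) -> str:
    """Clamp value into the documented range for name and return it as a string."""
    if name not in PROPERTY_RANGES:
        raise ValueError(f"Unknown property: {name!r}")
    low, high, integer_scale = PROPERTY_RANGES[name]
    clamped = min(max(float(value), low), high)
    if integer_scale:
        # round() on a float returns an int, so 50.0 becomes "50".
        return str(round(clamped))
    return str(clamped)  # pitch keeps its decimal form, e.g. "1.0"


# Usage: tts.set_property("rate", clamp_property("rate", 120))  # sends "100"
```

Strings are returned because the examples above pass property values as strings.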
Word Timing
Get precise timing information for each word:
def word_callback(word: str, start_time: float, end_time: float):
    duration = end_time - start_time
    print(f"Word: {word}")
    print(f"Start Time: {start_time:.2f}s")
    print(f"Duration: {duration:.2f}s")
# Connect the callback
tts.connect("started-word", word_callback)
# Speak with word timing
tts.speak("This will trigger word timing callbacks")
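Instead of printing each word, a callback can collect the timings for later use, such as highlighting text or building subtitles. This sketch uses the same (word, start_time, end_time) signature as word_callback above; the class name is hypothetical, and total_span() assumes the callbacks arrive in spoken order.

```python
from typing import List, Tuple


class WordTimingRecorder:
    """Callable that stores the (word, start, end) values passed by a word-timing callback."""

    def __init__(self) -> None:
        self.timings: List[Tuple[str, float, float]] = []

    def __call__(self, word: str, start_time: float, end_time: float) -> None:
        self.timings.append((word, start_time, end_time))

    def durations(self) -> List[Tuple[str, float]]:
        """Return each word with its duration in seconds."""
        return [(word, end - start) for word, start, end in self.timings]

    def total_span(self) -> float:
        """Seconds from the first word's start to the last word's end (0.0 if nothing recorded)."""
        if not self.timings:
            return 0.0
        # Assumes timings were recorded in chronological order.
        return self.timings[-1][2] - self.timings[0][1]


# Usage:
# recorder = WordTimingRecorder()
# tts.connect("started-word", recorder)
# tts.speak("This will trigger word timing callbacks")
# print(recorder.durations())
```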
File Output
Save synthesized speech to file:
# Save as WAV
tts.synth_to_file("Hello world", "output.wav")
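To confirm what was written, you can read the WAV header back with Python's standard `wave` module. This is a small sketch; `describe_wav` is a hypothetical helper name. If the file matches the format described under Audio Settings, you would expect 1 channel, a sample width of 2 bytes, and a 22050 Hz sample rate.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a WAV file's header with the standard library and report its format."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),  # bytes per sample
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


# Usage, after synth_to_file() has written output.wav:
# print(describe_wav("output.wav"))
```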
Best Practices
Performance
- Reuse client instances
- Use streaming for real-time applications
- Set appropriate audio rate for your needs
Error Handling
try:
    tts.speak("Hello, world!")
except Exception as e:
    if "AVSpeechSynthesizer" in str(e):
        print("Speech synthesis error")
    else:
        print(f"Error: {e}")
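When speech is one step in a larger workflow, the same pattern can be wrapped in a function that reports success, so callers can fall back to text output or another engine. This is a hypothetical helper sketch: it keeps the message check used above and logs through the standard `logging` module instead of printing.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def speak_safely(tts: Any, text: str) -> bool:
    """Speak text and return True on success; log the failure and return False otherwise."""
    try:
        tts.speak(text)
    except Exception as e:  # the engine may fail for several reasons
        if "AVSpeechSynthesizer" in str(e):
            logger.error("Speech synthesis error: %s", e)
        else:
            logger.error("Error: %s", e)
        return False
    return True


# Usage:
# if not speak_safely(tts, "Hello, world!"):
#     print("Hello, world!")  # fall back to plain text
```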
Limitations
- macOS only
- Limited SSML support (tags converted to native commands)
- Voice selection limited to installed system voices
- No custom voice support
- Some features may require newer macOS versions
Audio Settings
Sample Rate
AVSynth uses a default sample rate of 22050 Hz:
# The audio rate is set automatically but can be checked
print(f"Audio rate: {tts.audio_rate}") # 22050
Audio Format
- Channels: Mono (1 channel)
- Sample Width: 16-bit
- Format: PCM
These values can be read from the TTS instance:
print(f"Channels: {tts.channels}") # 1
print(f"Sample width: {tts.sample_width}") # 2 (16-bit)
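With these settings, one second of audio occupies 22050 samples × 1 channel × 2 bytes = 44100 bytes, which is handy when sizing buffers or estimating playback length. The sketch below assumes raw 16-bit PCM in little-endian byte order (the order WAV files use); both function names are hypothetical.

```python
import array
import sys


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 22050,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Playback length of raw PCM: bytes / (rate * channels * bytes per sample)."""
    bytes_per_second = sample_rate * channels * sample_width
    return len(pcm) / bytes_per_second


def pcm16_to_samples(pcm: bytes) -> array.array:
    """Decode 16-bit little-endian mono PCM bytes into signed integer samples."""
    samples = array.array("h")  # "h" is a signed 16-bit integer on mainstream platforms
    samples.frombytes(pcm)      # raises ValueError if len(pcm) is odd
    if sys.byteorder == "big":
        samples.byteswap()      # convert from little-endian to native order
    return samples
```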
Voice Types
Compact Voices
Standard system voices with good quality:
tts.set_voice("com.apple.voice.compact.en-US.Samantha")
Premium Voices
Higher quality voices (if installed):
tts.set_voice("com.apple.voice.premium.en-US.Samantha")
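Because premium voices may not be installed, you might prefer the premium variant when it is available and fall back to the compact one otherwise. This sketch assumes voice identifiers follow the `com.apple.voice.<quality>.<locale>.<Name>` pattern seen in the two examples above; `prefer_premium` is a hypothetical helper.

```python
from typing import Iterable, Optional


def prefer_premium(voice_ids: Iterable[str], locale: str, name: str) -> Optional[str]:
    """Return the premium voice id if installed, else the compact one, else None."""
    available = set(voice_ids)
    for quality in ("premium", "compact"):
        # Identifier pattern assumed from the examples in this section.
        candidate = f"com.apple.voice.{quality}.{locale}.{name}"
        if candidate in available:
            return candidate
    return None


# Usage, assuming each voice dict has an "id" key:
# ids = [v["id"] for v in tts.get_voices()]
# voice_id = prefer_premium(ids, "en-US", "Samantha")
# if voice_id:
#     tts.set_voice(voice_id)
```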
Language Support
AVSynth supports multiple languages based on installed system voices:
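# List voices for a specific language
voices = tts.get_voices()
# Check every language code (not just the first) and match on the "fr" prefix
french_voices = [
    v for v in voices
    if any(code.lower().startswith("fr") for code in v["language_codes"])
]
for voice in french_voices:
    print(f"French voice: {voice['name']}")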
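To see at a glance which languages the installed voices cover, you can group voice names by their primary language subtag. This is a sketch with a hypothetical helper name and sample data; it accepts both "fr-FR" and "fr_FR" style codes, since the exact separator is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def voices_by_language(voices: List[dict]) -> Dict[str, List[str]]:
    """Group voice names by primary language subtag, e.g. "fr" for "fr-FR" and "fr-CA"."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for voice in voices:
        # Normalize "_" to "-" and keep the part before the first separator.
        primaries = {
            code.replace("_", "-").split("-")[0].lower()
            for code in voice.get("language_codes", [])
        }
        # A voice listing several codes is added once per distinct language.
        for primary in sorted(primaries):
            groups[primary].append(voice["name"])
    return dict(groups)


# Usage:
# for lang, names in voices_by_language(tts.get_voices()).items():
#     print(lang, names)
```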
Next Steps
- Learn about streaming capabilities
- Check out callback functionality
- Explore audio control features