AVSynth TTS

AVSynth provides native text-to-speech on macOS through Apple's AVSpeechSynthesizer API, which is part of the AVFoundation framework. It uses the installed system voices and supports real-time streaming and word timing.

Platform Support

AVSynth is available only on macOS; the engine cannot be used on other platforms. No credentials are needed to create a client:

from tts_wrapper import AVSynthClient, AVSynthTTS

# Initialize client and TTS
client = AVSynthClient()  # No credentials needed
tts = AVSynthTTS(client)
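
Because the engine exists only on macOS, cross-platform applications may want to check the platform before importing it. The following is a minimal sketch of such a guard; `avsynth_supported` is a hypothetical helper name and is not part of tts_wrapper.

```python
import platform


def avsynth_supported() -> bool:
    """Return True when running on macOS (hypothetical helper, not part of tts_wrapper)."""
    # platform.system() reports "Darwin" on macOS.
    return platform.system() == "Darwin"


if avsynth_supported():
    print("AVSynth can be used on this machine")
else:
    print("AVSynth is unavailable here; choose another engine")
```

A guard like this lets the `from tts_wrapper import AVSynthClient, AVSynthTTS` line sit inside the macOS branch only.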

Features

Voice Selection

List and select from available system voices:

# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print(f"Gender: {voice['gender']}")
    print("---")

# Set a specific voice
tts.set_voice("com.apple.voice.compact.en-US.Samantha")
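
Voice identifiers are long, so it can be convenient to look one up by its display name. The sketch below works on a list of dictionaries shaped like the output printed above. It assumes each entry also exposes its identifier under an `"id"` key, which this page does not confirm; `find_voice_id` and the sample data are illustrative only.

```python
from typing import Dict, List, Optional


def find_voice_id(voices: List[Dict], name: str) -> Optional[str]:
    """Return the id of the first voice whose name matches, ignoring case."""
    for voice in voices:
        if voice.get("name", "").lower() == name.lower():
            # ASSUMPTION: each entry carries its identifier under "id".
            return voice.get("id")
    return None


# Sample data standing in for tts.get_voices(); real entries may differ.
sample_voices = [
    {"id": "com.apple.voice.compact.en-US.Samantha", "name": "Samantha",
     "language_codes": ["en-US"], "gender": "Female"},
    {"id": "com.apple.voice.compact.fr-FR.Thomas", "name": "Thomas",
     "language_codes": ["fr-FR"], "gender": "Male"},
]

voice_id = find_voice_id(sample_voices, "samantha")
print(voice_id)
```

If a match is found, the returned identifier could then be passed to `tts.set_voice(...)`.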

Streaming

Supports real-time audio streaming with low latency:

# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")

Voice Properties

Adjust synthesis properties:

# Set speech rate (0-100, default is 50)
tts.set_property("rate", "50")

# Set volume (0-100)
tts.set_property("volume", "100")

# Set pitch (0.5-2.0)
tts.set_property("pitch", "1.0")
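
When property values come from user input, it may help to clamp them to the ranges listed in the comments above before handing them to the engine. This is a minimal sketch; `PROPERTY_RANGES` and `clamp_property` are hypothetical names, and returning a string simply mirrors the string arguments used in the calls above.

```python
# Ranges taken from the comments in the snippet above.
PROPERTY_RANGES = {
    "rate": (0, 100),
    "volume": (0, 100),
    "pitch": (0.5, 2.0),
}


def clamp_property(name: str, value: float) -> str:
    """Clamp value into the documented range and return it as a string."""
    if name not in PROPERTY_RANGES:
        raise ValueError(f"Unknown property: {name}")
    low, high = PROPERTY_RANGES[name]
    return str(max(low, min(high, value)))


print(clamp_property("rate", 150))
print(clamp_property("pitch", 0.1))
```

The result could be used as `tts.set_property("rate", clamp_property("rate", 150))`.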

Word Timing

Get precise timing information for each word:

def word_callback(word: str, start_time: float, end_time: float):
    duration = end_time - start_time
    print(f"Word: {word}")
    print(f"Start Time: {start_time:.2f}s")
    print(f"Duration: {duration:.2f}s")

# Connect the callback
tts.connect("started-word", word_callback)

# Speak with word timing
tts.speak("This will trigger word timing callbacks")
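
Instead of printing each word, an application such as a caption highlighter may prefer to collect the timings. The sketch below stores them in a small callable object with the same `(word, start_time, end_time)` signature as `word_callback`. `WordTimingLog` is a hypothetical helper, and the calls at the bottom only simulate what the engine would send.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WordTimingLog:
    """Collects (word, start_time, end_time) tuples as they arrive."""
    entries: List[Tuple[str, float, float]] = field(default_factory=list)

    def __call__(self, word: str, start_time: float, end_time: float) -> None:
        self.entries.append((word, start_time, end_time))

    def total_duration(self) -> float:
        """Latest end time seen so far, or 0.0 if nothing was recorded."""
        return max((end for _, _, end in self.entries), default=0.0)


log = WordTimingLog()
# Simulated callback invocations; a real run would receive these from the engine.
log("Hello", 0.0, 0.4)
log("world", 0.45, 0.9)
print(log.entries)
print(log.total_duration())
```

Assuming `connect` accepts any callable, `log` could be registered with `tts.connect("started-word", log)` in place of `word_callback`.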

File Output

Save synthesized speech to file:

# Save as WAV
tts.synth_to_file("Hello world", "output.wav")
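
A saved file can be inspected with Python's standard `wave` module, for example to confirm its channel count, sample width, and duration. The sketch below is independent of tts_wrapper: `describe_wav` is a hypothetical helper, and the short silent file written here only stands in for a real `output.wav`.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


# Stand-in file: one second of 16-bit mono silence at 22050 Hz.
demo_path = os.path.join(tempfile.gettempdir(), "avsynth_demo.wav")
with wave.open(demo_path, "wb") as demo:
    demo.setnchannels(1)
    demo.setsampwidth(2)
    demo.setframerate(22050)
    demo.writeframes(b"\x00\x00" * 22050)

info = describe_wav(demo_path)
print(info)
```

Calling `describe_wav("output.wav")` after `synth_to_file` would report the same fields for the synthesized audio.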

Best Practices

Performance
- Reuse client instances
- Use streaming for real-time applications
- Set appropriate audio rate for your needs

Error Handling

try:
    tts.speak("Hello, world!")
except Exception as e:
    if "AVSpeechSynthesizer" in str(e):
        print("Speech synthesis error")
    else:
        print(f"Error: {e}")

Limitations

- macOS only
- Limited SSML support (tags converted to native commands)
- Voice selection limited to installed system voices
- No custom voice support
- Some features may require newer macOS versions

Audio Settings

Sample Rate

AVSynth uses a default sample rate of 22050 Hz for more natural speech:

# The audio rate is set automatically but can be checked
print(f"Audio rate: {tts.audio_rate}")  # 22050

Audio Format

- Channels: Mono (1 channel)
- Sample Width: 16-bit
- Format: PCM

print(f"Channels: {tts.channels}")        # 1
print(f"Sample width: {tts.sample_width}") # 2 (16-bit)
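
With mono 16-bit PCM at 22050 Hz, one second of audio occupies 22050 × 1 × 2 = 44100 bytes, so the length of a raw audio buffer can be estimated from its size. A minimal sketch follows; `pcm_duration_seconds` is a hypothetical helper whose defaults mirror the values listed above.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 22050,
    channels: int = 1,
    sample_width: int = 2,
) -> float:
    """Estimate the duration of raw PCM audio from its size in bytes."""
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes / bytes_per_second


# 88200 bytes at 44100 bytes per second is two seconds of audio.
print(pcm_duration_seconds(88200))
```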

Voice Types

Compact Voices

Standard system voices with good quality:

tts.set_voice("com.apple.voice.compact.en-US.Samantha")

Premium Voices

Higher quality voices (if installed):

tts.set_voice("com.apple.voice.premium.en-US.Samantha")
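
Since premium voices are present only if the user has installed them, code may want to fall back to the compact variant. The sketch below picks the first preferred identifier that is actually installed. `pick_voice_id` is a hypothetical helper, and the `installed` list is sample data standing in for identifiers gathered from `tts.get_voices()`.

```python
from typing import Iterable, Optional, Sequence


def pick_voice_id(installed_ids: Iterable[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred voice id that is installed, or None."""
    installed = set(installed_ids)
    for voice_id in preferred:
        if voice_id in installed:
            return voice_id
    return None


# Sample data: only the compact voice is installed on this imaginary machine.
installed = ["com.apple.voice.compact.en-US.Samantha"]
preferred = [
    "com.apple.voice.premium.en-US.Samantha",
    "com.apple.voice.compact.en-US.Samantha",
]

chosen = pick_voice_id(installed, preferred)
print(chosen)
```

When `chosen` is not `None`, it could be passed to `tts.set_voice(chosen)`.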

Language Support

AVSynth supports multiple languages based on installed system voices:

# List voices for a specific language
voices = tts.get_voices()
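# Check every language code of each voice, not only the first one
french_voices = [
    v for v in voices
    if any(code.lower().startswith("fr") for code in v["language_codes"])
]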
for voice in french_voices:
    print(f"French voice: {voice['name']}")
