Sherpa-ONNX TTS

Sherpa-ONNX is an open-source speech toolkit that provides offline text-to-speech capabilities using ONNX models. It's designed for applications requiring local, privacy-focused speech synthesis.

Platform Support

Sherpa-ONNX works on all major platforms (Linux, macOS, Windows) and requires no internet connection:

from tts_wrapper import SherpaOnnxClient, SherpaOnnxTTS

# Initialize client with optional model paths
client = SherpaOnnxClient(
    model_path=None,  # Uses default model if not specified
    tokens_path=None  # Uses default tokens if not specified
)
tts = SherpaOnnxTTS(client)

Features

Voice Selection

Select from available voices (based on installed models):

# Get list of available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Name: {voice['name']}")
    print(f"Languages: {voice['language_codes']}")
    print("---")

# Set a voice by its ISO 639-3 language code
iso_code = "eng"  # Example: English
tts.set_voice(iso_code)
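
When several models are installed, it can help to narrow the voice list before calling set_voice. The following is a minimal sketch, assuming each voice is a dict with "name" and "language_codes" keys as in the listing above. The helper voices_for_language and the sample data are hypothetical and not part of the wrapper.

```python
def voices_for_language(voices, iso_code):
    """Return the voices whose language_codes include iso_code (case-insensitive)."""
    wanted = iso_code.lower()
    return [
        voice for voice in voices
        if wanted in (code.lower() for code in voice.get("language_codes", []))
    ]


# Hypothetical sample data in the shape printed above;
# in practice the list would come from tts.get_voices().
sample_voices = [
    {"name": "Voice A", "language_codes": ["eng"]},
    {"name": "Voice B", "language_codes": ["deu"]},
    {"name": "Voice C", "language_codes": ["eng", "fra"]},
]

english = voices_for_language(sample_voices, "ENG")
print([voice["name"] for voice in english])
```

An empty result means no installed model covers that language, which is worth checking before calling set_voice.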

Streaming

Supports real-time audio streaming:

# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")

Audio Playback

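Direct audio playback with callback support. The example callback uses the same (outdata, frames, time, status) signature as a sounddevice output callback and only reports stream status: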

def play_audio_callback(outdata, frames, time, status):
    """Handle audio playback."""
    if status:
        print(f"Audio callback status: {status}")

# Set up audio stream with callback
tts.setup_stream(
    samplerate=22050,
    channels=1,
    dtype="float32"
)

File Output

Save synthesized speech to file:

# Save as WAV
tts.synth_to_file("Hello world", "output.wav")

# Save with specific format
tts.synth_to_file("Hello world", "output.wav", "wav")

Best Practices

Performance
- Reuse client instances
- Use appropriate model size for your needs
- Consider caching frequently used phrases
- Monitor memory usage with large models
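
One way to cache frequently used phrases is to store each synthesized phrase as a WAV file keyed by a hash of its text. The sketch below assumes a synthesis callable that takes (text, path), which matches how tts.synth_to_file is called elsewhere on this page. The helper cached_synth and the cache layout are hypothetical, not part of the wrapper.

```python
import hashlib
from pathlib import Path


def cached_synth(text, cache_dir, synth_to_file):
    """Return the path of a cached WAV for text, synthesizing only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the text so identical phrases share one file.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    target = cache_dir / f"{key}.wav"
    if not target.exists():
        # synth_to_file is any callable taking (text, output_path).
        synth_to_file(text, str(target))
    return target
```

It could be called as cached_synth("Hello world", "tts_cache", tts.synth_to_file). If the voice or model changes between calls, the voice identifier would need to be included in the hashed key as well, otherwise stale audio is returned.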

Error Handling

try:
    tts.speak("Hello, world!")
except Exception as e:
    if "Model not found" in str(e):
        print("Check model path configuration")
    else:
        print(f"Error: {e}")

Limitations

- No SSML support
- Limited voice selection (depends on available models)
- No custom voice support
- Performance depends on hardware
- Model size can be significant
- Basic prosody control

Audio Settings

Sample Rate

Sherpa-ONNX typically uses a 22050 Hz sample rate:

# Check the current audio rate
print(f"Audio rate: {tts.audio_rate}")  # 22050
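
The sample rate ties buffer sizes to playback time: duration is the number of samples divided by the rate (and by the channel count for interleaved audio). The helper below is a small illustrative sketch. duration_seconds is a hypothetical name, and 22050 is used as the default only because it is the typical rate mentioned above.

```python
def duration_seconds(num_samples, sample_rate=22050, channels=1):
    """Playback length in seconds for a buffer of interleaved samples."""
    if sample_rate <= 0 or channels <= 0:
        raise ValueError("sample_rate and channels must be positive")
    return num_samples / (sample_rate * channels)


print(duration_seconds(44100))  # 44100 / 22050 = 2.0
```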

Audio Format

- Channels: mono (1 channel)
- Sample width: 16-bit integer or 32-bit float
- Format: integer PCM ("int16") or floating point ("float32")

# Configure audio settings
tts.setup_stream(
    samplerate=22050,
    channels=1,
    dtype="float32"  # or "int16" for PCM
)
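
The two formats differ only in scaling: float samples lie in [-1.0, 1.0], while 16-bit PCM uses integers in [-32768, 32767]. The sketch below shows one common conversion (scale by 32767 and clip) and packs the result into a mono WAV using only the standard library. float_to_int16 and wav_bytes are hypothetical helpers for illustration, not wrapper functions, and the wrapper may convert differently internally.

```python
import io
import struct
import wave


def float_to_int16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit PCM values, clipping out-of-range input."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


def wav_bytes(samples, sample_rate=22050):
    """Pack mono float samples into an in-memory 16-bit WAV file."""
    pcm = float_to_int16(samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit
        wav_file.setframerate(sample_rate)
        # WAV stores samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()
```

Clipping before scaling avoids integer overflow when a model occasionally emits values slightly outside the nominal range.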

Model Management

Default Models

The wrapper includes default models for basic usage:

# Use default models
client = SherpaOnnxClient()

Custom Models

Use your own ONNX models:

# Use custom models
client = SherpaOnnxClient(
    model_path="path/to/model.onnx",
    tokens_path="path/to/tokens.txt"
)
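
A wrong path tends to surface later as a less obvious error, so it can be worth validating both files before constructing the client. The following is a sketch using only pathlib. check_model_files is a hypothetical helper, and the wrapper may perform its own checks.

```python
from pathlib import Path


def check_model_files(model_path, tokens_path):
    """Raise FileNotFoundError naming any missing file; otherwise return both paths as strings."""
    paths = {"model": Path(model_path), "tokens": Path(tokens_path)}
    missing = [f"{label}: {path}" for label, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing Sherpa-ONNX files -> " + ", ".join(missing))
    return str(paths["model"]), str(paths["tokens"])
```

The returned pair can then be passed on as model_path and tokens_path to SherpaOnnxClient.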

Language Support

Support depends on available models:

# List available languages
voices = tts.get_voices()
languages = {lang for voice in voices
             for lang in voice["language_codes"]}
print(f"Available languages: {languages}")
