ElevenLabs TTS
ElevenLabs provides high-quality neural text-to-speech with support for voice cloning and customization. It offers some of the most natural-sounding voices available.
Authentication
ElevenLabs requires an API key:
from tts_wrapper import ElevenLabsTTS, ElevenLabsClient
client = ElevenLabsClient(credentials='your_api_key')
tts = ElevenLabsTTS(client)
tip
Use environment variables for secure credential management:
import os
client = ElevenLabsClient(credentials=os.getenv('ELEVENLABS_API_KEY'))
Features
Voice Selection
List and select from available voices:
# Get list of available voices
voices = tts.get_voices()
for voice in voices:
print(f"Name: {voice['name']}")
print(f"Languages: {voice['language_codes']}")
print(f"Gender: {voice['gender']}")
print("---")
# Set a specific voice
tts.set_voice("voice_id")
Streaming
Supports real-time audio streaming:
# Stream synthesis for real-time playback
tts.speak_streamed("This text will be synthesized and played in real-time")
Voice Properties
Adjust voice properties:
# Set stability and similarity boost
tts.set_property("stability", 0.5) # Range: 0-1
tts.set_property("similarity_boost", 0.75) # Range: 0-1
File Output
Save synthesized speech to file:
# Save as MP3
tts.synth_to_file("Hello world", "output.mp3", "mp3")
# Save as WAV
tts.synth_to_file("Hello world", "output.wav", "wav")
Best Practices
-
Cost Management
- Monitor character usage
- Cache frequently used phrases
- Use appropriate stability settings
-
Performance
- Reuse client instances
- Consider chunking long text
- Balance stability vs. performance
-
Error Handling
try:
tts.speak("Hello, world!")
except Exception as e:
if "Unauthorized" in str(e):
print("Check your ElevenLabs API key")
elif "QuotaExceeded" in str(e):
print("Character quota exceeded")
else:
print(f"Error: {e}")
Limitations
- No SSML support (tags will be stripped)
- No native word timing support
- API rate limits apply
- Character quota based on subscription
- Voice cloning requires additional setup
- Limited to English and a few other languages
Voice Optimization
Stability vs. Similarity
- Stability (0-1): Higher values produce more consistent, stable speech
- Similarity Boost (0-1): Higher values make the voice more expressive but may introduce artifacts
# For consistent, stable output
tts.set_property("stability", 0.8)
tts.set_property("similarity_boost", 0.3)
# For more expressive, varied output
tts.set_property("stability", 0.3)
tts.set_property("similarity_boost", 0.8)
Custom Voices
ElevenLabs supports voice cloning, but this must be set up through their platform:
- Create custom voice on ElevenLabs website
- Get the voice ID
- Use it in your code:
tts.set_voice("custom_voice_id")
Additional Resources
Next Steps
- Explore streaming capabilities
- Check out callback functionality
- Learn about audio control features