ElevenLabs WebSocket TTS
ElevenLabsWebSocketTTS is an opt-in, low-latency variant of ElevenLabsTTS that streams over the ElevenLabs /v1/text-to-speech/{voice_id}/stream-input WebSocket endpoint instead of the HTTP /stream endpoint.
It is a drop-in replacement: same constructor surface, same synthesize(text) async iterator, same telephony factories (for_twilio, for_telnyx).
Why use it
- Saves ~50 ms of HTTP request setup per utterance: no new HTTP request or TLS handshake is made for each turn.
- Avoids cold-start TLS when calls are bursty (the WebSocket holds a warm connection for the duration of the utterance).
- Native telephony output formats — μ-law @ 8 kHz for Twilio and PCM @ 16 kHz for Telnyx, no client-side resampling.
When not to use it
- You need eleven_v3 / eleven_v3_preview. Those models are not supported by the stream-input WebSocket; use the HTTP ElevenLabsTTS instead.
- Your traffic is so low that the per-utterance HTTP round trip is irrelevant.
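Because the v3 restriction is a plain model-ID rule, an app that lets operators choose a model at runtime can route around it before constructing anything. A minimal sketch, assuming a prefix match on "eleven_v3" covers the whole family (as the eleven_v3* wording in the constructor table suggests). pick_tts_variant is a hypothetical helper, and it returns a class name rather than importing either class:

```python
def pick_tts_variant(model_id: str) -> str:
    """Return the name of the ElevenLabs TTS class to use for model_id.

    Hypothetical helper: the eleven_v3 family (eleven_v3, eleven_v3_preview, ...)
    is not supported on the stream-input WebSocket, so it goes to the HTTP variant.
    """
    if model_id.startswith("eleven_v3"):
        return "ElevenLabsTTS"  # HTTP /stream endpoint
    return "ElevenLabsWebSocketTTS"  # stream-input WebSocket
```

In real code you would return the class object itself and instantiate it with the same arguments, since both variants share a constructor surface.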
Install
websockets is already a runtime dependency of getpatter, so no extra install is required.
Quickstart
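This page does not show the import path for ElevenLabsWebSocketTTS, so the sketch below stays self-contained: StubTTS is a hypothetical stand-in with the documented shape (a synthesize(text) async iterator), assumed here to yield raw audio bytes. Because the WebSocket variant is a drop-in replacement, consumer code such as the hypothetical collect_audio should work unchanged whether you pass it an ElevenLabsTTS or an ElevenLabsWebSocketTTS instance built with the constructor parameters listed on this page.

```python
import asyncio
from typing import AsyncIterator, Protocol


class SupportsSynthesize(Protocol):
    """Shape shared by ElevenLabsTTS and ElevenLabsWebSocketTTS (assumed bytes output)."""

    def synthesize(self, text: str) -> AsyncIterator[bytes]: ...


class StubTTS:
    """Hypothetical stand-in; replace with a real ElevenLabsWebSocketTTS instance."""

    def __init__(self, output_format: str = "pcm_16000") -> None:
        self.output_format = output_format

    async def synthesize(self, text: str) -> AsyncIterator[bytes]:
        # Fake audio: one zero byte per character, delivered in 4-byte chunks.
        audio = bytes(len(text))
        for start in range(0, len(audio), 4):
            await asyncio.sleep(0)  # hand control back to the loop like a network read
            yield audio[start:start + 4]


async def collect_audio(tts: SupportsSynthesize, text: str) -> bytes:
    """Drain one utterance into a single buffer; works with either variant."""
    chunks = []
    async for chunk in tts.synthesize(text):
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    audio = asyncio.run(collect_audio(StubTTS(), "Hello from Patter"))
    print(len(audio))
```

Buffering a whole utterance is only for illustration; in a live call you would forward each chunk to the carrier as soon as it arrives.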
Constructor parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str or None | None | API key. Read from ELEVENLABS_API_KEY if omitted. |
| voice_id | str | "21m00Tcm4TlvDq8ikWAM" | ElevenLabs voice ID (or name). |
| model_id | str | "eleven_flash_v2_5" | Model preset. eleven_v3* is not supported on this endpoint. |
| output_format | str | "pcm_16000" | Wire format. Use "ulaw_8000" for Twilio Media Streams or "pcm_16000" for Telnyx. |
| voice_settings | dict or None | None | Voice settings (stability, similarity_boost, use_speaker_boost, …). |
| language_code | str or None | None | ISO 639-1 language code. |
| auto_mode | bool | True | When True, ElevenLabs handles internal chunk scheduling. Pass False to take manual control via chunk_length_schedule. |
| inactivity_timeout | int | 60 | Seconds the server keeps the WebSocket open without input before closing it. Maximum documented value: 180. |
| chunk_length_schedule | list[int] or None | None | Custom chunk schedule. Each value must be in [5, 500]. Only honored when auto_mode=False. |
| open_timeout | float | 5.0 | Seconds to wait for the WebSocket handshake before raising. |
| frame_timeout | float | 30.0 | Seconds to wait for each subsequent server frame before raising ElevenLabsTTSError. |
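The constraints in this table can be checked before a call starts. The helper below is hypothetical (not part of getpatter) and only encodes what the table states: eleven_v3* is unsupported, inactivity_timeout is documented up to 180 s, and chunk_length_schedule values must lie in [5, 500] and only take effect with auto_mode=False. The page does not say whether the library raises for each of these cases, so the sketch reports problems instead of raising.

```python
from typing import List, Optional, Sequence


def check_ws_tts_options(
    model_id: str = "eleven_flash_v2_5",
    auto_mode: bool = True,
    inactivity_timeout: int = 60,
    chunk_length_schedule: Optional[Sequence[int]] = None,
) -> List[str]:
    """Return human-readable problems with a proposed configuration (hypothetical)."""
    problems: List[str] = []
    if model_id.startswith("eleven_v3"):
        problems.append("eleven_v3* models are not supported on stream-input")
    if inactivity_timeout > 180:
        problems.append("inactivity_timeout exceeds the documented max of 180 s")
    if chunk_length_schedule is not None:
        if auto_mode:
            # The schedule is silently unused while ElevenLabs schedules chunks itself.
            problems.append("chunk_length_schedule is ignored unless auto_mode=False")
        bad = [v for v in chunk_length_schedule if not 5 <= v <= 500]
        if bad:
            problems.append(f"chunk_length_schedule values outside [5, 500]: {bad}")
    return problems
```

An empty list means the options match the documented limits; the defaults from the table pass unchanged.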
Telephony factories
ElevenLabsWebSocketTTS.for_twilio(...) and ElevenLabsWebSocketTTS.for_telnyx(...) mirror the HTTP variant. They pre-set output_format (ulaw_8000 for Twilio, pcm_16000 for Telnyx) and, for Twilio, tune voice_settings for low-bandwidth μ-law.
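When forwarding audio to a carrier it helps to know how many bytes each preset produces per frame. A minimal sketch, assuming both presets are mono, that ulaw_8000 is 8-bit μ-law at 8,000 samples/s, and that pcm_16000 is 16-bit linear PCM at 16,000 samples/s; the 20 ms frame length and the helper names bytes_per_frame and reframe are illustrative choices, not part of getpatter.

```python
from typing import Iterable, Iterator

# (bytes per sample, samples per second); mono is assumed for both presets.
FORMATS = {
    "ulaw_8000": (1, 8_000),   # 8-bit mu-law, as used for Twilio Media Streams
    "pcm_16000": (2, 16_000),  # assumed 16-bit linear PCM, as used for Telnyx
}


def bytes_per_frame(output_format: str, frame_ms: int = 20) -> int:
    """Payload size in bytes of one frame_ms-long frame in the given format."""
    sample_width, sample_rate = FORMATS[output_format]
    return sample_rate * frame_ms // 1000 * sample_width


def reframe(chunks: Iterable[bytes], frame_size: int) -> Iterator[bytes]:
    """Regroup arbitrarily sized TTS chunks into fixed-size frames.

    Synthesized chunks need not line up with frame boundaries, so leftover
    bytes are carried over; the final frame may be shorter than frame_size.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= frame_size:
            yield bytes(buffer[:frame_size])
            del buffer[:frame_size]
    if buffer:
        yield bytes(buffer)
```

Under these assumptions bytes_per_frame("ulaw_8000") is 160 and bytes_per_frame("pcm_16000") is 640. With a live synthesize(text) stream the same carry-over logic applies inside an async for loop.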
Limitations
- eleven_v3 family is rejected at construction time. The stream-input WebSocket does not support v3 models; use the HTTP ElevenLabsTTS instead.
- Per-utterance lifecycle. A new WebSocket is opened and closed per synthesize(text) call, matching HTTP semantics. A pooled WS shared across turns of the same call session is on the roadmap.
- optimize_streaming_latency is officially deprecated by ElevenLabs and is not exposed.
Errors
ElevenLabsTTSError is raised when:
- The server emits a JSON error frame.
- No frame is received within frame_timeout seconds (stalled connection).
- A binary audio frame exceeds the safety cap (512 KB).
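The frame_timeout and size-cap rules amount to two checks on every server frame. The sketch below applies the same checks to any async iterator of bytes to show the semantics; it is not the library's implementation. guard_frames is a hypothetical helper, StubTTSError is a hypothetical stand-in for ElevenLabsTTSError, and reading "512 KB" as 512 * 1024 bytes is an assumption.

```python
import asyncio
from typing import AsyncIterator

MAX_FRAME_BYTES = 512 * 1024  # assumed reading of the documented 512 KB cap


class StubTTSError(Exception):
    """Hypothetical stand-in for ElevenLabsTTSError."""


async def guard_frames(
    frames: AsyncIterator[bytes], frame_timeout: float = 30.0
) -> AsyncIterator[bytes]:
    """Re-yield frames, raising if one stalls past frame_timeout or is oversized."""
    iterator = frames.__aiter__()
    while True:
        try:
            # The timeout restarts for every frame, not for the whole utterance.
            frame = await asyncio.wait_for(iterator.__anext__(), frame_timeout)
        except StopAsyncIteration:
            return  # upstream finished normally
        except asyncio.TimeoutError:
            raise StubTTSError(f"no frame within {frame_timeout} s") from None
        if len(frame) > MAX_FRAME_BYTES:
            raise StubTTSError(f"frame of {len(frame)} bytes exceeds the safety cap")
        yield frame
```

Because the timeout is per frame, a long utterance that keeps producing audio never trips it; only a stall between frames does.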
The WebSocket is closed in a finally block, and a best-effort close_context message is sent so ElevenLabs stops billing for unconsumed audio.
See also
- TTS overview — provider table and shared concepts.
- HTTP variant: ElevenLabsTTS.

