ElevenLabs WebSocket TTS

ElevenLabsWebSocketTTS is an opt-in, low-latency variant of ElevenLabsTTS that streams over the ElevenLabs /v1/text-to-speech/{voice_id}/stream-input WebSocket endpoint instead of the HTTP /stream endpoint. It is a drop-in replacement: same constructor surface, same synthesize(text) async iterator, same telephony factories (for_twilio, for_telnyx).

Why use it

Saves ~50 ms of HTTP request setup per utterance. No new HTTP request or TLS handshake is set up for each turn.
Avoids cold-start TLS when calls are bursty (the WebSocket holds a warm connection for the duration of the utterance).
Native telephony output formats — μ-law @ 8 kHz for Twilio and PCM @ 16 kHz for Telnyx, no client-side resampling.

When not to use it:

You need eleven_v3 / eleven_v3_preview — those models are not supported by the stream-input WebSocket. Use the HTTP ElevenLabsTTS instead.
Your traffic is so low that the per-utterance HTTP round trip is irrelevant.

Install

websockets is already a runtime dependency of getpatter, so no extra install is required:

pip install getpatter

Quickstart

from getpatter import ElevenLabsWebSocketTTS

# Reads ELEVENLABS_API_KEY from env
tts = ElevenLabsWebSocketTTS()

# Twilio-native μ-law @ 8 kHz (no resampling)
tts = ElevenLabsWebSocketTTS.for_twilio(api_key="...")

# Telnyx-native PCM @ 16 kHz
tts = ElevenLabsWebSocketTTS.for_telnyx(api_key="...")

Namespaced import:

from getpatter.tts import elevenlabs_ws

tts = elevenlabs_ws.TTS()
tts = elevenlabs_ws.TTS.for_twilio(api_key="...")

In an agent:

import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsWebSocketTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=DeepgramSTT(),
    llm=AnthropicLLM(),
    tts=ElevenLabsWebSocketTTS.for_twilio(api_key="..."),
    system_prompt="You are a helpful assistant.",
)

asyncio.run(phone.serve(agent))
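
Because synthesize(text) is an async iterator, audio can be forwarded chunk by chunk as it arrives instead of being buffered. The sketch below shows that consumption pattern against a stand-in class. FakeTTS, speak, and send_audio are hypothetical names that only mimic the interface described above; for real use, pass an ElevenLabsWebSocketTTS instance instead.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple


class FakeTTS:
    """Hypothetical stand-in with the same synthesize(text) async-iterator shape."""

    async def synthesize(self, text: str) -> AsyncIterator[bytes]:
        # Pretend each word arrives as one audio chunk.
        for word in text.split():
            await asyncio.sleep(0)  # yield control, as a network read would
            yield word.encode("utf-8")


async def speak(tts, text: str, send_audio: Callable[[bytes], Awaitable[None]]) -> int:
    """Forward each chunk as soon as it is yielded; return the total bytes sent."""
    total = 0
    async for chunk in tts.synthesize(text):
        await send_audio(chunk)
        total += len(chunk)
    return total


async def demo() -> Tuple[int, List[bytes]]:
    sent: List[bytes] = []

    async def send_audio(chunk: bytes) -> None:
        # A real sender would write to the carrier's media stream here.
        sent.append(chunk)

    total = await speak(FakeTTS(), "hello from patter", send_audio)
    return total, sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```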

Constructor parameters

Parameter	Type	Default	Description
`api_key`	`str \| None`	`None`	API key — reads from `ELEVENLABS_API_KEY` if omitted.
`voice_id`	`str`	`"21m00Tcm4TlvDq8ikWAM"`	ElevenLabs voice ID (or name).
`model_id`	`str`	`"eleven_flash_v2_5"`	Model preset. `eleven_v3` is not supported on this endpoint.
`output_format`	`str`	`"pcm_16000"`	Wire format. Use `"ulaw_8000"` for Twilio Media Streams or `"pcm_16000"` for Telnyx.
`voice_settings`	`dict \| None`	`None`	Voice settings (`stability`, `similarity_boost`, `use_speaker_boost`, …).
`language_code`	`str \| None`	`None`	ISO 639-1 language code.
`auto_mode`	`bool`	`True`	When `True`, ElevenLabs handles internal chunk scheduling. Pass `False` to take manual control via `chunk_length_schedule`.
`inactivity_timeout`	`int`	`60`	Seconds the server holds the WS open with no input before closing. Max documented value: 180.
`chunk_length_schedule`	`list[int] \| None`	`None`	Custom chunk schedule. Each value must be in `[5, 500]`. Only honored when `auto_mode=False`.
`open_timeout`	`float`	`5.0`	Seconds to wait for the WS handshake before raising.
`frame_timeout`	`float`	`30.0`	Seconds to wait for each subsequent server frame before raising `ElevenLabsTTSError`.
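
To make the table concrete, the sketch below shows how several of these parameters could map onto the stream-input WebSocket URL. The query parameter names follow the public ElevenLabs API reference as far as I know it. The helper names build_stream_input_url and validate_chunk_schedule are hypothetical, and whether getpatter assembles its URL or validates its inputs this way is an assumption.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode

WS_BASE = "wss://api.elevenlabs.io/v1/text-to-speech"


def build_stream_input_url(
    voice_id: str = "21m00Tcm4TlvDq8ikWAM",
    model_id: str = "eleven_flash_v2_5",
    output_format: str = "pcm_16000",
    language_code: Optional[str] = None,
    auto_mode: bool = True,
    inactivity_timeout: int = 60,
) -> str:
    """Hypothetical helper: assemble a stream-input URL from the documented defaults."""
    # Assumption: values above the documented maximum of 180 are rejected locally.
    if not 0 < inactivity_timeout <= 180:
        raise ValueError("inactivity_timeout must be between 1 and 180 seconds")
    params = {
        "model_id": model_id,
        "output_format": output_format,
        "auto_mode": "true" if auto_mode else "false",
        "inactivity_timeout": str(inactivity_timeout),
    }
    if language_code is not None:
        params["language_code"] = language_code
    return f"{WS_BASE}/{quote(voice_id, safe='')}/stream-input?{urlencode(params)}"


def validate_chunk_schedule(schedule: List[int]) -> List[int]:
    """Check the documented [5, 500] bound on every chunk_length_schedule entry."""
    for value in schedule:
        if not 5 <= value <= 500:
            raise ValueError(f"chunk length {value} is outside [5, 500]")
    return schedule


if __name__ == "__main__":
    print(build_stream_input_url(output_format="ulaw_8000"))
```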

Telephony factories

ElevenLabsWebSocketTTS.for_twilio(...) and ElevenLabsWebSocketTTS.for_telnyx(...) mirror the HTTP variant. They pre-set output_format and (for Twilio) tune voice_settings for low-bandwidth μ-law:

# Twilio: ulaw_8000 + speaker_boost off, moderate stability
tts = ElevenLabsWebSocketTTS.for_twilio(api_key="...")

# Telnyx: pcm_16000 native
tts = ElevenLabsWebSocketTTS.for_telnyx(api_key="...")
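
The two wire formats imply different byte rates, which matters when slicing audio into carrier frames. The arithmetic below is a worked example: μ-law uses 1 byte per sample and 16-bit PCM uses 2. The 20 ms frame duration is an assumption (a common telephony packet size), and bytes_per_frame is a hypothetical helper, not part of getpatter.

```python
def bytes_per_frame(sample_rate_hz: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Bytes of mono audio in one frame of frame_ms milliseconds."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


# ulaw_8000: 8000 samples/s * 1 byte * 0.020 s
ULAW_8000_FRAME = bytes_per_frame(8000, 1)

# pcm_16000: 16000 samples/s * 2 bytes * 0.020 s
PCM_16000_FRAME = bytes_per_frame(16000, 2)

if __name__ == "__main__":
    print(ULAW_8000_FRAME, PCM_16000_FRAME)  # 160 640
```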

Carrier auto-detect — `set_telephony_carrier`

When you don’t know the carrier at construction time, StreamHandler calls set_telephony_carrier(carrier) at call start to advise the provider of the wire format:

tts = ElevenLabsWebSocketTTS()        # output_format defaults to pcm_16000
tts.set_telephony_carrier("twilio")    # auto-flips to ulaw_8000
tts.set_telephony_carrier("telnyx")    # keeps pcm_16000 (Telnyx default)

When output_format was passed explicitly to the constructor (or via for_twilio / for_telnyx), set_telephony_carrier is a no-op — the user’s choice always wins. Calling with an unknown carrier ("" / "custom") is also a no-op.
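
The precedence rule above can be modeled in a few lines. This is a sketch of the described behavior, not the library's implementation: FormatSelector and CARRIER_FORMATS are hypothetical names, and exact-match, lowercase carrier strings are an assumption.

```python
from typing import Optional

# Carrier-to-format mapping taken from the behavior described above.
CARRIER_FORMATS = {"twilio": "ulaw_8000", "telnyx": "pcm_16000"}


class FormatSelector:
    """Hypothetical model of the precedence rule; not the library's class."""

    def __init__(self, output_format: Optional[str] = None) -> None:
        # Remember whether the caller chose a format explicitly.
        self._explicit = output_format is not None
        self.output_format = output_format or "pcm_16000"

    def set_telephony_carrier(self, carrier: str) -> None:
        if self._explicit:
            return  # an explicit choice always wins
        wire_format = CARRIER_FORMATS.get(carrier)
        if wire_format is None:
            return  # unknown carrier: leave the format untouched
        self.output_format = wire_format


if __name__ == "__main__":
    selector = FormatSelector()
    selector.set_telephony_carrier("twilio")
    print(selector.output_format)  # ulaw_8000
```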

Limitations

eleven_v3 family is rejected at construction time. The stream-input WebSocket does not support v3 models. Use the HTTP ElevenLabsTTS instead.
Per-utterance lifecycle. A new WebSocket is opened and closed per synthesize(text) call, matching HTTP semantics. A pooled WS shared across turns of the same call session is on the roadmap.
optimize_streaming_latency is officially deprecated by ElevenLabs and is not exposed.
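
Since v3 models are rejected at construction time, an application that lets users pick the model may want to choose the transport before building the provider. A minimal sketch follows, assuming the v3 family can be recognized by the eleven_v3 prefix; choose_tts_class_name is a hypothetical helper and returns only the class name to instantiate.

```python
def choose_tts_class_name(model_id: str) -> str:
    """Pick the provider class name for a model (prefix check is an assumption)."""
    if model_id.startswith("eleven_v3"):
        # v3 models are not supported on stream-input: fall back to HTTP.
        return "ElevenLabsTTS"
    return "ElevenLabsWebSocketTTS"


if __name__ == "__main__":
    for model in ("eleven_flash_v2_5", "eleven_v3", "eleven_v3_preview"):
        print(model, "->", choose_tts_class_name(model))
```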

Errors

from getpatter.providers.elevenlabs_ws_tts import ElevenLabsTTSError

ElevenLabsTTSError is raised when:

The server emits a JSON error frame.
No frame is received within frame_timeout seconds (stalled connection).
A binary audio frame exceeds the safety cap (512 KB).

The connection is always closed in finally, and a best-effort close_context message is sent so ElevenLabs stops billing for unconsumed audio.
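
The three error conditions and the close-in-finally guarantee can be illustrated with a small receive loop. This is a sketch under stated assumptions rather than the provider's code: TTSError stands in for ElevenLabsTTSError, FakeConnection is a hypothetical in-memory connection, and the JSON shapes ({"error": ...} and {"isFinal": true}) are assumed for illustration. The close_context message is left out because its exact shape is not shown here.

```python
import asyncio
import json
from typing import AsyncIterator, List, Union

MAX_FRAME_BYTES = 512 * 1024  # safety cap from the list above


class TTSError(Exception):
    """Hypothetical stand-in for ElevenLabsTTSError."""


class FakeConnection:
    """Hypothetical connection: recv() replays queued frames, then stalls."""

    def __init__(self, frames: List[Union[bytes, str]]) -> None:
        self._frames = list(frames)
        self.closed = False

    async def recv(self) -> Union[bytes, str]:
        if self._frames:
            return self._frames.pop(0)
        await asyncio.sleep(3600)  # simulate a stalled server
        raise RuntimeError("unreachable in this sketch")

    async def close(self) -> None:
        self.closed = True


async def read_audio(conn, frame_timeout: float = 30.0) -> AsyncIterator[bytes]:
    """Yield audio frames until the final marker; always close the connection."""
    try:
        while True:
            try:
                frame = await asyncio.wait_for(conn.recv(), timeout=frame_timeout)
            except asyncio.TimeoutError:
                raise TTSError(f"no frame within {frame_timeout}s") from None
            if isinstance(frame, bytes):
                if len(frame) > MAX_FRAME_BYTES:
                    raise TTSError("audio frame exceeds 512 KB cap")
                yield frame
                continue
            message = json.loads(frame)
            if "error" in message:
                raise TTSError(str(message["error"]))
            if message.get("isFinal"):
                return
    finally:
        await conn.close()


async def collect(frames, frame_timeout: float = 0.05):
    """Drain read_audio and report (chunks, error text or None, closed flag)."""
    conn = FakeConnection(frames)
    chunks, error = [], None
    try:
        async for chunk in read_audio(conn, frame_timeout):
            chunks.append(chunk)
    except TTSError as exc:
        error = str(exc)
    return chunks, error, conn.closed


if __name__ == "__main__":
    print(asyncio.run(collect([b"aa", b"bb", '{"isFinal": true}'])))
```

Callers would typically wrap the async for loop in try/except for the error type and decide whether to retry or fall back to the HTTP provider.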
