TTS (Text-to-Speech)

TTS is used in pipeline mode to synthesize the agent’s response audio. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, the engine handles speech synthesis internally. Each TTS provider ships as both a namespaced class (from getpatter.tts import elevenlabs → elevenlabs.TTS()) and a flat alias (from getpatter import ElevenLabsTTS). The two forms are equivalent: flat aliases are convenient for short examples, while the namespaced form avoids name collisions when mixing providers.

Quickstart

import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")  # TWILIO_* from env

agent = phone.agent(
    stt=DeepgramSTT(),                            # DEEPGRAM_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),         # ELEVENLABS_API_KEY from env
    system_prompt="You are a helpful assistant.",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())

The same agent using namespaced imports (phone is the Patter instance from the Quickstart):

from getpatter.stt import deepgram
from getpatter.tts import elevenlabs

agent = phone.agent(
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
)

Supported providers

Flat import	Namespaced import	Env var	Install extra
`ElevenLabsTTS`	`getpatter.tts.elevenlabs.TTS`	`ELEVENLABS_API_KEY`	included
`ElevenLabsWebSocketTTS`	`getpatter.tts.elevenlabs_ws.TTS`	`ELEVENLABS_API_KEY`	included
`OpenAITTS`	`getpatter.tts.openai.TTS`	`OPENAI_API_KEY`	included
`CartesiaTTS`	`getpatter.tts.cartesia.TTS`	`CARTESIA_API_KEY`	`getpatter[cartesia]`
`RimeTTS`	`getpatter.tts.rime.TTS`	`RIME_API_KEY`	`getpatter[rime]`
`LMNTTTS`	`getpatter.tts.lmnt.TTS`	`LMNT_API_KEY`	`getpatter[lmnt]`

Model / voice / format enums

Each provider exports typed StrEnums for valid model IDs, voice presets, and output formats alongside the provider class. They keep model= / voice= / output_format= arguments tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility:

from getpatter import OpenAITTS
from getpatter.providers.openai_tts import OpenAITTSModel, OpenAITTSVoice
from getpatter.providers.elevenlabs_tts import ElevenLabsModel, ElevenLabsOutputFormat
from getpatter.providers.cartesia_tts import CartesiaTTSModel, CartesiaVoiceSpeed
from getpatter.providers.rime_tts import RimeModel, RimeAudioFormat
from getpatter.providers.lmnt_tts import LMNTModel, LMNTAudioFormat

tts = OpenAITTS(voice=OpenAITTSVoice.NOVA, model=OpenAITTSModel.GPT_4O_MINI_TTS)
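To illustrate how a string-backed enum can catch typos while still letting unknown raw strings through, here is a minimal sketch. TTSModel and coerce_model are hypothetical names, the members are illustrative, and this is not necessarily how Patter validates its arguments.

```python
from enum import Enum
from typing import Union


class TTSModel(str, Enum):
    """Hypothetical model enum; members are illustrative, not Patter's definitions."""

    GPT_4O_MINI_TTS = "gpt-4o-mini-tts"
    TTS_1 = "tts-1"
    TTS_1_HD = "tts-1-hd"


def coerce_model(value: Union[TTSModel, str]) -> Union[TTSModel, str]:
    """Return the enum member for a known ID, or pass an unknown raw string through."""
    try:
        return TTSModel(value)  # lookup by value works for members and plain strings
    except ValueError:
        # Unknown ID: keep the raw string so newly released models still work.
        return value


model = coerce_model("gpt-4o-mini-tts")
print(model is TTSModel.GPT_4O_MINI_TTS)  # True
print(model == "gpt-4o-mini-tts")         # True: the str mixin compares as a plain string
print(coerce_model("some-future-model"))  # unknown ID passes through unchanged

try:
    _ = TTSModel.GPT_4O_MINNI_TTS  # misspelled member name
except AttributeError as exc:
    print("typo caught:", exc)
```

The str mixin is what lets enum members be passed anywhere a plain model string is expected; on Python 3.11+ enum.StrEnum gives the same behavior for this purpose.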

ElevenLabs

Streaming HTTP TTS via ElevenLabs. Default model "eleven_flash_v2_5" (~75 ms TTFB, drop-in replacement for eleven_turbo_v2_5). Other valid model_id literals: "eleven_v3", "eleven_turbo_v2_5", "eleven_multilingual_v2", "eleven_monolingual_v1".

from getpatter import ElevenLabsTTS

tts = ElevenLabsTTS()                        # reads ELEVENLABS_API_KEY
tts = ElevenLabsTTS(voice_id="rachel")
tts = ElevenLabsTTS(api_key="...", voice_id="EXAVITQu4vr4xnSDxMaL", model_id="eleven_v3")

Parameter	Type	Default	Description
`api_key`	`str \| None`	`None`	API key — reads from `ELEVENLABS_API_KEY` if omitted.
`voice_id`	`str`	`"21m00Tcm4TlvDq8ikWAM"` (Rachel)	ElevenLabs voice ID (or name).
`model_id`	`ElevenLabsModel \| str`	`"eleven_flash_v2_5"`	Typed literal: `eleven_flash_v2_5` / `eleven_turbo_v2_5` / `eleven_v3` / `eleven_multilingual_v2` / `eleven_monolingual_v1`.
`output_format`	`str`	`"pcm_16000"`	ElevenLabs output format.
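ElevenLabs format strings such as "pcm_16000" and "ulaw_8000" encode the codec and sample rate, which determines how many bytes each audio frame costs. The sketch below works that out for a 20 ms frame, assuming the pcm_* formats are 16-bit mono and ulaw_* is 8-bit mono; parse_output_format and bytes_per_frame are hypothetical helpers, and compressed formats (for example MP3) are deliberately out of scope.

```python
# Assumed bytes per sample for the uncompressed formats handled here.
BYTES_PER_SAMPLE = {"pcm": 2, "ulaw": 1}


def parse_output_format(fmt: str) -> tuple[str, int]:
    """Split e.g. 'pcm_16000' into ('pcm', 16000)."""
    codec, _, rate = fmt.partition("_")
    if codec not in BYTES_PER_SAMPLE or not rate.isdigit():
        raise ValueError(f"unsupported output format for this sketch: {fmt!r}")
    return codec, int(rate)


def bytes_per_frame(fmt: str, frame_ms: int = 20) -> int:
    """Bytes of audio in one frame of `frame_ms` milliseconds."""
    codec, rate = parse_output_format(fmt)
    samples = rate * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE[codec]


for fmt in ("pcm_16000", "ulaw_8000"):
    print(fmt, bytes_per_frame(fmt), "bytes per 20 ms")
```

Under these assumptions, the default pcm_16000 carries four times as many bytes per frame as the μ-law stream a Twilio call ultimately needs.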

Telephony factories — `for_twilio()` / `for_telnyx()`

When ElevenLabs runs in pipeline mode behind a phone carrier, you can negotiate the carrier-native codec at the ElevenLabs HTTP layer and skip per-chunk transcoding in the SDK. The factory variants do that for you:

from getpatter import ElevenLabsTTS

# Twilio Media Streams: μ-law @ 8 kHz native — no resample, no μ-law encode in Python.
tts = ElevenLabsTTS.for_twilio(voice_id="rachel")

# Telnyx default: PCM @ 16 kHz native — no resample.
tts = ElevenLabsTTS.for_telnyx(voice_id="rachel")

CartesiaTTS.for_twilio() / for_telnyx() and ElevenLabsConvAI.for_twilio() / for_telnyx() work the same way. Use them whenever you know the call will go out over Twilio or Telnyx: they shave tens of milliseconds off TTFB and reduce CPU usage on long calls.
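To make the saving concrete, here is a rough sketch of the per-chunk work a pipeline would otherwise do in Python for Twilio when the TTS emits 16 kHz PCM: halve the sample rate, then μ-law encode (ITU-T G.711). The helper names are hypothetical and this is not Patter's actual transcoder; the decimation is deliberately naive (pair averaging rather than a proper anti-aliasing filter).

```python
import struct


def ulaw_encode_sample(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    BIAS, CLIP = 0x84, 32635
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS  # 132..32767
    exponent = magnitude.bit_length() - 8                          # segment 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF             # mu-law bytes are inverted


def pcm16k_to_ulaw8k(chunk: bytes) -> bytes:
    """Hypothetical per-chunk transcode: 16 kHz s16le PCM -> 8 kHz mu-law."""
    count = len(chunk) // 2
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    # Naive 2:1 decimation: average each pair, dropping a trailing odd sample.
    downsampled = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count - 1, 2)]
    return bytes(ulaw_encode_sample(s) for s in downsampled)


silence_20ms = b"\x00\x00" * 320  # 20 ms of 16 kHz silence
print(pcm16k_to_ulaw8k(silence_20ms)[:4])  # mu-law silence is 0xFF
```

This loop runs for every sample of every chunk, which is the CPU the carrier-native factories let you avoid by asking ElevenLabs for μ-law directly.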

WebSocket variant

ElevenLabsWebSocketTTS is an opt-in low-latency drop-in for ElevenLabsTTS that uses the /stream-input WebSocket endpoint. It saves ~50 ms of HTTP request setup per utterance and avoids TLS cold-starts on bursty traffic. See the ElevenLabs WebSocket setup page for full details.

from getpatter import ElevenLabsWebSocketTTS

tts = ElevenLabsWebSocketTTS()                       # reads ELEVENLABS_API_KEY
tts = ElevenLabsWebSocketTTS.for_twilio(api_key="...")   # ulaw_8000 native
tts = ElevenLabsWebSocketTTS.for_telnyx(api_key="...")   # pcm_16000 native

The WebSocket endpoint does not support eleven_v3* models — use the HTTP ElevenLabsTTS for v3.
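For orientation, the /stream-input endpoint, as ElevenLabs documents it publicly, exchanges JSON text frames: an opening frame whose text is a single space, then text chunks, then an empty string to signal end of input. The sketch below only builds those frames; it omits authentication, query parameters such as model_id and output_format, and the socket itself, and it is not a description of how ElevenLabsWebSocketTTS frames messages internally. stream_input_messages is a hypothetical helper.

```python
import json


def stream_input_messages(text_chunks, voice_settings=None):
    """Build the JSON text frames for one utterance (illustrative framing only)."""
    first = {"text": " "}  # opening frame: a single space
    if voice_settings:
        first["voice_settings"] = voice_settings
    frames = [json.dumps(first)]
    for chunk in text_chunks:
        if chunk:
            # Ending each chunk with a space helps the server find word boundaries.
            frames.append(json.dumps({"text": chunk if chunk.endswith(" ") else chunk + " "}))
    frames.append(json.dumps({"text": ""}))  # empty string signals end of input
    return frames


for frame in stream_input_messages(["Hello,", "how can I help?"]):
    print(frame)
```

Because LLM tokens can be forwarded as text frames as they arrive, synthesis can start before the full reply exists, without a new HTTP request per utterance.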

OpenAI

from getpatter import OpenAITTS

tts = OpenAITTS()                            # reads OPENAI_API_KEY
tts = OpenAITTS(voice="nova")

# Twilio: skip the intermediate 16 kHz step — resample 24k → 8k directly.
tts = OpenAITTS(target_sample_rate=8000)

Parameter	Type	Default	Description
`api_key`	`str \| None`	`None`	API key — reads from `OPENAI_API_KEY` if omitted.
`voice`	`OpenAITTSVoice \| str`	`OpenAITTSVoice.ALLOY`	One of `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
`model`	`OpenAITTSModel \| str`	`OpenAITTSModel.GPT_4O_MINI_TTS`	OpenAI TTS model ID. Older `tts-1` / `tts-1-hd` are accepted as raw strings.
`instructions`	`str \| None`	`None`	Voice direction (only honored by `gpt-4o-mini-tts` and newer).
`speed`	`float \| None`	`None`	Playback speed multiplier in `[0.25, 4.0]`.
`target_sample_rate`	`int`	`16000`	Output sample rate. Must be `8000` or `16000`. Set to `8000` for Twilio carriers to collapse the 24 k→16 k→8 k chain into a single resample (~1 ms saved per chunk).

OpenAITTSVoice and OpenAITTSModel are exported alongside the provider class:

from getpatter.providers.openai_tts import OpenAITTSVoice, OpenAITTSModel

OpenAI TTS returns audio at 24 kHz — Patter automatically resamples to target_sample_rate (16 kHz by default; pass target_sample_rate=8000 to deliver μ-law-ready PCM directly to Twilio).
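The benefit of target_sample_rate=8000 can be seen by counting work per frame. The sketch below uses a naive linear-interpolation resampler (a hypothetical stand-in with no anti-aliasing filter, not Patter's resampler) and compares the direct 24 kHz → 8 kHz path with the 24 kHz → 16 kHz → 8 kHz chain for one 20 ms frame.

```python
def resample_linear(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[j + 1] if j + 1 < len(samples) else a
        out.append(int(round(a + (b - a) * frac)))
    return out


frame_24k = list(range(480))  # 20 ms at 24 kHz
direct = resample_linear(frame_24k, 24000, 8000)
via_16k = resample_linear(frame_24k, 24000, 16000)
chained = resample_linear(via_16k, 16000, 8000)

print(len(direct), len(chained))     # 160 160: same output length
print(len(direct), len(via_16k) + len(chained))  # 160 vs 480 interpolated samples
```

24 kHz to 8 kHz is an exact 3:1 ratio, so every output sample lands on an input sample, whereas 24 kHz to 16 kHz needs fractional interpolation before the second step.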

Cartesia

Raw PCM streaming via Cartesia’s sonic-2 bytes endpoint. See Cartesia setup.

from getpatter import CartesiaTTS

tts = CartesiaTTS()                          # reads CARTESIA_API_KEY
tts = CartesiaTTS(voice="f786b574-daa5-4673-aa0c-cbe3e8534c02")  # Katie
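When consuming raw PCM from a streamed HTTP body, chunk boundaries are not guaranteed to fall on sample boundaries, so a consumer has to carry a dangling byte over to the next chunk. The sketch below assumes signed 16-bit little-endian mono PCM (pcm_s16le, one of Cartesia's raw encodings; the exact format Patter requests is not stated here). PCM16Reassembler is a hypothetical name.

```python
import struct


class PCM16Reassembler:
    """Turns arbitrary byte chunks into whole signed 16-bit little-endian samples."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list[int]:
        data = self._pending + chunk
        usable = len(data) - (len(data) % 2)
        self._pending = data[usable:]  # carry a half-sample over to the next chunk
        return list(struct.unpack(f"<{usable // 2}h", data[:usable]))


r = PCM16Reassembler()
print(r.feed(b"\x01"))          # [] (only half a sample so far)
print(r.feed(b"\x00\x02\x00"))  # [1, 2]
```

Without this carry-over, an odd-length chunk would shift every following sample by one byte and turn the rest of the utterance into noise.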

Rime

Arcana (high fidelity) and Mist (low latency) via Rime’s HTTP endpoint. See Rime setup.

from getpatter import RimeTTS

tts = RimeTTS()                              # reads RIME_API_KEY
tts = RimeTTS(model="arcana", speaker="astra")
tts = RimeTTS(model="mistv2", speaker="cove", speed_alpha=1.1, reduce_latency=True)

LMNT

Blizzard and Aurora via the LMNT HTTP API. See LMNT setup.

from getpatter import LMNTTTS

tts = LMNTTTS()                              # reads LMNT_API_KEY
tts = LMNTTTS(model="blizzard", voice="leah")

Missing credentials

Each class raises ValueError at construction time if no API key is resolved:

ValueError: ElevenLabs TTS requires an api_key. Pass api_key='...' or
set ELEVENLABS_API_KEY in the environment.
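The lookup order described throughout this page (explicit api_key argument, then the provider's environment variable, then a ValueError) can be sketched as follows. resolve_api_key is a hypothetical helper that reproduces the error text shown above; Patter's own implementation may differ. The environ parameter exists only to make the sketch easy to test.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str],
    env_var: str,
    provider: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Explicit argument first, then the environment, else fail fast."""
    key = api_key or environ.get(env_var)
    if not key:
        raise ValueError(
            f"{provider} requires an api_key. Pass api_key='...' or "
            f"set {env_var} in the environment."
        )
    return key


print(resolve_api_key(None, "ELEVENLABS_API_KEY", "ElevenLabs TTS",
                      environ={"ELEVENLABS_API_KEY": "sk-test"}))
```

Failing at construction rather than on the first synthesis request means a misconfigured deployment errors out before it ever answers a call. Note that this sketch treats an empty string the same as a missing key.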
