# Voice Providers
Patter supports three voice AI architectures. Each offers different tradeoffs between latency, voice quality, and customization.
## OpenAI Realtime (Default)
End-to-end voice processing powered by OpenAI’s Realtime API. Audio goes directly to OpenAI, which handles speech recognition, language understanding, and speech synthesis in a single round trip.
```python
agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="openai_realtime",  # default
    model="gpt-4o-mini-realtime-preview",
    voice="alloy",
)
```
### Audio Encoding
OpenAI Realtime handles audio encoding automatically based on your telephony provider:
| Telephony Provider | Audio Format | Sample Rate |
|---|---|---|
| Twilio | G.711 mu-law | 8 kHz |
| Telnyx | PCM 16-bit | 16 kHz |
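The G.711 mu-law format Twilio uses compresses each 16-bit PCM sample into a single byte on a logarithmic scale, trading dynamic range for bandwidth. A minimal sketch of the companding step in pure Python (illustrative only; Patter performs this conversion internally and `linear_to_mulaw` is not part of its API):

```python
def linear_to_mulaw(sample: int) -> int:
    """Compress a signed 16-bit PCM sample to one G.711 mu-law byte."""
    MU_BIAS = 0x84   # bias added before the segment search
    MU_CLIP = 32635  # clamp so the biased magnitude stays in range
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), MU_CLIP) + MU_BIAS
    # Segment (exponent): position of the highest set bit above bit 7
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # Four mantissa bits taken just below the leading bit of the segment
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the byte bit-inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Note the standard G.711 quirks: silence (sample 0) encodes to `0xFF`, and the whole byte is bit-inverted before transmission.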
### Available Voices

`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`

### Requirements

- `openai_key` in the `Patter` constructor (local mode)
## ElevenLabs Conversational AI
Uses ElevenLabs’ Conversational AI platform for natural, expressive voices. Ideal when voice quality is the top priority.
```python
agent = phone.agent(
    system_prompt="You are a warm and friendly concierge.",
    provider="elevenlabs_convai",
    voice="rachel",
)
```
### Configuration
When using ElevenLabs ConvAI, you can configure additional provider-specific parameters through the agent:
| Parameter | Description |
|---|---|
| `voice` | ElevenLabs voice ID or name (e.g., `"rachel"`, `"adam"`) |
| `model` | Model identifier for ElevenLabs |
### Requirements

- `elevenlabs_key` in the `Patter` constructor (local mode)
## Pipeline Mode
Build a custom voice pipeline by combining separate STT (speech-to-text) and TTS (text-to-speech) providers. This gives you full control over each stage of the audio processing chain.
```python
agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="pipeline",
    stt=Patter.deepgram(api_key="dg_..."),
    tts=Patter.elevenlabs(api_key="el_...", voice="rachel"),
)
```
In pipeline mode, the `on_message` callback receives the transcribed text and returns the response to synthesize:
```python
async def handle_message(event) -> str:
    return f"You said: {event['text']}. How can I help?"

await phone.serve(agent, on_message=handle_message)
```
### Requirements

Pipeline mode requires both an STT and a TTS provider. If you don't pass `stt`/`tts` explicitly, Patter falls back to `deepgram_key` and `elevenlabs_key` from the constructor.
## STT Providers
Use these factory methods to configure speech-to-text:
### `Patter.deepgram()`

```python
stt = Patter.deepgram(api_key="dg_...", language="en")
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your Deepgram API key. |
| `language` | `str` | `"en"` | BCP-47 language code. |
### `Patter.whisper()`

```python
stt = Patter.whisper(api_key="sk-...", language="en")
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your OpenAI API key. |
| `language` | `str` | `"en"` | BCP-47 language code. |
## TTS Providers
Use these factory methods to configure text-to-speech:
### `Patter.elevenlabs()`

```python
tts = Patter.elevenlabs(api_key="el_...", voice="rachel")
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your ElevenLabs API key. |
| `voice` | `str` | `"rachel"` | Voice name or ID. |
### `Patter.openai_tts()`

```python
tts = Patter.openai_tts(api_key="sk-...", voice="alloy")
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your OpenAI API key. |
| `voice` | `str` | `"alloy"` | Voice name (`"alloy"`, `"echo"`, `"fable"`, `"onyx"`, `"nova"`, `"shimmer"`). |
OpenAI TTS returns audio at 24 kHz. Patter automatically resamples it to 16 kHz for telephony compatibility.
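The 24 kHz → 16 kHz conversion is a 3:2 downsample. A minimal linear-interpolation sketch of the idea (Patter's internal resampler is not shown here, and `resample_linear` is not part of its API; production resamplers also apply a low-pass filter first to avoid aliasing):

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample by linearly interpolating between neighbouring input samples."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples consumed per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        frac = pos - left
        right = min(left + 1, len(samples) - 1)
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For a 24 kHz input, every 3 input samples yield 2 output samples, so a 24-sample buffer becomes 16 samples at 16 kHz.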
## Provider Comparison
| Feature | OpenAI Realtime | ElevenLabs ConvAI | Pipeline |
|---|---|---|---|
| Latency | Lowest | Low | Medium |
| Voice quality | Good | Best | Configurable |
| Customization | Limited | Medium | Full |
| `on_message` callback | No | No | Yes |
| Requires AI key | OpenAI | ElevenLabs | STT + TTS keys |
## Complete Pipeline Example
```python
import os
import asyncio

from dotenv import load_dotenv
from patter import Patter

load_dotenv()

phone = Patter(
    twilio_sid=os.environ["TWILIO_SID"],
    twilio_token=os.environ["TWILIO_TOKEN"],
    phone_number=os.environ["PHONE_NUMBER"],
    webhook_url=os.environ["WEBHOOK_URL"],
)

agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="pipeline",
    stt=Patter.deepgram(api_key=os.environ["DEEPGRAM_KEY"]),
    tts=Patter.elevenlabs(api_key=os.environ["ELEVENLABS_KEY"], voice="rachel"),
)

async def handle_message(event) -> str:
    user_text = event["text"]
    # Add your own LLM logic here
    return f"I heard you say: {user_text}"

async def main():
    await phone.serve(agent, on_message=handle_message, port=8000)

asyncio.run(main())
```