Voice Providers

Patter supports three voice AI architectures. Each offers different tradeoffs between latency, voice quality, and customization.

OpenAI Realtime (Default)

End-to-end voice processing powered by OpenAI’s Realtime API. Audio goes directly to OpenAI, which handles speech recognition, language understanding, and speech synthesis in a single round trip.
```python
agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="openai_realtime",  # default
    model="gpt-4o-mini-realtime-preview",
    voice="alloy",
)
```

Audio Encoding

OpenAI Realtime handles audio encoding automatically based on your telephony provider:
| Telephony Provider | Audio Format | Sample Rate |
| --- | --- | --- |
| Twilio | G.711 mu-law | 8 kHz |
| Telnyx | PCM 16-bit | 16 kHz |
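As a rough illustration of why G.711 mu-law works for 8 kHz telephony: mu-law companding maps 16-bit linear PCM onto a logarithmic curve before 8-bit quantization, preserving perceptual detail at low amplitudes. A minimal sketch of the compression curve (illustrative only, not Patter code):

```python
import math

MU = 255  # mu-law parameter used by G.711

def mulaw_compress(sample: int) -> float:
    """Map a 16-bit PCM sample (-32768..32767) onto the mu-law curve in [-1, 1].

    G.711 then quantizes this value to 8 bits, which is why mu-law audio
    needs half the bits of 16-bit linear PCM per sample.
    """
    x = max(-1.0, min(1.0, sample / 32768.0))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
```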

Available Voices

"alloy", "echo", "fable", "onyx", "nova", "shimmer"

Requirements

  • openai_key in the Patter constructor (local mode)

ElevenLabs Conversational AI

Uses ElevenLabs’ Conversational AI platform for natural, expressive voices. Ideal when voice quality is the top priority.
```python
agent = phone.agent(
    system_prompt="You are a warm and friendly concierge.",
    provider="elevenlabs_convai",
    voice="rachel",
)
```

Configuration

When using ElevenLabs ConvAI, you can configure additional provider-specific parameters through the agent:
| Parameter | Description |
| --- | --- |
| voice | ElevenLabs voice ID or name (e.g., "rachel", "adam") |
| model | Model identifier for ElevenLabs |

Requirements

  • elevenlabs_key in the Patter constructor (local mode)

Pipeline Mode

Build a custom voice pipeline by combining separate STT (speech-to-text) and TTS (text-to-speech) providers. This gives you full control over each stage of the audio processing chain.
```python
agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="pipeline",
    stt=Patter.deepgram(api_key="dg_..."),
    tts=Patter.elevenlabs(api_key="el_...", voice="rachel"),
)
```
In pipeline mode, the on_message callback receives the transcribed text and returns the response to synthesize:
```python
async def handle_message(event) -> str:
    return f"You said: {event['text']}. How can I help?"

await phone.serve(agent, on_message=handle_message)
```

Requirements

Pipeline mode requires both an STT and a TTS provider. If you don’t pass stt/tts explicitly, Patter falls back to deepgram_key and elevenlabs_key from the constructor.
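The fallback described above can be sketched as follows (a hypothetical helper — the function name and error messages are illustrative, not part of Patter's API; the tuples stand in for the real provider objects):

```python
def resolve_pipeline_providers(stt, tts, deepgram_key=None, elevenlabs_key=None):
    """Explicit stt/tts arguments win; otherwise fall back to the
    constructor keys, and fail fast if neither is available."""
    if stt is None:
        if deepgram_key is None:
            raise ValueError("pipeline mode needs stt= or a deepgram_key")
        stt = ("deepgram", deepgram_key)  # stand-in for Patter.deepgram(...)
    if tts is None:
        if elevenlabs_key is None:
            raise ValueError("pipeline mode needs tts= or an elevenlabs_key")
        tts = ("elevenlabs", elevenlabs_key)  # stand-in for Patter.elevenlabs(...)
    return stt, tts
```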

STT Providers

Use these factory methods to configure speech-to-text:

Patter.deepgram()

```python
stt = Patter.deepgram(api_key="dg_...", language="en")
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Your Deepgram API key. |
| language | str | "en" | BCP-47 language code. |

Patter.whisper()

```python
stt = Patter.whisper(api_key="sk-...", language="en")
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Your OpenAI API key. |
| language | str | "en" | BCP-47 language code. |

TTS Providers

Use these factory methods to configure text-to-speech:

Patter.elevenlabs()

```python
tts = Patter.elevenlabs(api_key="el_...", voice="rachel")
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Your ElevenLabs API key. |
| voice | str | "rachel" | Voice name or ID. |

Patter.openai_tts()

```python
tts = Patter.openai_tts(api_key="sk-...", voice="alloy")
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Your OpenAI API key. |
| voice | str | "alloy" | Voice name ("alloy", "echo", "fable", "onyx", "nova", "shimmer"). |
OpenAI TTS returns audio at 24 kHz. Patter automatically resamples it to 16 kHz for telephony compatibility.
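The 24 kHz to 16 kHz step can be approximated with simple linear interpolation. A minimal sketch (illustrative only; Patter's actual resampler may use a different algorithm):

```python
def resample_linear(samples, src_rate=24000, dst_rate=16000):
    """Downsample a sequence of PCM samples by linear interpolation.

    For 24 kHz -> 16 kHz, every 3 input samples become 2 output samples.
    """
    ratio = src_rate / dst_rate
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * ratio                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

A production resampler would also low-pass filter before decimating to avoid aliasing; linear interpolation is shown here only to make the rate conversion concrete.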

Provider Comparison

| Feature | OpenAI Realtime | ElevenLabs ConvAI | Pipeline |
| --- | --- | --- | --- |
| Latency | Lowest | Low | Medium |
| Voice quality | Good | Best | Configurable |
| Customization | Limited | Medium | Full |
| on_message callback | No | No | Yes |
| Requires AI key | OpenAI | ElevenLabs | STT + TTS keys |
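The key requirements in the last row could be checked up front before serving. A hypothetical pre-flight helper (the `REQUIRED_KEYS` table and `missing_keys` function are illustrative, not Patter API):

```python
# Constructor keys each provider needs, per the comparison above.
REQUIRED_KEYS = {
    "openai_realtime": ("openai_key",),
    "elevenlabs_convai": ("elevenlabs_key",),
    "pipeline": ("deepgram_key", "elevenlabs_key"),  # unless stt=/tts= are passed
}

def missing_keys(provider: str, config: dict) -> list:
    """Return the constructor keys the chosen provider still needs."""
    return [k for k in REQUIRED_KEYS.get(provider, ()) if not config.get(k)]
```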

Complete Pipeline Example

```python
import os
import asyncio
from dotenv import load_dotenv
from patter import Patter

load_dotenv()

phone = Patter(
    twilio_sid=os.environ["TWILIO_SID"],
    twilio_token=os.environ["TWILIO_TOKEN"],
    phone_number=os.environ["PHONE_NUMBER"],
    webhook_url=os.environ["WEBHOOK_URL"],
)

agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    provider="pipeline",
    stt=Patter.deepgram(api_key=os.environ["DEEPGRAM_KEY"]),
    tts=Patter.elevenlabs(api_key=os.environ["ELEVENLABS_KEY"], voice="rachel"),
)

async def handle_message(event) -> str:
    user_text = event["text"]
    # Add your own LLM logic here
    return f"I heard you say: {user_text}"

async def main():
    await phone.serve(agent, on_message=handle_message, port=8000)

asyncio.run(main())
```