Engines
An engine is an end-to-end speech-to-speech runtime. Pass an engine instance to phone.agent(engine=...) and Patter wires the audio stream straight through to the provider; no separate STT or TTS is needed.
Patter ships with three engine classes today:
- OpenAIRealtime: OpenAI’s Realtime API (v1-beta family: gpt-realtime-mini, gpt-realtime, gpt-4o-*-realtime-preview).
- OpenAIRealtime2: OpenAI’s GA Realtime API (gpt-realtime-2). It is a separate marker class because the GA endpoint speaks a different session.update wire shape.
- ElevenLabsConvAI: ElevenLabs Conversational AI.
Engines are available as both a flat class (from getpatter import OpenAIRealtime) and a namespaced class (from getpatter.engines import openai → openai.Realtime()). The two forms are equivalent.
If you need full control over STT, LLM, and TTS independently, use pipeline mode instead and omit engine=.
OpenAIRealtime
OpenAI’s Realtime API: the lowest-latency option.
Telephony audio. Over Twilio/Telnyx, the OpenAIRealtime engine routes through the same GA-compatible adapter as OpenAIRealtime2. It negotiates PCM-16-LE @ 24 kHz with OpenAI and transcodes to and from the carrier’s mulaw 8 kHz internally. Current OpenAI Realtime models return PCM16 @ 24 kHz even when the legacy g711_ulaw format is requested, so Patter standardises on PCM and converts on the carrier leg; you don’t configure anything.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | "" | OpenAI API key. Reads from OPENAI_API_KEY when empty. |
| voice | str | "alloy" | One of "alloy", "ash", "ballad", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer", "verse". |
| model | str | "gpt-realtime-mini" | OpenAI Realtime model ID. See supported models. |
| reasoning_effort | "minimal" \| "low" \| "medium" \| "high" \| None | None | Reasoning tier for gpt-realtime-2. None leaves the field unset (server default). OpenAI recommends "low" for production voice flows; higher tiers add measurable per-turn latency. Ignored by models that do not support it. |
| input_audio_transcription_model | str \| None | None | Overrides the Realtime session’s input_audio_transcription.model. None keeps the adapter default ("whisper-1"). Use "gpt-realtime-whisper" for low-latency partials or "gpt-4o-transcribe" for higher accuracy. |
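The voice and reasoning_effort parameters only accept the fixed values listed in the table. As a minimal sketch, here is a hypothetical pre-flight check (check_realtime_options is not part of Patter's API) that catches typos before a call starts and, mirroring the documented rule that None leaves the field unset, omits reasoning_effort entirely when it is None.

```python
from typing import Optional

# Allowed values copied from the OpenAIRealtime parameter table.
VOICES = frozenset({"alloy", "ash", "ballad", "coral", "echo", "fable",
                    "nova", "onyx", "sage", "shimmer", "verse"})
REASONING_TIERS = frozenset({"minimal", "low", "medium", "high"})


def check_realtime_options(voice: str = "alloy",
                           reasoning_effort: Optional[str] = None) -> dict:
    """Hypothetical helper: validate options and return only the fields to send."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}")
    options = {"voice": voice}
    if reasoning_effort is not None:
        if reasoning_effort not in REASONING_TIERS:
            raise ValueError(f"unknown reasoning_effort {reasoning_effort!r}")
        options["reasoning_effort"] = reasoning_effort
    # None is dropped rather than sent, so the server default applies.
    return options
```

Failing early on a misspelled voice is cheaper than discovering it after the carrier has already connected the call.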
Supported model identifiers
The model argument accepts any OpenAI Realtime model ID. Common values:
| Model | Notes |
|---|---|
| "gpt-realtime-mini" | Default. Lowest latency and lowest cost. |
| "gpt-realtime" | GA realtime model (Aug 2025). |
| "gpt-realtime-2" | Most capable: stronger instruction following, configurable reasoning_effort, 128K context. |
| "gpt-4o-realtime-preview" | Earlier preview line; ~10x the per-token cost of mini. |
| "gpt-4o-mini-realtime-preview" | Earlier preview line. |
For reasoning_effort, the transcription model, and the full configuration surface, see the OpenAI Realtime full reference.
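Namespaced form: openai.Realtime(), imported with from getpatter.engines import openai, is equivalent to OpenAIRealtime.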
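A minimal configuration sketch of the flat and namespaced forms. The keyword arguments come from the parameter table; that openai.Realtime accepts the same keywords is an assumption based on the two forms being documented as equivalent. Running it requires the getpatter package.

```python
from getpatter import OpenAIRealtime
from getpatter.engines import openai

# Flat form. api_key is left empty, so it is read from OPENAI_API_KEY.
engine = OpenAIRealtime(
    model="gpt-realtime-mini",  # the documented default, spelled out
    voice="coral",
    input_audio_transcription_model="gpt-4o-transcribe",  # accuracy over latency
)

# Namespaced form (assumed to accept the same keyword arguments).
same_engine = openai.Realtime(model="gpt-realtime-mini", voice="coral")
```

Either object is what you pass to phone.agent(engine=...).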
OpenAIRealtime2
Marker class that selects the GA Realtime API (gpt-realtime-2). The GA endpoint speaks a different session.update wire shape than the v1-beta family: no OpenAI-Beta: realtime=v1 header, session.type: "realtime", audio settings nested under audio.{input,output} with MIME types, and output_modalities instead of modalities. For that reason OpenAIRealtime2 dispatches to a separate adapter (OpenAIRealtime2Adapter).
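To make the wire-shape difference concrete, here is a sketch of one session.update event in each shape, built as plain Python dicts. Field names follow OpenAI's public Realtime documentation, trimmed to what this paragraph mentions (voice, transcription model, PCM format). The helper names are hypothetical, and this is not the code inside OpenAIRealtime2Adapter.

```python
# Header sent only when opening a v1-beta socket; the GA endpoint omits it.
BETA_HEADERS = {"OpenAI-Beta": "realtime=v1"}


def beta_session_update(voice: str = "alloy", transcription_model: str = "whisper-1") -> dict:
    """v1-beta shape: flat audio fields and a `modalities` list."""
    return {
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": transcription_model},
        },
    }


def ga_session_update(voice: str = "alloy", transcription_model: str = "whisper-1") -> dict:
    """GA shape: session.type, nested audio.{input,output}, MIME-typed formats."""
    def pcm_24k() -> dict:
        # Fresh dict per use so input and output formats are not aliased.
        return {"type": "audio/pcm", "rate": 24000}

    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "output_modalities": ["audio"],  # replaces `modalities`
            "audio": {
                "input": {"format": pcm_24k(), "transcription": {"model": transcription_model}},
                "output": {"format": pcm_24k(), "voice": voice},
            },
        },
    }
```

Because the two payloads differ structurally rather than in a few field names, a separate adapter is simpler than branching inside one.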
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | "" | OpenAI API key. Reads from OPENAI_API_KEY when empty. |
| voice | str | "alloy" | Same voice set as OpenAIRealtime. |
| model | str | "gpt-realtime-2" | Pinned to the GA model. Override only if OpenAI ships future GA-shaped models. |
| reasoning_effort | "minimal" \| "low" \| "medium" \| "high" \| None | None | gpt-realtime-2 reasoning tier. "low" is OpenAI’s recommendation for production voice flows. |
| input_audio_transcription_model | str \| None | None | Override for audio.input.transcription.model. None keeps the adapter default ("whisper-1"). |
PCM transport: the GA endpoint accepts only PCM-16-LE at >=24 kHz. Patter transcodes inbound mulaw 8 kHz → PCM 24 kHz and outbound PCM 24 kHz → mulaw 8 kHz transparently on the carrier side; you don’t need to configure anything.
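The carrier-side transcoding can be pictured with a pure-Python sketch: G.711 mu-law decode and encode, plus a 3x sample-rate change between 8 kHz and 24 kHz. The function names are hypothetical, and the resampling (linear interpolation up, three-sample averaging down) is deliberately naive; a production transcoder would use a proper anti-aliasing filter. This is not Patter's implementation.

```python
import struct

BIAS = 0x84   # G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude encoded before clipping


def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    code = ~code & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = 7
    # Find the segment: the highest set bit among bits 14..7.
    while exponent > 0 and not (magnitude & (0x4000 >> (7 - exponent))):
        exponent -= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def upsample_3x(samples: list[int]) -> list[int]:
    """8 kHz -> 24 kHz: emit each sample plus two interpolated ones toward the next."""
    out: list[int] = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current  # hold the last sample
        step = nxt - current
        out.extend((current, current + step // 3, current + 2 * step // 3))
    return out


def downsample_3x(samples: list[int]) -> list[int]:
    """24 kHz -> 8 kHz: average each group of three (a crude low-pass)."""
    groups = (samples[i:i + 3] for i in range(0, len(samples), 3))
    return [sum(g) // len(g) for g in groups]


def carrier_to_model(ulaw_8k: bytes) -> bytes:
    """Inbound leg: carrier mu-law 8 kHz -> PCM-16-LE 24 kHz."""
    pcm_24k = upsample_3x([ulaw_to_linear(b) for b in ulaw_8k])
    return struct.pack(f"<{len(pcm_24k)}h", *pcm_24k)


def model_to_carrier(pcm16le_24k: bytes) -> bytes:
    """Outbound leg: PCM-16-LE 24 kHz -> carrier mu-law 8 kHz."""
    count = len(pcm16le_24k) // 2  # ignore a trailing odd byte, if any
    pcm_24k = list(struct.unpack(f"<{count}h", pcm16le_24k[:count * 2]))
    return bytes(linear_to_ulaw(s) for s in downsample_3x(pcm_24k))
```

For scale: a 20 ms carrier frame is 160 mu-law bytes at 8 kHz, which becomes 480 samples (960 bytes) of PCM-16-LE at 24 kHz.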
ElevenLabsConvAI
ElevenLabs Conversational AI: premium voice quality from a managed agent configured in the ElevenLabs dashboard.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | "" | ElevenLabs API key. Reads from ELEVENLABS_API_KEY when empty. |
| agent_id | str | "" | ElevenLabs agent ID (from the ConvAI dashboard). Reads from ELEVENLABS_AGENT_ID when empty. |
| voice | str | "" | Optional override for the agent’s default voice ID. |
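Both api_key and agent_id fall back to environment variables when left empty. A minimal sketch of that resolution rule, using a hypothetical elevenlabs_settings helper (not Patter's code); raising an error when neither source provides a value is an assumption of this sketch, not documented Patter behavior.

```python
import os


def resolve_setting(value: str, env_var: str) -> str:
    """Use the explicit value when non-empty, otherwise the environment variable."""
    return value or os.environ.get(env_var, "")


def elevenlabs_settings(api_key: str = "", agent_id: str = "", voice: str = "") -> dict:
    """Hypothetical helper mirroring the ElevenLabsConvAI parameter defaults."""
    settings = {
        "api_key": resolve_setting(api_key, "ELEVENLABS_API_KEY"),
        "agent_id": resolve_setting(agent_id, "ELEVENLABS_AGENT_ID"),
    }
    missing = [name for name, val in settings.items() if not val]
    if missing:
        # Assumption of this sketch: fail fast instead of connecting without credentials.
        raise ValueError(f"missing ElevenLabs setting(s): {', '.join(missing)}")
    if voice:
        settings["voice"] = voice  # optional override of the agent's default voice ID
    return settings
```

An explicit argument always wins over the environment, so one process can serve several agents while sharing a single exported API key.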
What’s Next
- LLM: compare engine mode with pipeline mode.
- STT: speech-to-text for pipeline mode.
- TTS: text-to-speech for pipeline mode.

