Engines

An engine is an end-to-end speech-to-speech runtime. Pass an engine instance to phone.agent(engine=...) and Patter wires the audio stream straight through to the provider — no separate STT or TTS is needed. Patter ships with three engine classes today:

OpenAIRealtime — OpenAI’s Realtime API (v1-beta family, gpt-realtime-mini / gpt-realtime / gpt-4o-*-realtime-preview)
OpenAIRealtime2 — OpenAI’s GA Realtime API (gpt-realtime-2), separate marker because the GA endpoint speaks a different session.update wire shape
ElevenLabsConvAI — ElevenLabs Conversational AI

Each class ships as both a flat alias (from getpatter import OpenAIRealtime) and a namespaced class (from getpatter.engines import openai → openai.Realtime()). They are equivalent. If you need full control over STT, LLM, and TTS independently, use pipeline mode instead and omit engine=.

OpenAIRealtime

OpenAI’s Realtime API — the lowest-latency option.

import asyncio
from getpatter import Patter, Twilio, OpenAIRealtime

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    engine=OpenAIRealtime(voice="nova"),                        # OPENAI_API_KEY from env
    system_prompt="You are a helpful assistant.",
    first_message="Hello!",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())

Telephony audio. Over Twilio/Telnyx the OpenAIRealtime engine routes through the same GA-compatible adapter as OpenAIRealtime2: it negotiates PCM-16-LE @ 24 kHz with OpenAI and transcodes to/from the carrier’s mulaw 8 kHz internally. Current OpenAI Realtime models return PCM16 @ 24 kHz regardless of a legacy g711_ulaw request, so Patter standardises on PCM and converts on the carrier leg — you don’t configure anything.
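To make the carrier-leg conversion concrete, here is a minimal sketch of the inbound direction: G.711 mu-law at 8 kHz decoded to 16-bit little-endian PCM and upsampled to 24 kHz. It is an illustration only, not Patter's internal transcoder; the function names are hypothetical, and the linear interpolation stands in for whatever resampling filter the real adapter uses.

```python
import struct


def mulaw_decode_byte(value: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    value = ~value & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = value & 0x80
    exponent = (value >> 4) & 0x07
    mantissa = value & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def upsample_3x(samples: list) -> list:
    """Upsample by 3 (8 kHz -> 24 kHz) using simple linear interpolation."""
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + (2 * step) // 3))
    return out


def mulaw8k_to_pcm24k(payload: bytes) -> bytes:
    """Hypothetical helper: carrier mu-law frame -> PCM-16-LE at 24 kHz."""
    linear = [mulaw_decode_byte(b) for b in payload]
    resampled = upsample_3x(linear)
    return struct.pack(f"<{len(resampled)}h", *resampled)


# A 20 ms carrier frame is 160 mu-law bytes; it becomes 480 samples (960 bytes).
frame = bytes([0xFF]) * 160               # 0xFF is mu-law silence
pcm = mulaw8k_to_pcm24k(frame)
```

A real transcoder would also apply a low-pass filter when changing sample rates; the sketch omits it to keep the byte-level mapping visible.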

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | `""` | OpenAI API key. Reads from `OPENAI_API_KEY` when empty. |
| `voice` | `str` | `"alloy"` | One of `"alloy"`, `"ash"`, `"ballad"`, `"coral"`, `"echo"`, `"fable"`, `"nova"`, `"onyx"`, `"sage"`, `"shimmer"`, `"verse"`. |
| `model` | `str` | `"gpt-realtime-mini"` | OpenAI Realtime model ID. See supported models. |
| `reasoning_effort` | `"minimal" \| "low" \| "medium" \| "high" \| None` | `None` | Reasoning tier for `gpt-realtime-2`. `None` leaves the field unset (server default). OpenAI recommends `"low"` for production voice flows; higher tiers add measurable per-turn latency. No-op on models that ignore it. |
| `input_audio_transcription_model` | `str \| None` | `None` | Override the Realtime session’s `input_audio_transcription.model`. `None` keeps the adapter default (`"whisper-1"`). Use `"gpt-realtime-whisper"` for low-latency partials, `"gpt-4o-transcribe"` for higher accuracy. |
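The "reads from the environment when empty" behaviour described for `api_key` can be pictured with the small sketch below. It is an assumption about the general pattern, not Patter's source: `resolve_api_key` is a hypothetical helper, and raising `ValueError` when nothing is configured is an illustrative choice.

```python
import os


def resolve_api_key(api_key: str = "", env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: prefer the explicit argument, else the environment."""
    key = api_key or os.environ.get(env_var, "")
    if not key:
        # Assumed behaviour for this sketch; the real engine may fail differently.
        raise ValueError(f"No API key passed and {env_var} is not set")
    return key
```

An explicit `api_key="..."` argument wins over the environment variable, which keeps per-engine overrides possible while leaving the common case configuration-free.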

Supported model identifiers

The model argument accepts any OpenAI Realtime model ID. Common values:

| Model | Notes |
| --- | --- |
| `"gpt-realtime-mini"` | Default. Lowest latency / lowest cost. |
| `"gpt-realtime"` | GA realtime model (Aug 2025). |
| `"gpt-realtime-2"` | Most-capable: stronger instruction following, configurable `reasoning_effort`, 128K context. |
| `"gpt-4o-realtime-preview"` | Earlier preview line; ~10x the per-token cost of mini. |
| `"gpt-4o-mini-realtime-preview"` | Earlier preview line. |

Pricing is auto-resolved per model; see Metrics. For reasoning_effort, transcription model, and the full configuration surface, see the OpenAI Realtime full reference.

Namespaced form:

from getpatter.engines import openai as openai_engine

engine = openai_engine.Realtime()                     # reads OPENAI_API_KEY
engine = openai_engine.Realtime(voice="nova", model="gpt-realtime-2")

OpenAIRealtime2

Marker class that selects the GA Realtime API (gpt-realtime-2). The GA endpoint speaks a different session.update wire shape than the v1-beta family (no OpenAI-Beta: realtime=v1 header, session.type: "realtime", nested audio.{input,output} with MIME types, output_modalities instead of modalities), so OpenAIRealtime2 dispatches to a separate adapter (OpenAIRealtime2Adapter).
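The wire-shape differences listed above can be sketched as a translation from a v1-beta-style session dict to a GA-style one. This is an illustration of the two shapes, not the code inside `OpenAIRealtime2Adapter`: the function name is hypothetical, and the exact `audio/pcm` format object, the placement of `voice` under `audio.output`, and the single-modality rule are assumptions based on OpenAI's public Realtime documentation.

```python
from typing import Any, Dict


def beta_to_ga_session(beta: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: reshape a v1-beta session dict into the GA layout."""
    pcm_24k = {"type": "audio/pcm", "rate": 24000}    # assumed MIME-style format

    audio_in: Dict[str, Any] = {"format": dict(pcm_24k)}
    if "input_audio_transcription" in beta:
        # beta: input_audio_transcription.model -> GA: audio.input.transcription.model
        audio_in["transcription"] = dict(beta["input_audio_transcription"])

    audio_out: Dict[str, Any] = {"format": dict(pcm_24k)}
    if "voice" in beta:
        audio_out["voice"] = beta["voice"]            # assumed GA location

    ga: Dict[str, Any] = {
        "type": "realtime",                           # session.type is new in GA
        "audio": {"input": audio_in, "output": audio_out},
    }
    if "modalities" in beta:
        # Assumption: GA takes one output modality, so prefer audio when present.
        ga["output_modalities"] = ["audio"] if "audio" in beta["modalities"] else ["text"]
    return ga


beta_session = {
    "modalities": ["text", "audio"],
    "voice": "alloy",
    "input_audio_transcription": {"model": "whisper-1"},
}
ga_event = {"type": "session.update", "session": beta_to_ga_session(beta_session)}
```

The GA connection also omits the `OpenAI-Beta: realtime=v1` header, which is a transport-level difference the dict above cannot show.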

import asyncio
from getpatter import Patter, Twilio, OpenAIRealtime2

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    engine=OpenAIRealtime2(reasoning_effort="low"),             # OPENAI_API_KEY from env
    system_prompt="You are a friendly receptionist.",
    first_message="Hello! How can I help?",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | `""` | OpenAI API key. Reads from `OPENAI_API_KEY` when empty. |
| `voice` | `str` | `"alloy"` | Same voice set as `OpenAIRealtime`. |
| `model` | `str` | `"gpt-realtime-2"` | Pinned to the GA model. Override only if OpenAI ships future GA-shaped models. |
| `reasoning_effort` | `"minimal" \| "low" \| "medium" \| "high" \| None` | `None` | `gpt-realtime-2` reasoning tier. `"low"` is OpenAI’s recommendation for production voice flows. |
| `input_audio_transcription_model` | `str \| None` | `None` | Override for `audio.input.transcription.model`. `None` keeps the adapter default (`"whisper-1"`). |
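The `None` semantics in the table (leave `reasoning_effort` unset, fall back to `"whisper-1"` for transcription) can be sketched as follows. `build_session_options` and its flat output keys are hypothetical and exist only for this illustration; they are not the adapter's real wire fields.

```python
from typing import Any, Dict, Optional

VALID_EFFORTS = ("minimal", "low", "medium", "high")
DEFAULT_TRANSCRIPTION_MODEL = "whisper-1"


def build_session_options(
    reasoning_effort: Optional[str] = None,
    input_audio_transcription_model: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: apply the documented defaults to engine options."""
    if reasoning_effort is not None and reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS} or None")

    options: Dict[str, Any] = {
        # None keeps the adapter default transcription model.
        "transcription_model": input_audio_transcription_model
        or DEFAULT_TRANSCRIPTION_MODEL,
    }
    if reasoning_effort is not None:
        # Only send the field when set, so the server default applies otherwise.
        options["reasoning_effort"] = reasoning_effort
    return options
```

Omitting the key entirely, rather than sending an explicit null, is what lets the provider's own default take effect.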

Namespaced form:

from getpatter.engines import openai_realtime_2

engine = openai_realtime_2.Realtime2()
engine = openai_realtime_2.Realtime2(reasoning_effort="low")

PCM transport: the GA endpoint accepts only PCM-16-LE at >=24 kHz. Patter transcodes inbound mulaw 8 kHz → PCM 24 kHz and outbound PCM 24 kHz → mulaw 8 kHz transparently on the carrier side; you don’t need to configure anything.
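As a rough picture of the outbound half of that conversion, the sketch below takes PCM-16-LE at 24 kHz, averages each group of three samples to reach 8 kHz, and encodes the result as G.711 mu-law. It is not Patter's implementation: the helper names are hypothetical, and the three-sample average is a crude stand-in for a proper anti-aliasing filter.

```python
import struct

BIAS = 0x84      # standard G.711 mu-law bias
CLIP = 32635     # largest magnitude before the bias is added


def mulaw_encode_sample(sample: int) -> int:
    """Encode one signed 16-bit linear sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm24k_to_mulaw8k(pcm: bytes) -> bytes:
    """Hypothetical helper: PCM-16-LE at 24 kHz -> carrier mu-law at 8 kHz."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    out = bytearray()
    # Walk complete groups of three samples; a trailing partial group is dropped.
    for i in range(0, count - count % 3, 3):
        average = round(sum(samples[i : i + 3]) / 3)
        out.append(mulaw_encode_sample(average))
    return bytes(out)


# 20 ms of 24 kHz silence (480 samples, 960 bytes) becomes 160 mu-law bytes.
carrier_frame = pcm24k_to_mulaw8k(bytes(960))
```

Mu-law silence encodes as `0xFF`, so a frame of zero-valued PCM maps to a run of `0xFF` bytes on the carrier side.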

ElevenLabsConvAI

ElevenLabs Conversational AI — premium voice quality using a managed agent configured in the ElevenLabs dashboard.

import asyncio
from getpatter import Patter, Twilio, ElevenLabsConvAI

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    engine=ElevenLabsConvAI(agent_id="agent_abc123"),           # ELEVENLABS_API_KEY from env
    system_prompt="You are a warm and friendly concierge.",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())

Parameter	Type	Default	Description
`api_key`	`str`	`""`	ElevenLabs API key. Reads from `ELEVENLABS_API_KEY` when empty.
`agent_id`	`str`	`""`	ElevenLabs agent ID (from the ConvAI dashboard). Reads from `ELEVENLABS_AGENT_ID` when empty.
`voice`	`str`	`""`	Optional override for the agent’s default voice ID.

Namespaced form:

from getpatter.engines import elevenlabs as elevenlabs_engine

engine = elevenlabs_engine.ConvAI()                   # reads env
engine = elevenlabs_engine.ConvAI(agent_id="agent_abc123", voice="rachel")

What’s Next

LLM: Compare engine mode with pipeline mode.
STT: STT for pipeline mode.
TTS: TTS for pipeline mode.
