Engines

An engine is an end-to-end speech-to-speech runtime. Pass an engine instance to phone.agent(engine=...) and Patter wires the audio stream straight through to the provider — no separate STT or TTS is needed. Patter ships with three engine classes today:
  • OpenAIRealtime — OpenAI’s Realtime API (v1-beta family, gpt-realtime-mini / gpt-realtime / gpt-4o-*-realtime-preview)
  • OpenAIRealtime2 — OpenAI’s GA Realtime API (gpt-realtime-2), separate marker because the GA endpoint speaks a different session.update wire shape
  • ElevenLabsConvAI — ElevenLabs Conversational AI
Each class ships as both a flat alias (from getpatter import OpenAIRealtime) and a namespaced class (from getpatter.engines import openai, then openai.Realtime()). They are equivalent. If you need full control over STT, LLM, and TTS independently, use pipeline mode instead and omit engine=.

OpenAIRealtime

OpenAI’s Realtime API — the lowest-latency option.
import asyncio
from getpatter import Patter, Twilio, OpenAIRealtime

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    engine=OpenAIRealtime(voice="nova"),                        # OPENAI_API_KEY from env
    system_prompt="You are a helpful assistant.",
    first_message="Hello!",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())
Telephony audio. Over Twilio/Telnyx the OpenAIRealtime engine routes through the same GA-compatible adapter as OpenAIRealtime2: it negotiates PCM-16-LE @ 24 kHz with OpenAI and transcodes to/from the carrier’s mulaw 8 kHz internally. Current OpenAI Realtime models return PCM16 @ 24 kHz regardless of a legacy g711_ulaw request, so Patter standardises on PCM and converts on the carrier leg — you don’t configure anything.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | "" | OpenAI API key. Reads from OPENAI_API_KEY when empty. |
| voice | str | "alloy" | One of "alloy", "ash", "ballad", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer", "verse". |
| model | str | "gpt-realtime-mini" | OpenAI Realtime model ID. See supported models. |
| reasoning_effort | "minimal" \| "low" \| "medium" \| "high" \| None | None | Reasoning tier for gpt-realtime-2. None leaves the field unset (server default). OpenAI recommends "low" for production voice flows; higher tiers add measurable per-turn latency. No-op on models that ignore it. |
| input_audio_transcription_model | str \| None | None | Override the Realtime session’s input_audio_transcription.model. None keeps the adapter default ("whisper-1"). Use "gpt-realtime-whisper" for low-latency partials, "gpt-4o-transcribe" for higher accuracy. |
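The "reads from OPENAI_API_KEY when empty" behaviour in the api_key row can be pictured with a small sketch. This is not Patter's implementation: resolve_api_key is a hypothetical helper, and raising an error when no key is found is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: str = "",
    env_var: str = "OPENAI_API_KEY",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer the explicit argument, else the env var."""
    env = os.environ if environ is None else environ
    # An empty string counts as "not provided", matching the "" default.
    key = explicit or env.get(env_var, "")
    if not key:
        # Assumption: fail early rather than at connection time.
        raise ValueError(f"No API key: pass api_key= or set {env_var}")
    return key
```

Passing environ explicitly keeps the helper easy to exercise without touching the real process environment; an explicit api_key always wins over the variable.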

Supported model identifiers

The model argument accepts any OpenAI Realtime model ID. Common values:
| Model | Notes |
| --- | --- |
| "gpt-realtime-mini" | Default. Lowest latency / lowest cost. |
| "gpt-realtime" | GA realtime model (Aug 2025). |
| "gpt-realtime-2" | Most-capable: stronger instruction following, configurable reasoning_effort, 128K context. |
| "gpt-4o-realtime-preview" | Earlier preview line; ~10x the per-token cost of mini. |
| "gpt-4o-mini-realtime-preview" | Earlier preview line. |
Pricing is auto-resolved per model — see Metrics. For reasoning_effort, transcription model, and the full configuration surface, see OpenAI Realtime — full reference.
Namespaced form:
from getpatter.engines import openai as openai_engine

engine = openai_engine.Realtime()                     # reads OPENAI_API_KEY
engine = openai_engine.Realtime(voice="nova", model="gpt-realtime-2")

OpenAIRealtime2

Marker class that selects the GA Realtime API (gpt-realtime-2). The GA endpoint speaks a different session.update wire shape than the v1-beta family (no OpenAI-Beta: realtime=v1 header, session.type: "realtime", nested audio.{input,output} with MIME types, output_modalities instead of modalities), so OpenAIRealtime2 dispatches to a separate adapter (OpenAIRealtime2Adapter).
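The wire-shape differences listed above can be sketched as two payload builders. This is an illustration only, not the code inside OpenAIRealtime2Adapter: the function names are hypothetical, and the field values (formats, modalities) are assumptions based on the differences named in this section.

```python
from typing import Any, Dict


def connection_headers(api_key: str, ga: bool) -> Dict[str, str]:
    """Build WebSocket headers; only the v1-beta family sends OpenAI-Beta."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if not ga:
        headers["OpenAI-Beta"] = "realtime=v1"
    return headers


def beta_session_update(voice: str, instructions: str) -> Dict[str, Any]:
    """v1-beta shape: flat audio fields and a `modalities` list."""
    return {
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "instructions": instructions,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    }


def ga_session_update(voice: str, instructions: str) -> Dict[str, Any]:
    """GA shape: session.type, nested audio.{input,output}, output_modalities."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "output_modalities": ["audio"],
            "instructions": instructions,
            "audio": {
                # Formats are expressed as MIME types instead of "pcm16".
                "input": {"format": {"type": "audio/pcm", "rate": 24000}},
                "output": {
                    "format": {"type": "audio/pcm", "rate": 24000},
                    "voice": voice,
                },
            },
        },
    }
```

Because the two shapes share almost no field names, keeping them in separate builders (or separate adapters) is simpler than branching inside one.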
import asyncio
from getpatter import Patter, Twilio, OpenAIRealtime2

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    engine=OpenAIRealtime2(reasoning_effort="low"),             # OPENAI_API_KEY from env
    system_prompt="You are a friendly receptionist.",
    first_message="Hello! How can I help?",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | "" | OpenAI API key. Reads from OPENAI_API_KEY when empty. |
| voice | str | "alloy" | Same voice set as OpenAIRealtime. |
| model | str | "gpt-realtime-2" | Pinned to the GA model. Override only if OpenAI ships future GA-shaped models. |
| reasoning_effort | "minimal" \| "low" \| "medium" \| "high" \| None | None | gpt-realtime-2 reasoning tier. "low" is OpenAI’s recommendation for production voice flows. |
| input_audio_transcription_model | str \| None | None | Override for audio.input.transcription.model. None keeps the adapter default ("whisper-1"). |
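The None semantics of the last two rows differ: a None reasoning_effort leaves the field unset, while a None transcription model falls back to "whisper-1". A minimal sketch of that logic follows; session_overrides and its dictionary keys are hypothetical, and the validation step is an assumption rather than documented Patter behaviour.

```python
from typing import Dict, Optional

DEFAULT_TRANSCRIPTION_MODEL = "whisper-1"  # adapter default named in the table
VALID_EFFORTS = ("minimal", "low", "medium", "high")


def session_overrides(
    reasoning_effort: Optional[str] = None,
    input_audio_transcription_model: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: turn optional arguments into session settings."""
    overrides = {
        # None means "keep the adapter default".
        "transcription_model": input_audio_transcription_model
        or DEFAULT_TRANSCRIPTION_MODEL,
    }
    if reasoning_effort is not None:
        if reasoning_effort not in VALID_EFFORTS:
            raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
        # Only set when given, so None defers to the server default.
        overrides["reasoning_effort"] = reasoning_effort
    return overrides
```

With no arguments the result carries only the default transcription model, so the provider chooses the reasoning tier.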
Namespaced form:
from getpatter.engines import openai_realtime_2

engine = openai_realtime_2.Realtime2()
engine = openai_realtime_2.Realtime2(reasoning_effort="low")
PCM transport: the GA endpoint accepts only PCM-16-LE at >=24 kHz. Patter transcodes inbound mulaw 8 kHz → PCM 24 kHz and outbound PCM 24 kHz → mulaw 8 kHz transparently on the carrier side; you don’t need to configure anything.
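To make the transcoding step concrete, here is a rough pure-Python sketch of both directions. It is not Patter's code: the function names are hypothetical, the resampling is naive (linear interpolation up, plain decimation down, with no low-pass filter), and a production path would use a proper resampler.

```python
import struct
from typing import List

BIAS = 0x84   # G.711 mu-law bias
CLIP = 32635  # largest magnitude that still fits after adding the bias


def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if u & 0x80 else sample


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def upsample_3x(samples: List[int]) -> List[int]:
    """8 kHz -> 24 kHz by linear interpolation between neighbours."""
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + 2 * step // 3))
    return out


def mulaw_8k_to_pcm16_24k(data: bytes) -> bytes:
    """Carrier -> provider: mu-law 8 kHz to PCM-16-LE 24 kHz."""
    samples = upsample_3x([ulaw_to_linear(b) for b in data])
    return struct.pack("<%dh" % len(samples), *samples)


def pcm16_24k_to_mulaw_8k(pcm: bytes) -> bytes:
    """Provider -> carrier: keep every third sample, then mu-law encode."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples[::3])
```

Each mu-law byte becomes three 16-bit samples, so the provider leg carries six times as many bytes as the carrier leg.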

ElevenLabsConvAI

ElevenLabs Conversational AI — premium voice quality using a managed agent configured in the ElevenLabs dashboard.
import asyncio
from getpatter import Patter, Twilio, ElevenLabsConvAI

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    engine=ElevenLabsConvAI(agent_id="agent_abc123"),           # ELEVENLABS_API_KEY from env
    system_prompt="You are a warm and friendly concierge.",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())
ParameterTypeDefaultDescription
api_keystr""ElevenLabs API key. Reads from ELEVENLABS_API_KEY when empty.
agent_idstr""ElevenLabs agent ID (from the ConvAI dashboard). Reads from ELEVENLABS_AGENT_ID when empty.
voicestr""Optional override for the agent’s default voice ID.
Namespaced form:
from getpatter.engines import elevenlabs as elevenlabs_engine

engine = elevenlabs_engine.ConvAI()                   # reads env
engine = elevenlabs_engine.ConvAI(agent_id="agent_abc123", voice="rachel")

What’s Next

  • LLM: Compare engine mode with pipeline mode.
  • STT: STT for pipeline mode.
  • TTS: TTS for pipeline mode.