Engines
An engine is an end-to-end speech-to-speech runtime. Pass an engine instance to phone.agent({ engine }) and Patter wires the audio stream straight through to the provider, so no separate STT or TTS is needed.
Patter ships with three engine classes today:
- OpenAIRealtime: OpenAI's Realtime API (beta endpoint)
- OpenAIRealtime2: OpenAI's GA Realtime API (targets gpt-realtime-2)
- ElevenLabsConvAI: ElevenLabs Conversational AI
Import all three from the package root with import { OpenAIRealtime, OpenAIRealtime2, ElevenLabsConvAI } from "getpatter".
If you need full control over STT, LLM, and TTS independently, use pipeline mode instead and omit engine.
OpenAIRealtime
OpenAI's Realtime API, the lowest-latency option.
Telephony audio. Over Twilio/Telnyx the OpenAIRealtime engine routes through the same GA-compatible adapter as OpenAIRealtime2: it negotiates PCM-16-LE @ 24 kHz with OpenAI and transcodes to/from the carrier's mulaw 8 kHz internally. Current OpenAI Realtime models return PCM16 @ 24 kHz regardless of a legacy g711_ulaw request, so Patter standardises on PCM and converts on the carrier leg; you don't configure any of this.
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | OpenAI API key. Reads from OPENAI_API_KEY when omitted. |
| voice | string | "alloy" | One of "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse". |
| model | string | "gpt-4o-mini-realtime-preview" | OpenAI Realtime model ID. See supported models. |
| reasoningEffort | "minimal" \| "low" \| "medium" \| "high" | — | Reasoning-effort tier for gpt-realtime-2. When omitted, the field is not sent and the server default applies. OpenAI recommends "low" for production voice. |
| inputAudioTranscriptionModel | string | — | Override for the Realtime session's input_audio_transcription.model. Omit to keep whisper-1. |
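As a rough illustration of how the defaults in the table above combine, the following sketch resolves a partial options object into a complete one. The OpenAIRealtimeOptions interface and the resolveOpenAIRealtimeOptions helper are hypothetical names written for this example and are not part of getpatter. The environment is passed in explicitly so the snippet stays self-contained, and treating a missing key as an error is an assumption.

```typescript
// Hypothetical mirror of the documented OpenAIRealtime options (not imported from getpatter).
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface OpenAIRealtimeOptions {
  apiKey?: string;
  voice?: string;
  model?: string;
  reasoningEffort?: ReasoningEffort;
  inputAudioTranscriptionModel?: string;
}

interface ResolvedOpenAIRealtimeOptions {
  apiKey: string;
  voice: string;
  model: string;
  reasoningEffort?: ReasoningEffort;
  inputAudioTranscriptionModel?: string;
}

type Env = Record<string, string | undefined>;

function resolveOpenAIRealtimeOptions(
  options: OpenAIRealtimeOptions,
  env: Env,
): ResolvedOpenAIRealtimeOptions {
  // Explicit option wins; otherwise fall back to OPENAI_API_KEY.
  const apiKey = options.apiKey ?? env["OPENAI_API_KEY"];
  if (!apiKey) {
    // Assumption: a missing key is treated as a configuration error.
    throw new Error("No API key: pass apiKey or set OPENAI_API_KEY");
  }

  const resolved: ResolvedOpenAIRealtimeOptions = {
    apiKey,
    voice: options.voice ?? "alloy",
    model: options.model ?? "gpt-4o-mini-realtime-preview",
  };

  // Optional fields are copied only when set, so an omitted value is never sent.
  if (options.reasoningEffort !== undefined) {
    resolved.reasoningEffort = options.reasoningEffort;
  }
  if (options.inputAudioTranscriptionModel !== undefined) {
    resolved.inputAudioTranscriptionModel = options.inputAudioTranscriptionModel;
  }
  return resolved;
}

// Only the voice is overridden; everything else comes from defaults or the environment.
const resolvedOptions = resolveOpenAIRealtimeOptions(
  { voice: "coral" },
  { OPENAI_API_KEY: "sk-example" },
);
```

Keeping reasoningEffort and inputAudioTranscriptionModel absent rather than filling them with a local default mirrors the table: when they are omitted, the server-side default applies.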
Supported model identifiers
The model option accepts any OpenAI Realtime model ID. Common values:
| Model | Notes |
|---|---|
| "gpt-4o-mini-realtime-preview" | Engine marker default. Earlier preview line: cheap and low-latency. |
| "gpt-realtime-mini" | GA mini, the recommended cheap default for new deployments. |
| "gpt-realtime" | GA realtime model (Aug 2025). |
| "gpt-realtime-2" | Most-capable: stronger instruction following, configurable reasoningEffort, 128K context. Use OpenAIRealtime2 for the GA wire shape. |
| "gpt-4o-realtime-preview" | Earlier preview line; ~10x the per-token cost of mini. |
For reasoningEffort, the transcription model, and the full configuration surface, see the OpenAI Realtime full reference.
OpenAIRealtime2
OpenAI's GA Realtime API. It is a separate engine marker because the GA endpoint speaks a different session.update wire shape (output_modalities, nested audio.{input,output} blocks, session.type = "realtime") and rejects the legacy beta header. It targets gpt-realtime-2 by default and routes through OpenAIRealtime2Adapter, which also handles bidirectional mulaw 8 kHz ↔ PCM 24 kHz transcoding (the GA audio engine silently drops mulaw frames).
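To make the transcoding step concrete, here is a minimal sketch of the inbound direction only (carrier mulaw 8 kHz to PCM16 24 kHz). It is not the OpenAIRealtime2Adapter code: the function names are hypothetical, the decoder is the standard G.711 mu-law expansion, and the 3x upsampler uses plain linear interpolation where a production resampler would likely apply a proper low-pass filter.

```typescript
// Expand one G.711 mu-law byte into a signed 16-bit PCM sample.
function mulawToPcm16(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 is the standard mu-law bias (132).
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign !== 0 ? -magnitude : magnitude;
}

// Decode a whole mu-law frame (one byte per sample at 8 kHz).
function decodeMulaw(frame: Uint8Array): Int16Array {
  return Int16Array.from(frame, (b) => mulawToPcm16(b));
}

// Upsample 8 kHz PCM16 to 24 kHz by linear interpolation (3 output samples per input).
function upsample3x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 3);
  for (let i = 0; i < out.length; i++) {
    const idx = Math.floor(i / 3);
    const frac = (i % 3) / 3;
    const a = input[idx];
    // Hold the last sample at the end of the frame.
    const b = input[Math.min(idx + 1, input.length - 1)];
    out[i] = Math.round(a + (b - a) * frac);
  }
  return out;
}

// Three carrier samples: silence, positive full scale, negative full scale.
const carrierFrame = Uint8Array.from([0xff, 0x80, 0x00]);
const pcm24k = upsample3x(decodeMulaw(carrierFrame));
```

The outbound direction would be the mirror image: downsample 24 kHz PCM16 by a factor of three, then compress each sample back to a mu-law byte.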
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | OpenAI API key. Reads from OPENAI_API_KEY when omitted. |
| voice | string | "alloy" | Voice preset. |
| model | string | "gpt-realtime-2" | GA Realtime model. |
| reasoningEffort | "minimal" \| "low" \| "medium" \| "high" | — | When omitted, the field is not sent and the server default applies. OpenAI recommends "low" for production voice. |
| inputAudioTranscriptionModel | string | — | Override for audio.input.transcription.model. Omit to keep whisper-1. |
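The GA session.update shape described in this section can be sketched roughly as follows. The field names session.type, output_modalities, audio.input, audio.output, audio.input.transcription.model, create_response, and interrupt_response come from the text. The buildSessionUpdate helper, the audio/pcm format objects, the server_vad type, and the placement of turn_detection under audio.input are assumptions based on OpenAI's public GA Realtime schema, and may not match what the adapter actually sends.

```typescript
// Hypothetical input type for this sketch (not a getpatter type).
interface GaSessionOptions {
  model: string;
  voice: string;
  inputAudioTranscriptionModel?: string;
}

// Build a GA-style session.update event as a plain object.
function buildSessionUpdate(opts: GaSessionOptions) {
  return {
    type: "session.update",
    session: {
      type: "realtime", // required by the GA endpoint
      model: opts.model,
      output_modalities: ["audio"],
      audio: {
        input: {
          // Assumption: PCM16 at 24 kHz, matching the negotiated format.
          format: { type: "audio/pcm", rate: 24000 },
          transcription: {
            model: opts.inputAudioTranscriptionModel ?? "whisper-1",
          },
          turn_detection: {
            type: "server_vad",
            // Both pinned to false so the client, not the server VAD,
            // decides when to create or interrupt a response.
            create_response: false,
            interrupt_response: false,
          },
        },
        output: {
          format: { type: "audio/pcm", rate: 24000 },
          voice: opts.voice,
        },
      },
    },
  };
}

const sessionUpdate = buildSessionUpdate({ model: "gpt-realtime-2", voice: "alloy" });
const wireMessage = JSON.stringify(sessionUpdate);
```

With both flags off, the client is expected to send response.create itself once it has decided that a turn deserves a reply.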
The GA adapter pins turn_detection.create_response: false and interrupt_response: false in the session.update payload. Patter owns response creation (response.create) and barge-in cancellation explicitly so the hallucination filter and barge-in pipeline can decide per turn rather than letting the server VAD auto-trigger. See the OpenAIRealtime2 full reference.
ElevenLabsConvAI
ElevenLabs Conversational AI: premium voice quality using a managed agent configured in the ElevenLabs dashboard.
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | ElevenLabs API key. Reads from ELEVENLABS_API_KEY when omitted. |
| agentId | string | — | ElevenLabs agent ID (from the ConvAI dashboard). Reads from ELEVENLABS_AGENT_ID when omitted. |
| voice | string | — | Optional override for the agent's default voice ID. |
What’s Next
- LLM: Compare engine mode with pipeline mode.
- STT: STT for pipeline mode.
- TTS: TTS for pipeline mode.

