
Engines

An engine is an end-to-end speech-to-speech runtime. Pass an engine instance to phone.agent({ engine }) and Patter wires the audio stream straight through to the provider, so no separate STT or TTS is needed. Patter ships with three engine classes today: OpenAIRealtime, OpenAIRealtime2, and ElevenLabsConvAI. All three are imported by name from the package barrel: import { OpenAIRealtime, OpenAIRealtime2, ElevenLabsConvAI } from "getpatter". If you need independent control over STT, LLM, and TTS, use pipeline mode instead and omit engine.

OpenAIRealtime

OpenAI’s Realtime API — the lowest-latency option.
// npx tsx example.ts
import { Patter, Twilio, OpenAIRealtime } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new OpenAIRealtime({ voice: "alloy" }),           // OPENAI_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hello!",
});

await phone.serve({ agent });
Telephony audio. Over Twilio/Telnyx the OpenAIRealtime engine routes through the same GA-compatible adapter as OpenAIRealtime2: it negotiates PCM-16-LE @ 24 kHz with OpenAI and transcodes to/from the carrier’s mulaw 8 kHz internally. Current OpenAI Realtime models return PCM16 @ 24 kHz regardless of a legacy g711_ulaw request, so Patter standardises on PCM and converts on the carrier leg — you don’t configure any of this.
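To make the conversion on the carrier leg concrete, here is a minimal sketch of mulaw 8 kHz to PCM16 24 kHz transcoding and back. It is not Patter's adapter code: the function names (muLawDecode, muLawEncode, upsample3x, carrierToRealtime, realtimeToCarrier) are hypothetical, the codec follows the standard G.711 mu-law tables, and the resampling uses plain linear interpolation and averaging where a production resampler would normally apply a proper low-pass filter.

```typescript
// Illustrative only: hypothetical helpers, not part of the getpatter API.

// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function muLawDecode(byte: number): number {
  const u = ~byte & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) !== 0 ? -magnitude : magnitude;
}

// Encode one signed 16-bit PCM sample to a G.711 mu-law byte.
function muLawEncode(sample: number): number {
  const sign = sample < 0 ? 0x80 : 0;
  let magnitude = Math.min(Math.abs(Math.round(sample)), 32635) + 0x84;
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// 8 kHz -> 24 kHz: emit each sample plus two linearly interpolated points.
function upsample3x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 3);
  for (let i = 0; i < input.length; i++) {
    const a = input[i];
    const b = i + 1 < input.length ? input[i + 1] : a;
    out[i * 3] = a;
    out[i * 3 + 1] = Math.round(a + (b - a) / 3);
    out[i * 3 + 2] = Math.round(a + (2 * (b - a)) / 3);
  }
  return out;
}

// Carrier leg -> provider: mulaw 8 kHz bytes to PCM16 24 kHz samples.
function carrierToRealtime(mulaw: Uint8Array): Int16Array {
  const pcm8k = Int16Array.from(mulaw, (b) => muLawDecode(b));
  return upsample3x(pcm8k);
}

// Provider -> carrier leg: average each group of three samples (a crude
// low-pass), then mu-law encode the 8 kHz result.
function realtimeToCarrier(pcm24k: Int16Array): Uint8Array {
  const out = new Uint8Array(Math.floor(pcm24k.length / 3));
  for (let i = 0; i < out.length; i++) {
    const avg = (pcm24k[i * 3] + pcm24k[i * 3 + 1] + pcm24k[i * 3 + 2]) / 3;
    out[i] = muLawEncode(avg);
  }
  return out;
}
```

Under these assumptions, a 20 ms carrier frame of 160 mulaw bytes becomes 480 PCM16 samples on the provider side, and the reverse path shrinks it back to 160 bytes.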
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | OpenAI API key. Reads from OPENAI_API_KEY when omitted. |
| voice | string | "alloy" | One of "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse". |
| model | string | "gpt-4o-mini-realtime-preview" | OpenAI Realtime model ID. See supported models. |
| reasoningEffort | "minimal" \| "low" \| "medium" \| "high" | | Reasoning-effort tier for gpt-realtime-2. When omitted, the field is not sent and the server default applies. OpenAI recommends "low" for production voice. |
| inputAudioTranscriptionModel | string | | Override for the Realtime session’s input_audio_transcription.model. Omit to keep whisper-1. |

Supported model identifiers

The model option accepts any OpenAI Realtime model ID. Common values:
| Model | Notes |
| --- | --- |
| "gpt-4o-mini-realtime-preview" | Engine marker default. Earlier preview line; cheap and low-latency. |
| "gpt-realtime-mini" | GA mini; recommended cheap default for new deployments. |
| "gpt-realtime" | GA realtime model (Aug 2025). |
| "gpt-realtime-2" | Most capable: stronger instruction following, configurable reasoningEffort, 128K context. Use OpenAIRealtime2 for the GA wire shape. |
| "gpt-4o-realtime-preview" | Earlier preview line; ~10x the per-token cost of mini. |
Pricing is auto-resolved per model — see Metrics. For reasoningEffort, transcription model, and the full configuration surface, see OpenAI Realtime — full reference.
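As a rough illustration of how the documented options fit together, the sketch below checks an options object against the voice list and the note that reasoningEffort targets gpt-realtime-2. The RealtimeOptions interface and checkRealtimeOptions helper are hypothetical and written only for this example; they are not exported by getpatter, and whether the library itself rejects or ignores these combinations is not stated here.

```typescript
// Illustrative only: a hypothetical pre-flight check, not a getpatter API.

const VOICES = ["alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse"];
const DEFAULT_MODEL = "gpt-4o-mini-realtime-preview"; // engine marker default

type ReasoningEffort = "minimal" | "low" | "medium" | "high";

// Mirrors the documented OpenAIRealtime parameters.
interface RealtimeOptions {
  apiKey?: string;
  voice?: string;
  model?: string;
  reasoningEffort?: ReasoningEffort;
  inputAudioTranscriptionModel?: string;
}

// Returns a list of human-readable warnings; an empty list means the
// options match what this page documents.
function checkRealtimeOptions(opts: RealtimeOptions): string[] {
  const warnings: string[] = [];
  const voice = opts.voice ?? "alloy";
  const model = opts.model ?? DEFAULT_MODEL;

  if (!VOICES.includes(voice)) {
    warnings.push(`voice "${voice}" is not in the documented voice list`);
  }
  if (opts.reasoningEffort !== undefined && model !== "gpt-realtime-2") {
    warnings.push(
      `reasoningEffort is documented for gpt-realtime-2, but model is "${model}"`,
    );
  }
  return warnings;
}
```

For example, checkRealtimeOptions({ reasoningEffort: "low" }) would return one warning, because the default model is the mini preview rather than gpt-realtime-2.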

OpenAIRealtime2

OpenAI’s GA Realtime API — separate engine marker because the GA endpoint speaks a different session.update wire shape (output_modalities, nested audio.{input,output} blocks, session.type = "realtime") and rejects the legacy beta header. Targets gpt-realtime-2 by default and routes through OpenAIRealtime2Adapter, which also handles bidirectional mulaw 8 kHz ↔ PCM 24 kHz transcoding (the GA audio engine silently drops mulaw frames).
// npx tsx example.ts
import { Patter, Twilio, OpenAIRealtime2 } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new OpenAIRealtime2({ reasoningEffort: "low" }),  // OPENAI_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hello!",
});

await phone.serve({ agent });
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | OpenAI API key. Reads from OPENAI_API_KEY when omitted. |
| voice | string | "alloy" | Voice preset. |
| model | string | "gpt-realtime-2" | GA Realtime model. |
| reasoningEffort | "minimal" \| "low" \| "medium" \| "high" | | When omitted, the field is not sent and the server default applies. OpenAI recommends "low" for production voice. |
| inputAudioTranscriptionModel | string | | Override for audio.input.transcription.model. Omit to keep whisper-1. |
The GA adapter pins turn_detection.create_response: false and interrupt_response: false in the session.update payload. Patter owns response creation (response.create) and barge-in cancellation explicitly so the hallucination filter and barge-in pipeline can decide per turn rather than letting the server VAD auto-trigger. See OpenAIRealtime2 — full reference.
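The GA wire shape described in this section can be pictured with a small payload builder. This is a sketch under stated assumptions, not the adapter's actual code: buildSessionUpdate and its option names are hypothetical, the server_vad type and the audio/pcm format objects follow my reading of the GA Realtime API, and the placement of the reasoning-effort field under session.reasoning is a guess that the source does not confirm.

```typescript
// Illustrative only: hypothetical builder, not OpenAIRealtime2Adapter.

type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface GaSessionOptions {
  instructions: string;
  voice?: string;
  inputAudioTranscriptionModel?: string;
  reasoningEffort?: ReasoningEffort;
}

interface AudioFormat { type: string; rate: number }

interface SessionUpdate {
  type: "session.update";
  session: {
    type: "realtime";
    instructions: string;
    output_modalities: string[];
    audio: {
      input: {
        format: AudioFormat;
        transcription: { model: string };
        turn_detection: { type: string; create_response: boolean; interrupt_response: boolean };
      };
      output: { format: AudioFormat; voice: string };
    };
    reasoning?: { effort: ReasoningEffort }; // ASSUMPTION: field name and location
  };
}

function buildSessionUpdate(opts: GaSessionOptions): SessionUpdate {
  const pcm24k: AudioFormat = { type: "audio/pcm", rate: 24000 };
  const msg: SessionUpdate = {
    type: "session.update",
    session: {
      type: "realtime",
      instructions: opts.instructions,
      output_modalities: ["audio"],
      audio: {
        input: {
          format: pcm24k,
          transcription: { model: opts.inputAudioTranscriptionModel ?? "whisper-1" },
          // The client, not server VAD, decides when to respond or cancel.
          turn_detection: { type: "server_vad", create_response: false, interrupt_response: false },
        },
        output: { format: pcm24k, voice: opts.voice ?? "alloy" },
      },
    },
  };
  // Only send the field when the caller set it, so the server default applies otherwise.
  if (opts.reasoningEffort !== undefined) {
    msg.session.reasoning = { effort: opts.reasoningEffort };
  }
  return msg;
}
```

With both turn-detection flags pinned to false, the client would be expected to send response.create itself once it decides a turn deserves a reply, which is the behaviour the paragraph above attributes to Patter.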

ElevenLabsConvAI

ElevenLabs Conversational AI — premium voice quality using a managed agent configured in the ElevenLabs dashboard.
// npx tsx example.ts
import { Patter, Twilio, ElevenLabsConvAI } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new ElevenLabsConvAI({ agentId: "agent_abc123" }),   // ELEVENLABS_API_KEY from env
  systemPrompt: "You are a warm and friendly concierge.",
});

await phone.serve({ agent });
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | ElevenLabs API key. Reads from ELEVENLABS_API_KEY when omitted. |
| agentId | string | | ElevenLabs agent ID (from the ConvAI dashboard). Reads from ELEVENLABS_AGENT_ID when omitted. |
| voice | string | | Optional override for the agent’s default voice ID. |
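The environment fallback for apiKey and agentId can be sketched as follows. The resolveConvAIConfig helper and its types are hypothetical and exist only to illustrate the lookup order (explicit option first, then environment variable); throwing when neither is present is an assumption of this sketch, not documented Patter behaviour.

```typescript
// Illustrative only: hypothetical resolver, not part of getpatter.

interface ConvAIOptions {
  apiKey?: string;
  agentId?: string;
  voice?: string;
}

interface ResolvedConvAIConfig {
  apiKey: string;
  agentId: string;
  voice?: string;
}

type Env = Record<string, string | undefined>;

// Explicit options win; otherwise fall back to the documented env variables.
function resolveConvAIConfig(opts: ConvAIOptions, env: Env): ResolvedConvAIConfig {
  const apiKey = opts.apiKey ?? env.ELEVENLABS_API_KEY;
  const agentId = opts.agentId ?? env.ELEVENLABS_AGENT_ID;

  // ASSUMPTION: fail fast when a required value is missing.
  if (!apiKey) throw new Error("Missing ElevenLabs API key (option or ELEVENLABS_API_KEY)");
  if (!agentId) throw new Error("Missing ElevenLabs agent ID (option or ELEVENLABS_AGENT_ID)");

  const resolved: ResolvedConvAIConfig = { apiKey, agentId };
  if (opts.voice !== undefined) resolved.voice = opts.voice; // optional voice override
  return resolved;
}
```

In a Node process you would pass process.env as the second argument; taking the environment as a parameter keeps the sketch easy to test.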

What’s Next

LLM: Compare engine mode with pipeline mode.
STT: STT for pipeline mode.
TTS: TTS for pipeline mode.