Engines

An engine is an end-to-end speech-to-speech runtime. Pass an engine instance to phone.agent({ engine }) and Patter wires the audio stream straight through to the provider — no separate STT or TTS is needed. Patter ships with three engine classes today:

OpenAIRealtime — OpenAI’s Realtime API (beta endpoint)
OpenAIRealtime2 — OpenAI’s GA Realtime API (targets gpt-realtime-2)
ElevenLabsConvAI — ElevenLabs Conversational AI

All three classes are imported by name from the package barrel: import { OpenAIRealtime, OpenAIRealtime2, ElevenLabsConvAI } from "getpatter". If you need full control over STT, LLM, and TTS independently, use pipeline mode instead and omit engine.

OpenAIRealtime

OpenAI’s Realtime API — the lowest-latency option.

```typescript
// npx tsx example.ts
import { Patter, Twilio, OpenAIRealtime } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new OpenAIRealtime({ voice: "alloy" }),           // OPENAI_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hello!",
});

await phone.serve({ agent });
```

Telephony audio. Over Twilio/Telnyx the OpenAIRealtime engine routes through the same GA-compatible adapter as OpenAIRealtime2: it negotiates PCM-16-LE @ 24 kHz with OpenAI and transcodes to/from the carrier’s mulaw 8 kHz internally. Current OpenAI Realtime models return PCM16 @ 24 kHz regardless of a legacy g711_ulaw request, so Patter standardises on PCM and converts on the carrier leg — you don’t configure any of this.
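
The carrier-leg conversion described above can be sketched in plain TypeScript as G.711 μ-law decode/encode plus a 3:1 resample between 8 kHz and 24 kHz. This is an illustrative sketch under stated assumptions, not Patter's adapter code. The helper names (`mulawDecode`, `mulawEncode`, `upsample3x`, `downsample3x`, `carrierToModel`, `modelToCarrier`) are hypothetical. The resampler is deliberately crude (linear interpolation up, box-filter averaging down), and packing samples into PCM-16-LE bytes is left out.

```typescript
// Hypothetical sketch: mu-law @ 8 kHz <-> PCM16 samples @ 24 kHz.
// Patter's internal adapter may use a different (better-filtered) resampler.
const BIAS = 0x84; // 132, standard G.711 mu-law bias
const CLIP = 32635; // largest magnitude before the bias would overflow

// Decode one mu-law byte to a signed 16-bit sample.
function mulawDecode(byte: number): number {
  const u = ~byte & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return u & 0x80 ? -magnitude : magnitude;
}

// Encode one signed 16-bit sample to a mu-law byte.
function mulawEncode(sample: number): number {
  let s = Math.max(-32768, Math.min(32767, Math.round(sample)));
  const sign = s < 0 ? 0x80 : 0x00;
  if (s < 0) s = -s;
  if (s > CLIP) s = CLIP;
  s += BIAS;
  // Exponent = index of the highest set bit among bits 7..14, minus 7.
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) exponent--;
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// 8 kHz -> 24 kHz: three output samples per input, linearly interpolated.
function upsample3x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 3);
  for (let i = 0; i < input.length; i++) {
    const a = input[i];
    const b = i + 1 < input.length ? input[i + 1] : a; // hold the last sample
    out[3 * i] = a;
    out[3 * i + 1] = Math.round(a + (b - a) / 3);
    out[3 * i + 2] = Math.round(a + (2 * (b - a)) / 3);
  }
  return out;
}

// 24 kHz -> 8 kHz: average each group of three samples (crude anti-alias filter).
function downsample3x(input: Int16Array): Int16Array {
  const out = new Int16Array(Math.floor(input.length / 3));
  for (let i = 0; i < out.length; i++) {
    out[i] = Math.round((input[3 * i] + input[3 * i + 1] + input[3 * i + 2]) / 3);
  }
  return out;
}

// Carrier -> model: mu-law bytes @ 8 kHz to PCM16 samples @ 24 kHz.
function carrierToModel(mulaw: Uint8Array): Int16Array {
  return upsample3x(Int16Array.from(mulaw, mulawDecode));
}

// Model -> carrier: PCM16 samples @ 24 kHz to mu-law bytes @ 8 kHz.
function modelToCarrier(pcm24k: Int16Array): Uint8Array {
  return Uint8Array.from(downsample3x(pcm24k), mulawEncode);
}
```

A 20 ms Twilio frame (160 μ-law bytes) becomes 480 PCM samples on the way to the model, and the reverse path shrinks it back to 160 bytes.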

Parameter	Type	Default	Description
`apiKey`	`string`	—	OpenAI API key. Reads from `OPENAI_API_KEY` when omitted.
`voice`	`string`	`"alloy"`	One of `"alloy"`, `"ash"`, `"ballad"`, `"coral"`, `"echo"`, `"sage"`, `"shimmer"`, `"verse"`.
`model`	`string`	`"gpt-4o-mini-realtime-preview"`	OpenAI Realtime model ID. See supported models.
`reasoningEffort`	`"minimal" \| "low" \| "medium" \| "high"`	—	Reasoning-effort tier for `gpt-realtime-2`. When omitted, the field is not sent and the server default applies. OpenAI recommends `"low"` for production voice.
`inputAudioTranscriptionModel`	`string`	—	Override for the Realtime session’s `input_audio_transcription.model`. Omit to keep `whisper-1`.
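
The table implies a resolution order: an explicit option wins, then the environment variable, then the documented default. Optional fields such as `reasoningEffort` are left out entirely rather than sent as `null`. The sketch below models that order with a hypothetical `resolveOpenAIRealtimeOptions` helper. It is an assumption about the behaviour the table describes, not Patter's source.

```typescript
// Hypothetical sketch of how the OpenAIRealtime parameters resolve.
// Names and structure are illustrative only.
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface OpenAIRealtimeOptions {
  apiKey?: string;
  voice?: string;
  model?: string;
  reasoningEffort?: ReasoningEffort;
  inputAudioTranscriptionModel?: string;
}

interface ResolvedRealtimeOptions {
  apiKey: string;
  voice: string;
  model: string;
  transcriptionModel: string;
  reasoningEffort?: ReasoningEffort; // key is absent unless explicitly configured
}

function resolveOpenAIRealtimeOptions(
  opts: OpenAIRealtimeOptions,
  env: Record<string, string | undefined>, // e.g. process.env
): ResolvedRealtimeOptions {
  // Explicit option wins; otherwise fall back to the environment.
  const apiKey = opts.apiKey ?? env.OPENAI_API_KEY;
  if (!apiKey) throw new Error("Pass apiKey or set OPENAI_API_KEY");

  const resolved: ResolvedRealtimeOptions = {
    apiKey,
    voice: opts.voice ?? "alloy",
    model: opts.model ?? "gpt-4o-mini-realtime-preview",
    transcriptionModel: opts.inputAudioTranscriptionModel ?? "whisper-1",
  };
  // Attach reasoningEffort only when set, so the key never appears in a
  // serialized payload and the server default applies.
  if (opts.reasoningEffort !== undefined) {
    resolved.reasoningEffort = opts.reasoningEffort;
  }
  return resolved;
}
```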

Supported model identifiers

The model option accepts any OpenAI Realtime model ID. Common values:

Model	Notes
`"gpt-4o-mini-realtime-preview"`	Engine marker default. Earlier preview line — cheap and low-latency.
`"gpt-realtime-mini"`	GA mini — recommended cheap default for new deployments.
`"gpt-realtime"`	GA realtime model (Aug 2025).
`"gpt-realtime-2"`	Most-capable: stronger instruction following, configurable `reasoningEffort`, 128K context. Use `OpenAIRealtime2` for the GA wire shape.
`"gpt-4o-realtime-preview"`	Earlier preview line; ~10x the per-token cost of mini.

Pricing is auto-resolved per model — see Metrics. For reasoningEffort, transcription model, and the full configuration surface, see OpenAI Realtime — full reference.

OpenAIRealtime2

OpenAI’s GA Realtime API. It is a separate engine marker because the GA endpoint speaks a different session.update wire shape (output_modalities, nested audio.{input,output} blocks, session.type = "realtime") and rejects the legacy beta header. It targets gpt-realtime-2 by default and routes through OpenAIRealtime2Adapter, which also handles bidirectional mulaw 8 kHz ↔ PCM 24 kHz transcoding, because the GA audio engine silently drops mulaw frames.
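
Here is a sketch of a GA-style session.update event that makes the wire-shape difference concrete. It follows our reading of OpenAI's public GA Realtime reference: `session.type` is `"realtime"`, `output_modalities` replaces the beta `modalities`, and audio settings are nested under `audio.input` and `audio.output`. The `buildGaSessionUpdate` helper and its parameters are hypothetical. The payload OpenAIRealtime2Adapter actually sends may contain more or different fields, so verify field names against the current API reference.

```typescript
// Hypothetical builder for a GA Realtime session.update event.
interface GaSessionParams {
  model: string;
  voice: string;
  instructions: string;
  transcriptionModel?: string; // maps to audio.input.transcription.model
}

function buildGaSessionUpdate(p: GaSessionParams) {
  // PCM-16-LE @ 24 kHz in both directions; mulaw never reaches OpenAI.
  const pcm24k = { type: "audio/pcm", rate: 24000 } as const;
  return {
    type: "session.update",
    session: {
      type: "realtime", // required by the GA endpoint
      model: p.model,
      instructions: p.instructions,
      output_modalities: ["audio"],
      audio: {
        input: {
          format: pcm24k,
          transcription: { model: p.transcriptionModel ?? "whisper-1" },
          turn_detection: {
            type: "server_vad",
            create_response: false, // client sends response.create itself
            interrupt_response: false, // client handles barge-in itself
          },
        },
        output: { format: pcm24k, voice: p.voice },
      },
    },
  };
}
```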

```typescript
// npx tsx example.ts
import { Patter, Twilio, OpenAIRealtime2 } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new OpenAIRealtime2({ reasoningEffort: "low" }),  // OPENAI_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hello!",
});

await phone.serve({ agent });
```

Parameter	Type	Default	Description
`apiKey`	`string`	—	OpenAI API key. Reads from `OPENAI_API_KEY` when omitted.
`voice`	`string`	`"alloy"`	Voice preset.
`model`	`string`	`"gpt-realtime-2"`	GA Realtime model.
`reasoningEffort`	`"minimal" \| "low" \| "medium" \| "high"`	—	When omitted, the field is not sent and the server default applies. OpenAI recommends `"low"` for production voice.
`inputAudioTranscriptionModel`	`string`	—	Override for `audio.input.transcription.model`. Omit to keep `whisper-1`.

The GA adapter pins turn_detection.create_response: false and interrupt_response: false in the session.update payload. Patter owns response creation (response.create) and barge-in cancellation explicitly so the hallucination filter and barge-in pipeline can decide per turn rather than letting the server VAD auto-trigger. See OpenAIRealtime2 — full reference.
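
With both flags off, the server neither starts nor cancels responses on its own, so the client has to do both. The sketch below shows a minimal version of that control loop. It assumes a hypothetical `send` callback for client events and a hypothetical `shouldRespond` predicate standing in for Patter's hallucination filter. The event names (`input_audio_buffer.speech_started`, `conversation.item.input_audio_transcription.completed`, `response.created`, `response.done`, `response.create`, `response.cancel`) come from OpenAI's Realtime API. The decision logic is illustrative, not Patter's implementation.

```typescript
// Client events this sketch sends, and the server events it reacts to.
type ClientEvent = { type: "response.create" } | { type: "response.cancel" };
type ServerEvent =
  | { type: "input_audio_buffer.speech_started" }
  | { type: "conversation.item.input_audio_transcription.completed"; transcript: string }
  | { type: "response.created" }
  | { type: "response.done" };

// Hypothetical per-call controller that decides response creation and
// barge-in cancellation explicitly, turn by turn.
class TurnController {
  private responding = false;
  private readonly send: (ev: ClientEvent) => void;
  private readonly shouldRespond: (transcript: string) => boolean;

  constructor(
    send: (ev: ClientEvent) => void,
    shouldRespond: (transcript: string) => boolean, // stand-in for a hallucination filter
  ) {
    this.send = send;
    this.shouldRespond = shouldRespond;
  }

  handle(ev: ServerEvent): void {
    switch (ev.type) {
      case "input_audio_buffer.speech_started":
        // Barge-in: the caller talked over the agent, so cancel the active response.
        if (this.responding) this.send({ type: "response.cancel" });
        break;
      case "conversation.item.input_audio_transcription.completed":
        // Reply only to turns that pass the filter.
        if (this.shouldRespond(ev.transcript)) this.send({ type: "response.create" });
        break;
      case "response.created":
        this.responding = true;
        break;
      case "response.done":
        // Also emitted for cancelled responses, which clears the flag.
        this.responding = false;
        break;
    }
  }
}
```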

ElevenLabsConvAI

ElevenLabs Conversational AI — premium voice quality using a managed agent configured in the ElevenLabs dashboard.

```typescript
// npx tsx example.ts
import { Patter, Twilio, ElevenLabsConvAI } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new ElevenLabsConvAI({ agentId: "agent_abc123" }),   // ELEVENLABS_API_KEY from env
  systemPrompt: "You are a warm and friendly concierge.",
});

await phone.serve({ agent });
```

Parameter	Type	Default	Description
`apiKey`	`string`	—	ElevenLabs API key. Reads from `ELEVENLABS_API_KEY` when omitted.
`agentId`	`string`	—	ElevenLabs agent ID (from the ConvAI dashboard). Reads from `ELEVENLABS_AGENT_ID` when omitted.
`voice`	`string`	—	Optional override for the agent’s default voice ID.
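
The environment fallbacks in this table can be sketched as below, together with the WebSocket URL a client would open for a public ConvAI agent. The `resolveElevenLabsConvAI` helper and its return shape are hypothetical, and this sketch chooses to treat both the API key and the agent ID as required. The endpoint (`wss://api.elevenlabs.io/v1/convai/conversation` with an `agent_id` query parameter) reflects our reading of ElevenLabs' public docs. Patter may connect differently, for example through a signed URL for private agents.

```typescript
// Hypothetical sketch of ElevenLabsConvAI option resolution.
interface ElevenLabsConvAIOptions {
  apiKey?: string;
  agentId?: string;
  voice?: string; // optional voice ID override; undefined keeps the dashboard voice
}

function resolveElevenLabsConvAI(
  opts: ElevenLabsConvAIOptions,
  env: Record<string, string | undefined>, // e.g. process.env
) {
  // Explicit options win; otherwise fall back to the environment.
  const apiKey = opts.apiKey ?? env.ELEVENLABS_API_KEY;
  const agentId = opts.agentId ?? env.ELEVENLABS_AGENT_ID;
  if (!apiKey) throw new Error("Pass apiKey or set ELEVENLABS_API_KEY");
  if (!agentId) throw new Error("Pass agentId or set ELEVENLABS_AGENT_ID");

  const url =
    "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=" + encodeURIComponent(agentId);
  return { apiKey, agentId, voice: opts.voice, url };
}
```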

What’s Next

LLM

Compare engine mode with pipeline mode.

STT

STT for pipeline mode.

TTS

TTS for pipeline mode.