Documentation Index

Fetch the complete documentation index at: https://docs.getpatter.com/llms.txt

Use this file to discover all available pages before exploring further.

OpenAI Realtime

OpenAIRealtime is the engine wrapper for OpenAI’s Realtime API — a single WebSocket session that handles speech-in, reasoning, and speech-out, with sub-500 ms typical turn latency. For the basic new OpenAIRealtime(...) quickstart, see Engines. This page documents the full configuration surface: every supported model, the streaming transcription options, and the new reasoningEffort tier.

Models

Pass any of these to model: on new OpenAIRealtime(...). Pricing is auto-resolved per model from DEFAULT_PRICING — no manual override is required (see Metrics).
Model                            | Audio in / out (per M tokens) | Notes
"gpt-realtime-mini" (default)    | $10 / $20                     | Fastest + cheapest. Production default for most voice flows.
"gpt-realtime"                   | $32 / $64                     | GA realtime model (Aug 2025).
"gpt-realtime-2"                 | $32 / $64                     | Most capable. Stronger instruction following, 128K context, supports reasoningEffort.
"gpt-4o-realtime-preview"        | $100 / $200                   | Earlier preview, retained for compatibility.
"gpt-4o-mini-realtime-preview"   | $10 / $20                     | Earlier preview, retained for compatibility.
The same identifiers are exposed as a const object for editor autocomplete:
import { OpenAIRealtimeModel } from "getpatter";

OpenAIRealtimeModel.GPT_REALTIME_2;  // "gpt-realtime-2"
gpt-realtime-translate is intentionally not supported by Patter’s Realtime engine. It lives on a different OpenAI endpoint (/v1/realtime/translations), does not accept tool calls or response.create, and would invalidate the Agent contract Patter exposes. Real-time translation, if added, will land as a dedicated feature — not as a Realtime model variant.

Reasoning effort

gpt-realtime-2 accepts a configurable reasoning tier. Patter exposes it as the reasoningEffort constructor option on the lower-level OpenAIRealtimeAdapter:
Value     | When to use
"minimal" | Snappy turn-taking. Skips most reasoning.
"low"     | Recommended for production voice. Good instruction following without measurable added per-turn latency.
"medium"  | Multi-step tool flows where the model should plan. Adds latency.
"high"    | Complex reasoning. Not recommended for live phone calls.
When set, Patter injects session.reasoning = { effort: ... } into the session.update payload. When omitted, the field is not sent and OpenAI’s server default applies. The field is a no-op on models that ignore it (for example gpt-realtime-mini), so it’s safe to leave configured across model swaps.
Higher reasoning tiers add measurable latency to every turn. Stick to "low" unless you’ve profiled the call and confirmed the model needs more.
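Concretely, the injection works like this sketch. Only the session.reasoning = { effort } shape is documented above; the surrounding session.update envelope is OpenAI's standard Realtime event, shown for context:

```typescript
// Sketch of the session.update payload Patter emits when reasoningEffort
// is configured. Only `session.reasoning` comes from this page; the rest
// of the envelope is OpenAI's standard Realtime event shape.
const sessionUpdate = {
  type: "session.update",
  session: {
    reasoning: { effort: "low" }, // injected only when reasoningEffort is set
  },
};

// When reasoningEffort is omitted, Patter sends no `reasoning` field at all,
// so OpenAI's server-side default applies.
```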

Streaming transcription

The Realtime session can run an inline Whisper-family model on inbound audio so you get text deltas alongside the conversation. The model is set via inputAudioTranscriptionModel:
Model                      | Cost       | Notes
"whisper-1" (default)      | $0.006/min | Established Whisper. Slower partials.
"gpt-4o-mini-transcribe"   | $0.003/min | Cheapest.
"gpt-4o-transcribe"        | $0.006/min | Higher accuracy.
"gpt-realtime-whisper"     | $0.017/min | Streaming-optimised. Lowest-latency partials. Use when you need fast deltas in the dashboard or for live captioning.
Same const-object form:
import { OpenAITranscriptionModel } from "getpatter";

OpenAITranscriptionModel.GPT_REALTIME_WHISPER;  // "gpt-realtime-whisper"
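Transcription is billed per minute of inbound audio, so the trade-off is easy to quantify. A hypothetical estimator (not part of getpatter) using the per-minute rates from the table above:

```typescript
// Hypothetical helper, NOT part of getpatter: transcription cost in USD for a
// call of the given length, using the per-minute rates in the table above.
const TRANSCRIPTION_RATE_PER_MIN: Record<string, number> = {
  "whisper-1": 0.006,
  "gpt-4o-mini-transcribe": 0.003,
  "gpt-4o-transcribe": 0.006,
  "gpt-realtime-whisper": 0.017,
};

function transcriptionCost(model: string, minutes: number): number {
  return TRANSCRIPTION_RATE_PER_MIN[model] * minutes;
}

// For a 10-minute call:
console.log(transcriptionCost("whisper-1", 10).toFixed(2));            // 0.06
console.log(transcriptionCost("gpt-realtime-whisper", 10).toFixed(2)); // 0.17
```

At roughly 2.8× the price of whisper-1, gpt-realtime-whisper is worth it only when low-latency partials actually matter.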

Worked example — gpt-realtime-2 with low reasoning + streaming whisper

Constructing the lower-level OpenAIRealtimeAdapter directly gives access to every field. This is what the new OpenAIRealtime(...) engine wrapper builds under the hood; reach for the adapter when you need reasoningEffort or a non-default transcription model.
// npx tsx example.ts
import {
  Patter,
  Twilio,
  OpenAIRealtimeAdapter,
  OpenAIRealtimeModel,
  OpenAITranscriptionModel,
} from "getpatter";

const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15555550100",
});

const adapter = new OpenAIRealtimeAdapter(
  process.env.OPENAI_API_KEY ?? "",
  OpenAIRealtimeModel.GPT_REALTIME_2,
  "nova",
  "You are a helpful, concise voice assistant.",
  undefined,                            // tools
  "g711_ulaw",                          // audioFormat
  {
    inputAudioTranscriptionModel: OpenAITranscriptionModel.GPT_REALTIME_WHISPER,
    reasoningEffort: "low",             // OpenAI's recommended production tier
  },
);

const agent = phone.agent({
  engine: adapter,                      // adapter passed as engine
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi, how can I help today?",
});

await phone.serve({ agent });
The reasoningEffort and inputAudioTranscriptionModel options live on OpenAIRealtimeAdapter. The shorthand new OpenAIRealtime({ model }) engine wrapper currently exposes only apiKey, voice, and model — use the adapter directly when you need the new fields.

Backward compatibility

  • Defaults are unchanged: model: "gpt-realtime-mini", inputAudioTranscriptionModel: "whisper-1", reasoningEffort: undefined.
  • All existing new OpenAIRealtime(...) constructions keep working without code changes.
  • Pricing for new models is added under DEFAULT_PRICING.openai_realtime.models[...]. The earlier new Patter({ pricing: { openai_realtime: DEFAULT_PRICING.openai_realtime_2 } }) workaround is no longer needed — just construct with model: "gpt-realtime-2".

What’s Next

Engines

All engine classes side by side.

Metrics

Per-call cost breakdown and the model-aware pricing table.

Agents

Configure system prompts, tools, and first messages.

Tools

Function calling inside a Realtime session.