OpenAI Realtime 2

OpenAIRealtime2 is the engine marker for OpenAI’s GA Realtime API (the production endpoint that replaces the beta OpenAI-Beta: realtime=v1 channel). It targets gpt-realtime-2 by default and routes through OpenAIRealtime2Adapter — a dedicated adapter that speaks the GA session.update wire shape and performs bidirectional audio transcoding (mulaw 8 kHz ↔ PCM 24 kHz) required by the GA audio engine. For the legacy beta endpoint and the lower-cost gpt-realtime-mini model, keep using OpenAIRealtime. The two engines coexist — pick OpenAIRealtime2 only when you specifically want the GA endpoint or the gpt-realtime-2 model.

The GA endpoint rejects the legacy OpenAI-Beta: realtime=v1 header and expects output_modalities, nested audio.{input,output} blocks with MIME-type strings, and session.type = "realtime". These wire-shape differences are why GA needs its own adapter — the beta OpenAIRealtimeAdapter cannot reach gpt-realtime-2 reliably.

When to use

Use `OpenAIRealtime2` when…	Stick with `OpenAIRealtime` when…
You want `gpt-realtime-2` — strongest instruction following + 128K context + configurable `reasoningEffort`.	You’re on `gpt-realtime-mini` for cost / latency reasons.
You’re hitting the GA endpoint and the beta channel is being deprecated for your account.	You don’t need the GA wire shape and want to keep the existing adapter path.
You want the bidirectional PCM 24 kHz transcoding handled by the SDK rather than the model silently dropping mulaw frames.	Your audio is already PCM 24 kHz end-to-end and beta works for you.

Quickstart

import { Patter, Twilio, OpenAIRealtime2 } from "getpatter";

const phone = new Patter({
  carrier: new Twilio(),                  // TWILIO_* from env
  phoneNumber: "+15555550100",
});

const agent = phone.agent({
  engine: new OpenAIRealtime2({ reasoningEffort: "low" }),
  systemPrompt: "You are a friendly receptionist.",
  firstMessage: "Hello! How can I help today?",
});

await phone.serve({ agent });

reasoningEffort: "low" is OpenAI’s recommended production tier for live voice: it gives good instruction following without adding measurable per-turn latency.

Constructor

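import { OpenAIRealtime2, type OpenAIRealtime2Options } from "getpatter";

new OpenAIRealtime2({
  apiKey?: string;                                 // reads OPENAI_API_KEY
  voice?: string;                                  // default: "alloy"
  model?: string;                                  // default: "gpt-realtime-2"
  reasoningEffort?: "minimal" | "low" | "medium" | "high"; // default: not sent
  inputAudioTranscriptionModel?: string;           // default: "whisper-1"
  noiseReduction?: "far_field" | "near_field";     // default: not sent
  turnDetection?: RealtimeTurnDetection;           // default: server_vad
  gateResponseOnTranscript?: boolean;              // default: false (server-managed)
});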

All fields are optional with safe defaults. apiKey falls back to the OPENAI_API_KEY environment variable.
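The default resolution above can be sketched as a plain function. This is a minimal illustration, not the SDK's code. Realtime2Options and resolveRealtime2Options are hypothetical names. The env parameter stands in for process.env so the sketch runs without Node typings. Failing fast when no key is found is this sketch's own choice, not documented SDK behaviour.

```typescript
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

// Hypothetical mirror of the documented constructor options (not the SDK type).
interface Realtime2Options {
  apiKey?: string;
  voice?: string;
  model?: string;
  reasoningEffort?: ReasoningEffort;
  inputAudioTranscriptionModel?: string;
}

interface ResolvedRealtime2Options {
  apiKey: string;
  voice: string;
  model: string;
  reasoningEffort?: ReasoningEffort; // never defaulted: unset means "not sent"
  inputAudioTranscriptionModel: string;
}

// env stands in for process.env so the sketch needs no Node typings.
function resolveRealtime2Options(
  opts: Realtime2Options,
  env: Record<string, string | undefined>,
): ResolvedRealtime2Options {
  const apiKey = opts.apiKey ?? env.OPENAI_API_KEY;
  if (!apiKey) {
    // Failing fast is this sketch's choice, not documented SDK behaviour.
    throw new Error("Pass apiKey or set OPENAI_API_KEY");
  }
  return {
    apiKey,
    voice: opts.voice ?? "alloy",
    model: opts.model ?? "gpt-realtime-2",
    reasoningEffort: opts.reasoningEffort,
    inputAudioTranscriptionModel: opts.inputAudioTranscriptionModel ?? "whisper-1",
  };
}
```

In a Node program you would call it as resolveRealtime2Options({ reasoningEffort: "low" }, process.env).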

Reasoning effort

Value	When to use
`"minimal"`	Snappy turn-taking. Skips most reasoning.
`"low"`	Recommended for production voice. Good instruction following without measurable per-turn latency.
`"medium"`	Multi-step tool flows where the model should plan. Adds latency.
`"high"`	Complex reasoning. Not recommended for live phone calls.

When set, Patter injects session.reasoning = { effort: ... } into the GA session.update payload. When omitted, the field is not sent and OpenAI’s server default applies.
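Here is a rough sketch of how the conditional reasoning field could sit inside a GA session.update payload. buildSessionUpdate is a hypothetical helper. The exact nesting is assumed from the wire-shape differences described at the top of this page, not copied from Patter's adapter. That nesting covers session.type, output_modalities, and audio.input / audio.output with a PCM 24 kHz format object.

```typescript
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface SessionUpdate {
  type: "session.update";
  session: Record<string, unknown>;
}

// Hypothetical builder; the nesting is assumed from the GA shape described on this page.
function buildSessionUpdate(opts: {
  model: string;
  voice: string;
  instructions: string;
  transcriptionModel: string;
  reasoningEffort?: ReasoningEffort;
}): SessionUpdate {
  const pcm24k = { type: "audio/pcm", rate: 24000 }; // assumed format object
  const session: Record<string, unknown> = {
    type: "realtime",
    model: opts.model,
    instructions: opts.instructions,
    output_modalities: ["audio"],
    audio: {
      input: { format: pcm24k, transcription: { model: opts.transcriptionModel } },
      output: { format: pcm24k, voice: opts.voice },
    },
  };
  // Add reasoning only when an effort was chosen; otherwise the key is absent
  // and OpenAI's server default applies.
  if (opts.reasoningEffort !== undefined) {
    session.reasoning = { effort: opts.reasoningEffort };
  }
  return { type: "session.update", session };
}
```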

Streaming transcription

Set inputAudioTranscriptionModel to override audio.input.transcription.model. The same identifiers as the beta endpoint apply — see the streaming-transcription table on the OpenAI Realtime page for the full list (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe, gpt-realtime-whisper).

Tool-call preambles

gpt-realtime-2 treats preambles as first-class — the model speaks a short action sentence before a tool call when steered to. Set toolCallPreambles: true on phone.agent({ ... }) to prepend Patter’s built-in # Preambles guidance block to the session instructions, so a slow tool no longer leaves the caller in silence:

const agent = phone.agent({
  systemPrompt: "...",
  engine: new OpenAIRealtime2({ reasoningEffort: "low" }),
  toolCallPreambles: true,
});

Leaving toolCallPreambles undefined or false (the default) keeps the instructions byte-identical; passing a string replaces the built-in block with your text verbatim. See Tool-call preambles for the full behaviour and the per-tool sample-phrase nicety.
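A small helper can illustrate the three toolCallPreambles cases. applyToolCallPreambles and BUILTIN_PREAMBLES_BLOCK are hypothetical. The placeholder text is not Patter's real guidance block. The blank-line separator between the block and the prompt is a guess.

```typescript
// Placeholder only: Patter's real "# Preambles" block is not reproduced here.
const BUILTIN_PREAMBLES_BLOCK = "# Preambles\n(built-in guidance placeholder)";

// Hypothetical helper mirroring the documented toolCallPreambles semantics.
function applyToolCallPreambles(
  instructions: string,
  toolCallPreambles: boolean | string | undefined,
): string {
  // undefined / false: instructions pass through untouched (byte-identical).
  if (toolCallPreambles === undefined || toolCallPreambles === false) {
    return instructions;
  }
  // true: prepend the built-in block; a string replaces that block verbatim.
  const block = toolCallPreambles === true ? BUILTIN_PREAMBLES_BLOCK : toolCallPreambles;
  return `${block}\n\n${instructions}`;
}
```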

Audio path

The GA audio engine speaks PCM 24 kHz and silently drops mulaw frames. Patter handles the conversion transparently inside OpenAIRealtime2Adapter:

Inbound (Twilio/Telnyx → model): mulaw 8 kHz → PCM 24 kHz
Outbound (model → Twilio/Telnyx): PCM 24 kHz → mulaw 8 kHz

No caller-side change is required — both Twilio Media Streams (mulaw 8 kHz) and Telnyx Call Control (PCM 16 kHz / mulaw 8 kHz) work out of the box.
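For intuition, here is a minimal sketch of both conversions. It uses textbook G.711 mu-law coding plus a naive factor-of-3 resampler: linear interpolation on the way up and 3-sample averaging on the way down. It is not OpenAIRealtime2Adapter's implementation, and all function names are hypothetical. Serializing PCM16 to little-endian bytes / base64 for the wire is omitted. A production resampler would also use a proper anti-aliasing filter and carry leftover samples between chunks.

```typescript
// Textbook G.711 mu-law decode: one 8-bit code -> one 16-bit PCM sample.
function mulawDecode(code: number): number {
  const u = ~code & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Textbook G.711 mu-law encode: one 16-bit PCM sample -> one 8-bit code.
function mulawEncode(sample: number): number {
  const s16 = Math.max(-32768, Math.min(32767, Math.round(sample)));
  const sign = s16 < 0 ? 0x80 : 0;
  const s = Math.min(Math.abs(s16), 32635) + 0x84; // clip, then add the bias
  let exponent = 7;
  for (let mask = 0x4000; (s & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (s >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// 8 kHz -> 24 kHz: keep each sample and add two linearly interpolated ones.
function upsample3(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 3);
  for (let i = 0; i < input.length; i++) {
    const a = input[i];
    const b = i + 1 < input.length ? input[i + 1] : a;
    out[3 * i] = a;
    out[3 * i + 1] = Math.round(a + (b - a) / 3);
    out[3 * i + 2] = Math.round(a + (2 * (b - a)) / 3);
  }
  return out;
}

// 24 kHz -> 8 kHz: average each group of 3 (a crude low-pass); a trailing
// partial group is dropped here instead of being carried to the next chunk.
function downsample3(input: Int16Array): Int16Array {
  const out = new Int16Array(Math.floor(input.length / 3));
  for (let i = 0; i < out.length; i++) {
    const j = 3 * i;
    out[i] = Math.round((input[j] + input[j + 1] + input[j + 2]) / 3);
  }
  return out;
}

// Inbound (carrier -> model): mulaw 8 kHz bytes -> PCM16 24 kHz samples.
function carrierToModel(mulaw: Uint8Array): Int16Array {
  const pcm8k = new Int16Array(mulaw.length);
  for (let i = 0; i < mulaw.length; i++) pcm8k[i] = mulawDecode(mulaw[i]);
  return upsample3(pcm8k);
}

// Outbound (model -> carrier): PCM16 24 kHz samples -> mulaw 8 kHz bytes.
function modelToCarrier(pcm24k: Int16Array): Uint8Array {
  const pcm8k = downsample3(pcm24k);
  const out = new Uint8Array(pcm8k.length);
  for (let i = 0; i < pcm8k.length; i++) out[i] = mulawEncode(pcm8k[i]);
  return out;
}
```

A 20 ms Twilio frame (160 mulaw bytes) becomes 480 PCM samples on the way in. The reverse path turns 480 samples back into 160 bytes.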

Speakerphone noise & false barge-in

On a speakerphone or in a noisy room, mouse clicks, the phone being picked up or set down, and background chatter can be mistaken for the caller speaking — the agent gets cut off mid-sentence. Because turn-taking is server-managed, you tune false barge-ins at the OpenAI VAD layer (no carrier-side change), not with a client gate.

Input noise reduction

const agent = phone.agent({
  engine: new OpenAIRealtime2({ noiseReduction: "far_field" }),
  systemPrompt: "...",
});

noiseReduction enables OpenAI’s native input noise reduction:

Value	When to use
`"far_field"`	Recommended for phone / speakerphone / conference audio. Filters room noise and distance.
`"near_field"`	A handset held close to the mouth.
`undefined` (default)	No reduction (unchanged behaviour); the field is omitted entirely.

The GA adapter nests it under session.audio.input.input_audio_noise_reduction.
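The omit-when-unset behaviour can be sketched with two small helpers. This assumes the value is sent as an object of the form { type: "far_field" } under the path named above. noiseReductionFragment and withNoiseReduction are hypothetical helpers.

```typescript
type NoiseReduction = "far_field" | "near_field";

// Hypothetical helper: the noise-reduction fragment for session.audio.input,
// or an empty object when unset so the field is omitted entirely.
function noiseReductionFragment(
  noiseReduction: NoiseReduction | undefined,
): { input_audio_noise_reduction?: { type: NoiseReduction } } {
  return noiseReduction === undefined
    ? {}
    : { input_audio_noise_reduction: { type: noiseReduction } };
}

// Merges the fragment into an audio.input block built elsewhere.
function withNoiseReduction(
  audioInput: Record<string, unknown>,
  noiseReduction: NoiseReduction | undefined,
): Record<string, unknown> {
  return { ...audioInput, ...noiseReductionFragment(noiseReduction) };
}
```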

Turn-detection tuning

// The two agents below are alternatives; pick one.

// Option A: raise the server_vad threshold so background noise doesn't trip it…
const serverVadAgent = phone.agent({
  engine: new OpenAIRealtime2({
    noiseReduction: "far_field",
    turnDetection: { type: "server_vad", threshold: 0.6 },
  }),
  systemPrompt: "...",
});

// …or option B: switch to semantic_vad with eagerness "low" so the model waits
// for the caller to actually finish before ending the turn.
const semanticVadAgent = phone.agent({
  engine: new OpenAIRealtime2({
    turnDetection: { type: "semantic_vad", eagerness: "low" },
  }),
  systemPrompt: "...",
});

turnDetection (RealtimeTurnDetection) is a readonly config. Each unset field falls back to the adapter default (server_vad, threshold 0.5, prefixPaddingMs 300, silenceDurationMs 300):

Field	Applies to	Notes
`type`	both	`"server_vad"` (default) or `"semantic_vad"`.
`threshold`	server_vad	0..1; higher rejects more background noise.
`prefixPaddingMs`	server_vad	Padding before detected speech.
`silenceDurationMs`	server_vad	Trailing silence before end-of-turn.
`eagerness`	semantic_vad	`"low"` waits for the caller to finish (least likely to interrupt); `"medium"` and `"high"` respond progressively sooner; `"auto"` is also accepted.

With semantic_vad, the adapter sends only {type, eagerness}, because OpenAI rejects threshold / padding / silence on the semantic detector. Both knobs are also exposed directly on phone.agent({ openaiRealtimeNoiseReduction: ..., realtimeTurnDetection: ... }); an explicit agent() option takes precedence over the value set on the engine marker.
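The fallback and precedence rules can be sketched as one resolver. Several parts are hypothetical or assumed. TurnDetectionConfig is a hypothetical mirror of RealtimeTurnDetection. The snake_case wire names are assumed. The sketch treats the agent() option as a whole-object override rather than a per-field merge, and it leaves eagerness out when unset.

```typescript
type Eagerness = "low" | "medium" | "high" | "auto";

// Hypothetical mirror of RealtimeTurnDetection as described in the table above.
interface TurnDetectionConfig {
  readonly type?: "server_vad" | "semantic_vad";
  readonly threshold?: number;
  readonly prefixPaddingMs?: number;
  readonly silenceDurationMs?: number;
  readonly eagerness?: Eagerness;
}

type TurnDetectionWire =
  | { type: "server_vad"; threshold: number; prefix_padding_ms: number; silence_duration_ms: number }
  | { type: "semantic_vad"; eagerness?: Eagerness };

function resolveTurnDetection(
  agentOption: TurnDetectionConfig | undefined,
  engineOption: TurnDetectionConfig | undefined,
): TurnDetectionWire {
  // An explicit phone.agent() option wins over the engine marker value.
  const cfg = agentOption ?? engineOption ?? {};
  if (cfg.type === "semantic_vad") {
    // OpenAI rejects threshold / padding / silence on the semantic detector.
    return cfg.eagerness === undefined
      ? { type: "semantic_vad" }
      : { type: "semantic_vad", eagerness: cfg.eagerness };
  }
  // server_vad: each unset field falls back to the adapter default.
  return {
    type: "server_vad",
    threshold: cfg.threshold ?? 0.5,
    prefix_padding_ms: cfg.prefixPaddingMs ?? 300,
    silence_duration_ms: cfg.silenceDurationMs ?? 300,
  };
}
```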

Direct adapter use

OpenAIRealtime2Adapter is exported and may be constructed directly when you need to share connection state across calls or override low-level fields. The constructor signature is positional (inherited from OpenAIRealtimeAdapter):

import { OpenAIRealtime2Adapter } from "getpatter";

const adapter = new OpenAIRealtime2Adapter(
  process.env.OPENAI_API_KEY ?? "",   // apiKey
  "gpt-realtime-2",                   // model
  "nova",                             // voice
  "You are a helpful assistant.",     // instructions
  undefined,                          // tools
  "g711_ulaw",                        // audioFormat — GA adapter emits PCM24
                                      // internally regardless of this value,
                                      // but the positional arg is required.
  {
    reasoningEffort: "low",
    inputAudioTranscriptionModel: "gpt-realtime-whisper",
  },
);

const agent = phone.agent({
  engine: adapter,
  systemPrompt: "...",
  firstMessage: "...",
});

The adapter extends OpenAIRealtimeAdapter and overrides connect(), sendAudio(), receiveEvents(), and sendFirstMessage() for the GA wire shape.

Server-managed turn-taking

By default the GA adapter sets both create_response: true and interrupt_response: true in the turn_detection block of session.update. The OpenAI server therefore owns turn-taking end to end: it runs VAD, decides end-of-turn, creates the response as soon as the caller stops speaking, and cancels its own response when the caller barges in. The input transcript (Whisper) is pure observability. It never gates or cancels the reply, so the choice of transcription model has no effect on reply latency.

On Patter’s WebSocket transport the client still handles the bookkeeping the server cannot do for it. It clears the carrier playout buffer and sends conversation.item.truncate for the offset the caller actually heard (OpenAI auto-truncates only on WebRTC/SIP). It does not send a redundant response.cancel, run a client-side anti-flicker gate, or re-anchor turn metrics.

Tune false barge-ins with RealtimeTurnDetection rather than with a client gate: raise threshold, or switch to semantic_vad with eagerness: "low". False barge-ins typically come from speakerphones, or from no-AEC PSTN lines where the agent’s own audio echoes into the input.

To restore the legacy client-managed path, set gateResponseOnTranscript: true on the engine marker or realtimeGateResponseOnTranscript: true on phone.agent(...). On that path Patter drives response.create / response.cancel and runs its own barge-in gate. The setting emits create_response: false and interrupt_response: false and re-gates the reply on the transcript arriving. It is the escape hatch for no-AEC self-interruption scenarios.
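The flag switch and the truncate offset can be sketched as follows. responseFlags, buildTruncate, and the played-byte counter they rely on are hypothetical. A content_index of 0 assumes the audio is the item's first content part. The offset math assumes the caller heard mulaw 8 kHz audio, which is 1 byte per sample, or 8 bytes per millisecond.

```typescript
// Hypothetical: the two turn_detection flags described above.
function responseFlags(gateResponseOnTranscript: boolean): {
  create_response: boolean;
  interrupt_response: boolean;
} {
  // Default (false): the server creates and cancels responses itself.
  // true: legacy client-managed path, where Patter sends response.create / response.cancel.
  const serverManaged = !gateResponseOnTranscript;
  return { create_response: serverManaged, interrupt_response: serverManaged };
}

// mulaw 8 kHz carries 1 byte per sample, i.e. 8 bytes per millisecond.
const MULAW_8K_BYTES_PER_MS = 8;

interface TruncateEvent {
  type: "conversation.item.truncate";
  item_id: string;
  content_index: number;
  audio_end_ms: number;
}

// Truncates the assistant item at the audio the caller actually heard, i.e. the
// bytes the carrier played before its buffer was cleared on barge-in.
function buildTruncate(itemId: string, playedMulawBytes: number): TruncateEvent {
  return {
    type: "conversation.item.truncate",
    item_id: itemId,
    content_index: 0, // assumption: audio is the item's first content part
    audio_end_ms: Math.floor(playedMulawBytes / MULAW_8K_BYTES_PER_MS),
  };
}
```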

Backward compatibility

Existing new OpenAIRealtime({...}) callers are unaffected. The legacy engine continues to target the beta endpoint with gpt-realtime-mini as the default.
OpenAIRealtime2 ships as an additive engine — no migration required. Pick it when you want the GA endpoint; otherwise stay where you are.
Pricing for gpt-realtime-2 is auto-resolved per model from DEFAULT_PRICING.openai_realtime.models["gpt-realtime-2"] — see Metrics.

What’s Next

OpenAI Realtime (beta)

The legacy engine for gpt-realtime-mini and earlier preview models.

Engines

All engine classes side by side.

Agents

Configure system prompts, tools, and first messages.

Tools

Function calling inside a Realtime session.
