# OpenAI Realtime
OpenAIRealtime is the engine wrapper for OpenAI’s Realtime API — a single WebSocket session that handles speech-in, reasoning, and speech-out, with sub-500 ms typical turn latency.
For the basic `new OpenAIRealtime(...)` quickstart, see Engines. This page documents the full configuration surface: every supported model, the streaming transcription options, and the new `reasoningEffort` tier.
## Models
Pass any of these to `model:` on `new OpenAIRealtime(...)`. Pricing is auto-resolved per model from `DEFAULT_PRICING`; no manual override is required (see Metrics).
| Model | Audio in / out (per M tokens) | Notes |
|---|---|---|
| `"gpt-realtime-mini"` (default) | 20 | Fastest + cheapest. Production default for most voice flows. |
| `"gpt-realtime"` | 64 | GA realtime model (Aug 2025). |
| `"gpt-realtime-2"` | 64 | Most capable. Stronger instruction following, 128K context, supports `reasoningEffort`. |
| `"gpt-4o-realtime-preview"` | 200 | Earlier preview, retained for compatibility. |
| `"gpt-4o-mini-realtime-preview"` | 20 | Earlier preview, retained for compatibility. |
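Picking a non-default model is a single constructor option. A minimal sketch (the import path is hypothetical; only the `model` option is confirmed by this page):

```typescript
import { OpenAIRealtime } from "patter"; // hypothetical import path

// Upgrade from the default mini model to the GA realtime model.
// Pricing is resolved automatically from DEFAULT_PRICING, so no
// pricing override needs to accompany the model swap.
const engine = new OpenAIRealtime({
  model: "gpt-realtime", // any value from the table above
});
```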
`gpt-realtime-translate` is intentionally not supported by Patter's Realtime engine. It lives on a different OpenAI endpoint (`/v1/realtime/translations`), does not accept tool calls or `response.create`, and would invalidate the Agent contract Patter exposes. Real-time translation, if added, will land as a dedicated feature, not as a Realtime model variant.

## Reasoning effort
`gpt-realtime-2` accepts a configurable reasoning tier. Patter exposes it as the `reasoningEffort` constructor option on the lower-level `OpenAIRealtimeAdapter`:
| Value | When to use |
|---|---|
| `"minimal"` | Snappy turn-taking. Skips most reasoning. |
| `"low"` | Recommended for production voice. Good instruction following with no measurable added per-turn latency. |
| `"medium"` | Multi-step tool flows where the model should plan. Adds latency. |
| `"high"` | Complex reasoning. Not recommended for live phone calls. |
The adapter maps the option to `session.reasoning = { effort: ... }` in the `session.update` payload. When omitted, the field is not sent and OpenAI's server default applies. The field is a no-op on models that ignore it (for example `gpt-realtime-mini`), so it's safe to leave configured across model swaps.
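That omit-when-unset behaviour can be sketched as follows. This is an illustrative reconstruction, not Patter's actual source; the `buildSessionUpdate` helper is invented for the example:

```typescript
// Sketch of how reasoningEffort is folded into the session.update
// payload: the field is emitted only when explicitly configured.
type ReasoningEffort = "minimal" | "low" | "medium" | "high";

interface AdapterOptions {
  reasoningEffort?: ReasoningEffort;
}

function buildSessionUpdate(opts: AdapterOptions) {
  const session: { reasoning?: { effort: ReasoningEffort } } = {};
  if (opts.reasoningEffort !== undefined) {
    // Only send the field when set; otherwise OpenAI's server-side
    // default applies.
    session.reasoning = { effort: opts.reasoningEffort };
  }
  return { type: "session.update" as const, session };
}

console.log(JSON.stringify(buildSessionUpdate({ reasoningEffort: "low" })));
// → {"type":"session.update","session":{"reasoning":{"effort":"low"}}}
```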
## Streaming transcription
The Realtime session can run an inline Whisper-family model on inbound audio, so you get text deltas alongside the conversation. The model is set via `inputAudioTranscriptionModel:`
| Model | Cost | Notes |
|---|---|---|
| `"whisper-1"` (default) | $0.006/min | Established Whisper. Slower partials. |
| `"gpt-4o-mini-transcribe"` | $0.003/min | Cheapest. |
| `"gpt-4o-transcribe"` | $0.006/min | Higher accuracy. |
| `"gpt-realtime-whisper"` | $0.017/min | Streaming-optimised. Lowest-latency partials. Use when you need fast deltas in the dashboard or for live captioning. |
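The per-minute rates above make transcription cost easy to estimate before a call. A small illustrative helper (the rates are copied from the table; the function itself is not part of Patter's API):

```typescript
// Per-minute transcription rates (USD), mirroring the table above.
const TRANSCRIPTION_RATES: Record<string, number> = {
  "whisper-1": 0.006,
  "gpt-4o-mini-transcribe": 0.003,
  "gpt-4o-transcribe": 0.006,
  "gpt-realtime-whisper": 0.017,
};

function transcriptionCost(model: string, minutes: number): number {
  const rate = TRANSCRIPTION_RATES[model];
  if (rate === undefined) {
    throw new Error(`unknown transcription model: ${model}`);
  }
  return rate * minutes;
}

// A 10-minute call with the streaming-optimised model:
console.log(transcriptionCost("gpt-realtime-whisper", 10).toFixed(2)); // "0.17"
```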
## Worked example: gpt-realtime-2 with low reasoning + streaming whisper
Constructing the lower-level `OpenAIRealtimeAdapter` directly gives access to every field. This is what `new OpenAIRealtime({ engine })` builds under the hood; reach for it when you need `reasoningEffort` or a non-default transcription model.
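A sketch of that construction. Only `model`, `reasoningEffort`, and `inputAudioTranscriptionModel` are confirmed option names on this page; the import path and anything else would need checking against your actual setup:

```typescript
import { OpenAIRealtimeAdapter } from "patter"; // hypothetical import path

const adapter = new OpenAIRealtimeAdapter({
  model: "gpt-realtime-2",
  reasoningEffort: "low",                               // production-recommended tier
  inputAudioTranscriptionModel: "gpt-realtime-whisper", // fastest streaming partials
});
```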
## Backward compatibility

- Defaults are unchanged: `model: "gpt-realtime-mini"`, `inputAudioTranscriptionModel: "whisper-1"`, `reasoningEffort: undefined`.
- All existing `new OpenAIRealtime(...)` constructions keep working without code changes.
- Pricing for new models is added under `DEFAULT_PRICING.openai_realtime.models[...]`. The earlier `new Patter({ pricing: { openai_realtime: DEFAULT_PRICING.openai_realtime_2 } })` workaround is no longer needed; just construct with `model: "gpt-realtime-2"`.
## What's Next

- **Engines**: All engine classes side by side.
- **Metrics**: Per-call cost breakdown and the model-aware pricing table.
- **Agents**: Configure system prompts, tools, and first messages.
- **Tools**: Function calling inside a Realtime session.

