Ultravox Realtime

UltravoxRealtimeAdapter bridges a bidirectional audio stream to an Ultravox managed-agent call. The transport is plain WebSocket plus one REST call to create the session — no vendor SDK is required. It speaks the same connect / sendAudio / onEvent / close surface as OpenAIRealtimeAdapter, so you can swap engines without touching the call handler.
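
To illustrate why a shared surface makes engines swappable, here is a minimal sketch of a call handler written only against connect / sendAudio / onEvent / close. The RealtimeEngine interface, the method signatures, runCall, and EchoEngine are hypothetical stand-ins for illustration; the real types ship with getpatter and may differ.

```typescript
// Hypothetical shape of the shared engine surface (signatures are assumptions).
type EngineEventHandler = (type: string, data: unknown) => void;

interface RealtimeEngine {
  connect(): Promise<void>;
  sendAudio(chunk: Uint8Array): void;
  onEvent(handler: EngineEventHandler): void;
  close(): Promise<void>;
}

// A call handler that depends only on the shared surface, so any engine fits.
async function runCall(engine: RealtimeEngine, inbound: Uint8Array[]): Promise<string[]> {
  const seen: string[] = [];
  engine.onEvent((type) => {
    seen.push(type);
  });
  await engine.connect();
  for (const chunk of inbound) {
    engine.sendAudio(chunk);
  }
  await engine.close();
  return seen;
}

// Stand-in engine: echoes each chunk back as an "audio" event,
// then emits "response_done" when closed.
class EchoEngine implements RealtimeEngine {
  private handler: EngineEventHandler = () => {};
  async connect(): Promise<void> {}
  sendAudio(chunk: Uint8Array): void {
    this.handler("audio", chunk);
  }
  onEvent(handler: EngineEventHandler): void {
    this.handler = handler;
  }
  async close(): Promise<void> {
    this.handler("response_done", null);
  }
}
```

Because runCall never names a concrete engine class, replacing EchoEngine with another implementation of the same interface requires no change to the handler.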

Install

npm install getpatter

The TypeScript adapter has no extra peer dependency; it uses the bundled ws package. Set ULTRAVOX_API_KEY in your environment.

Constructor

import { UltravoxRealtimeAdapter } from "getpatter";

const adapter = new UltravoxRealtimeAdapter(process.env.ULTRAVOX_API_KEY!, {
  model: "fixie-ai/ultravox",                  // default
  voice: "",                                   // ultravox voice id (optional)
  instructions: "You are a helpful, concise voice assistant.",
  language: "en",
  sampleRate: 16000,                           // 8k / 16k / 24k / 48k
  firstMessage: "",                            // if set, agent speaks first
});

Usage

Pass the adapter as the engine on phone.agent({...}):

import { Patter, Twilio, UltravoxRealtimeAdapter } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new UltravoxRealtimeAdapter(process.env.ULTRAVOX_API_KEY!),
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi! How can I help today?",
});

await phone.serve({ agent });

Tools work via phone.agent({ tools: [...] }). Patter translates the OpenAI-style JSON schema into Ultravox dynamicParameters automatically.
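
As a rough picture of what that translation involves, the sketch below maps one OpenAI-style function definition to an Ultravox-style tool. This is not Patter's actual implementation: toUltravoxTool is a hypothetical helper, and the temporaryTool / modelToolName / PARAMETER_LOCATION_BODY field names are assumptions based on the Ultravox tool format that should be checked against the Ultravox API reference.

```typescript
// OpenAI-style function tool: { name, description, parameters: JSON Schema }.
interface OpenAIStyleTool {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, Record<string, unknown>>;
    required?: string[];
  };
}

// Assumed Ultravox tool shape; verify against the Ultravox docs.
interface UltravoxDynamicParameter {
  name: string;
  location: "PARAMETER_LOCATION_BODY";
  schema: Record<string, unknown>;
  required: boolean;
}

interface UltravoxTool {
  temporaryTool: {
    modelToolName: string;
    description: string;
    dynamicParameters: UltravoxDynamicParameter[];
  };
}

// Hypothetical translation: each top-level schema property becomes one
// dynamic parameter, flagged required if listed in `required`.
function toUltravoxTool(tool: OpenAIStyleTool): UltravoxTool {
  const required = new Set(tool.parameters.required ?? []);
  const dynamicParameters = Object.entries(tool.parameters.properties).map(
    ([name, schema]): UltravoxDynamicParameter => ({
      name,
      location: "PARAMETER_LOCATION_BODY",
      schema,
      required: required.has(name),
    }),
  );
  return {
    temporaryTool: {
      modelToolName: tool.name,
      description: tool.description,
      dynamicParameters,
    },
  };
}
```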

How it works

  1. The adapter POSTs to https://api.ultravox.ai/api/calls to create a call. The response includes a single-use joinUrl.
  2. The adapter opens that URL as a WebSocket. From here:
    • Binary frames are PCM16 mono audio (both directions).
    • Text frames are JSON control events (transcripts, tool invocations, state).

firstSpeaker and initialMessages are mutually exclusive on the Ultravox API. When firstMessage is set, Patter sends an initialMessages agent turn; otherwise the user speaks first.
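
The two steps and the mutual-exclusion rule above can be sketched as a pair of pure functions: one builds the JSON body for the create-call request, the other sorts incoming WebSocket frames into audio or control events. buildCreateCallBody and classifyFrame are hypothetical helpers, and the body field names and enum values (systemPrompt, medium.serverWebSocket, MESSAGE_ROLE_AGENT, FIRST_SPEAKER_USER) are assumptions about the Ultravox API rather than a copy of the adapter's code.

```typescript
// Options mirroring the constructor fields shown earlier on this page.
interface CallOptions {
  instructions: string;
  model?: string;
  voice?: string;
  sampleRate?: number;
  firstMessage?: string;
}

// Step 1: body for POST https://api.ultravox.ai/api/calls (field names assumed).
function buildCreateCallBody(opts: CallOptions): Record<string, unknown> {
  const rate = opts.sampleRate ?? 16000;
  const body: Record<string, unknown> = {
    systemPrompt: opts.instructions,
    model: opts.model ?? "fixie-ai/ultravox",
    medium: { serverWebSocket: { inputSampleRate: rate, outputSampleRate: rate } },
  };
  if (opts.voice) {
    body["voice"] = opts.voice;
  }
  // Never send both fields: an agent greeting uses initialMessages,
  // otherwise firstSpeaker tells the API that the user talks first.
  if (opts.firstMessage) {
    body["initialMessages"] = [{ role: "MESSAGE_ROLE_AGENT", text: opts.firstMessage }];
  } else {
    body["firstSpeaker"] = "FIRST_SPEAKER_USER";
  }
  return body;
}

// Step 2: binary frames carry PCM16 mono audio, text frames carry JSON events.
type Frame =
  | { kind: "audio"; pcm: Uint8Array }
  | { kind: "event"; payload: unknown };

function classifyFrame(data: Uint8Array | string): Frame {
  if (typeof data === "string") {
    return { kind: "event", payload: JSON.parse(data) };
  }
  return { kind: "audio", pcm: data };
}
```

Keeping the body builder free of network calls makes the firstSpeaker / initialMessages branch easy to unit test in isolation.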

Sample rates

sampleRate accepts 8000, 16000, 24000, or 48000 Hz. The default is 16000, the rate that Patter’s pipeline-mode audio bus uses internally.
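
For sizing buffers, it can help to see how the sample rate translates into bytes on the wire. The sketch below validates a rate against the four accepted values and computes the size of one PCM16 mono frame; assertSampleRate, bytesPerFrame, and the 20 ms frame length are illustrative assumptions, not part of the adapter's API.

```typescript
// The four rates listed above.
const SUPPORTED_RATES = [8000, 16000, 24000, 48000] as const;
type SampleRate = (typeof SUPPORTED_RATES)[number];

// Throws on anything outside the accepted set.
function assertSampleRate(rate: number): SampleRate {
  if (!SUPPORTED_RATES.some((r) => r === rate)) {
    throw new RangeError(`Unsupported sampleRate ${rate}`);
  }
  return rate as SampleRate;
}

// PCM16 mono uses 2 bytes per sample, so one frame is
// (samples per second * frame seconds) * 2 bytes.
function bytesPerFrame(rate: SampleRate, frameMs = 20): number {
  return ((rate * frameMs) / 1000) * 2;
}

console.log(bytesPerFrame(assertSampleRate(16000))); // 640
```

At the default 16000 Hz, a 20 ms frame is 320 samples, or 640 bytes; at 48000 Hz the same frame is 1920 bytes.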

When to use Ultravox vs alternatives

  • Use Ultravox when you want a managed agent with its own voice library and no vendor SDK in your dependency tree.
  • Use OpenAI Realtime when you need the broadest tool-calling ecosystem and reasoning tiers.
  • Use Gemini Live when you want native-audio Gemini voices and 1M+ context.

Notes

  • The adapter implements cancelResponse() by sending playback_clear_buffer, which interrupts the agent’s current turn for clean barge-in.
  • The TS adapter uses ws plus the platform fetch; the Python adapter uses aiohttp.
  • Register handlers via adapter.onEvent((type, data) => ...). Event types: audio, transcript_input, transcript_output, function_call, speech_started, response_done, error.
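
If you drive the adapter directly rather than through phone.agent, the notes above suggest a simple barge-in wiring: cancel the agent's turn when the caller starts speaking, and surface errors. The sketch below assumes only the onEvent and cancelResponse methods named above; the InterruptibleEngine interface, wireBargeIn, and the shape of the event payload are hypothetical.

```typescript
type EventHandler = (type: string, data: unknown) => void;

// Minimal slice of the adapter surface used here (signatures are assumptions).
interface InterruptibleEngine {
  onEvent(handler: EventHandler): void;
  cancelResponse(): void;
}

// Cancel the agent's current turn on speech_started; forward errors to a callback.
function wireBargeIn(
  adapter: InterruptibleEngine,
  onError: (detail: unknown) => void,
): void {
  adapter.onEvent((type, data) => {
    if (type === "speech_started") {
      adapter.cancelResponse();
    } else if (type === "error") {
      onError(data);
    }
    // Other event types (audio, transcripts, function_call, response_done)
    // are left to the rest of the call handler.
  });
}
```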

What’s Next

  • Engines: All engines side by side.
  • OpenAI Realtime: The default engine.
  • Gemini Live: Google’s native-audio realtime API.
  • Tools: Function calling inside a realtime session.