Ultravox Realtime

UltravoxRealtimeAdapter bridges a bidirectional audio stream to an Ultravox managed-agent call. The transport is plain WebSocket plus one REST call to create the session — no vendor SDK is required. It speaks the same connect / send_audio / receive_events / close surface as OpenAIRealtimeAdapter, so you can swap engines without touching the call handler.
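
To illustrate why a shared surface makes engines interchangeable, here is a minimal sketch. The `RealtimeEngine` protocol, the `EchoEngine` stand-in, and `run_call` are hypothetical names invented for this example and are not part of Patter; only the four method names come from the text above, and their exact signatures are assumed.

```python
import asyncio
from typing import Any, AsyncIterator, List, Protocol, Tuple


class RealtimeEngine(Protocol):
    """Hypothetical protocol for the connect / send_audio / receive_events / close surface."""

    async def connect(self) -> None: ...
    async def send_audio(self, chunk: bytes) -> None: ...
    def receive_events(self) -> AsyncIterator[Tuple[str, Any]]: ...
    async def close(self) -> None: ...


class EchoEngine:
    """Stand-in engine for illustration only: echoes every audio chunk back."""

    async def connect(self) -> None:
        # Create the queue inside the running event loop.
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send_audio(self, chunk: bytes) -> None:
        await self._queue.put(("audio", chunk))

    async def receive_events(self) -> AsyncIterator[Tuple[str, Any]]:
        while True:
            item = await self._queue.get()
            if item is None:  # sentinel pushed by close()
                return
            yield item

    async def close(self) -> None:
        await self._queue.put(None)


async def run_call(engine: RealtimeEngine, chunks: List[bytes]) -> List[Tuple[str, Any]]:
    """Call-handler sketch that depends only on the shared surface."""
    await engine.connect()
    for chunk in chunks:
        await engine.send_audio(chunk)
    await engine.close()
    return [event async for event in engine.receive_events()]
```

Because `run_call` touches only those four methods, any engine that provides them could be passed in without changing the handler.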

Install

pip install "getpatter[ultravox]"

The TypeScript adapter has no extra peer dependency — it uses the bundled ws package. Set ULTRAVOX_API_KEY in your environment.

Constructor

from getpatter.providers.ultravox_realtime import (
    UltravoxRealtimeAdapter,
    UltravoxModel,
    UltravoxSampleRate,
)

adapter = UltravoxRealtimeAdapter(
    api_key="",                                   # ULTRAVOX_API_KEY when wired
    model=UltravoxModel.FIXIE_AI_ULTRAVOX,        # default
    voice="",                                     # ultravox voice id (optional)
    instructions="You are a helpful, concise voice assistant.",
    language="en",
    sample_rate=UltravoxSampleRate.HZ_16000,      # 8k / 16k / 24k / 48k
    first_message="",                             # if set, agent speaks first
)

Usage

Pass the adapter as the engine on phone.agent(...):

import asyncio
from getpatter import Patter, Twilio
from getpatter.providers.ultravox_realtime import UltravoxRealtimeAdapter

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    engine=UltravoxRealtimeAdapter(api_key=""),
    system_prompt="You are a helpful assistant.",
    first_message="Hi! How can I help today?",
)

asyncio.run(phone.serve(agent))

Register tools with phone.agent(tools=[Tool(...)]) in Python or tools: [...] in TypeScript. Patter translates the OpenAI-style JSON schema into Ultravox dynamicParameters automatically.
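
The translation might look roughly like the sketch below. This is not Patter's actual implementation: `to_ultravox_tool` is a hypothetical helper, and the output field names other than `dynamicParameters` (`temporaryTool`, `modelToolName`, `PARAMETER_LOCATION_BODY`) are assumptions about the Ultravox tool format that should be checked against the Ultravox API reference.

```python
from typing import Any, Dict, List


def to_ultravox_tool(name: str, description: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style function schema into an Ultravox-style tool definition.

    Field names other than dynamicParameters are assumed, not confirmed by this page.
    """
    required = set(parameters.get("required", []))
    dynamic: List[Dict[str, Any]] = []
    for param_name, schema in parameters.get("properties", {}).items():
        dynamic.append(
            {
                "name": param_name,
                "location": "PARAMETER_LOCATION_BODY",  # assumed enum value
                "schema": schema,  # per-parameter JSON schema is passed through
                "required": param_name in required,
            }
        )
    return {
        "temporaryTool": {
            "modelToolName": name,
            "description": description,
            "dynamicParameters": dynamic,
        }
    }


# Example OpenAI-style schema for a made-up weather tool.
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "City name"},
        "units": {"type": "string", "enum": ["metric", "imperial"]},
    },
    "required": ["city"],
}
tool = to_ultravox_tool("get_weather", "Look up the current weather.", weather_schema)
```

Each top-level property becomes one entry in the parameter list, and the schema's `required` array is folded into a per-parameter flag.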

How it works

The adapter POSTs to https://api.ultravox.ai/api/calls to create a call. The response includes a single-use joinUrl.
The adapter opens that URL as a WebSocket. From here:
- Binary frames are PCM16 mono audio (both directions).
- Text frames are JSON control events (transcripts, tool invocations, state).
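
The two steps above can be sketched without any network I/O. Both helpers are hypothetical and only mirror the flow described here; the `X-API-Key` header and the `systemPrompt` / `medium.serverWebSocket` body fields are assumptions about the Ultravox REST API rather than details stated on this page.

```python
import json
from typing import Any, Dict, Tuple, Union

CALLS_URL = "https://api.ultravox.ai/api/calls"


def build_create_call_request(
    api_key: str, instructions: str, sample_rate: int = 16000
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (url, headers, json_body) for the call-creation POST.

    Header and body field names are assumed; verify them against the Ultravox docs.
    """
    headers = {"X-API-Key": api_key, "Content-Type": "application/json"}
    body = {
        "systemPrompt": instructions,
        "medium": {
            "serverWebSocket": {
                "inputSampleRate": sample_rate,
                "outputSampleRate": sample_rate,
            }
        },
    }
    return CALLS_URL, headers, body


def classify_frame(frame: Union[bytes, bytearray, str]) -> Tuple[str, Any]:
    """Split WebSocket traffic: binary frames are PCM16 audio, text frames are JSON events."""
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    return "event", json.loads(frame)
```

An HTTP client such as aiohttp would send the POST, read `joinUrl` from the JSON response, and then pass every incoming WebSocket frame through `classify_frame`.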

firstSpeaker and initialMessages are mutually exclusive on the Ultravox API. When first_message / firstMessage is set, Patter sends an initialMessages agent turn; otherwise the user speaks first.
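
A minimal sketch of that either/or choice follows. `first_turn_fields` is a hypothetical helper, and the enum strings `MESSAGE_ROLE_AGENT` and `FIRST_SPEAKER_USER` are assumed values. Whether Patter sets `firstSpeaker` explicitly or simply omits both fields when no first message is given is not stated here, so the second branch is a guess.

```python
from typing import Any, Dict


def first_turn_fields(first_message: str = "") -> Dict[str, Any]:
    """Return the call-creation fields that decide who speaks first.

    Exactly one of initialMessages / firstSpeaker is ever emitted,
    because the two are mutually exclusive.
    """
    if first_message:
        # Agent opens the call with a fixed greeting (assumed role enum).
        return {
            "initialMessages": [
                {"role": "MESSAGE_ROLE_AGENT", "text": first_message}
            ]
        }
    # No greeting configured: the user speaks first (assumed enum value).
    return {"firstSpeaker": "FIRST_SPEAKER_USER"}
```

The result would be merged into the JSON body of the call-creation request.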

Sample rates

UltravoxSampleRate accepts 8000, 16000, 24000, or 48000 Hz. The default is 16000 Hz, which matches what Patter’s pipeline-mode audio bus uses internally.
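
To see what each rate means on the wire, the sketch below computes the size of one PCM16 mono frame. The 20 ms frame duration is an assumption chosen for illustration (a common telephony chunk size), not something this adapter is documented to use, and `frame_bytes` is a hypothetical helper.

```python
SUPPORTED_RATES = (8000, 16000, 24000, 48000)
BYTES_PER_SAMPLE = 2  # PCM16 mono: one 16-bit sample per tick


def frame_bytes(sample_rate: int, frame_ms: int = 20) -> int:
    """Return the byte length of one PCM16 mono frame of frame_ms milliseconds."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    samples = sample_rate * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE


for rate in SUPPORTED_RATES:
    print(rate, frame_bytes(rate))
# 8000 320
# 16000 640
# 24000 960
# 48000 1920
```

Bandwidth scales linearly with the rate, so 48000 Hz costs six times the bytes of 8000 Hz for the same duration of audio.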

When to use Ultravox vs alternatives

- Use Ultravox when you want a managed agent with its own voice library and no vendor SDK in your dependency tree.
- Use OpenAI Realtime when you need the broadest tool-calling ecosystem and reasoning tiers.
- Use Gemini Live when you want native-audio Gemini voices and 1M+ context.

Notes

- The adapter implements cancel_response() / cancelResponse() by sending playback_clear_buffer, which interrupts the agent’s current turn for clean barge-in.
- The Python adapter uses aiohttp; the TS adapter uses ws plus the platform fetch.
- The receive loop yields ("audio", bytes) for agent audio, ("transcript_input", str) / ("transcript_output", str) for transcripts, and ("function_call", {...}) for tool calls.
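
A consumer of those tuples might dispatch on the first element, as in the sketch below. `fake_events` is a stand-in generator so the example runs without a live call, `consume` is a hypothetical helper, and the `name` / `arguments` keys inside the function-call payload are assumed because the page does not show the dict's shape.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Tuple


async def fake_events() -> AsyncIterator[Tuple[str, Any]]:
    """Stand-in for the adapter's receive loop (illustrative data only)."""
    yield ("transcript_input", "What's the weather?")
    # Payload keys below are assumed, not documented on this page.
    yield ("function_call", {"name": "get_weather", "arguments": {"city": "Oslo"}})
    yield ("audio", b"\x00\x00" * 160)  # 160 silent PCM16 samples
    yield ("transcript_output", "It is sunny in Oslo.")


async def consume(events: AsyncIterator[Tuple[str, Any]]) -> Dict[str, Any]:
    """Dispatch each (kind, payload) tuple and collect a summary of the call."""
    summary: Dict[str, Any] = {"audio_bytes": 0, "transcript": [], "tool_calls": []}
    async for kind, payload in events:
        if kind == "audio":
            summary["audio_bytes"] += len(payload)
        elif kind == "transcript_input":
            summary["transcript"].append(("user", payload))
        elif kind == "transcript_output":
            summary["transcript"].append(("agent", payload))
        elif kind == "function_call":
            summary["tool_calls"].append(payload["name"])
    return summary


summary = asyncio.run(consume(fake_events()))
```

In a real handler the audio branch would forward bytes to the carrier and the function-call branch would invoke the matching tool.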

What’s Next

Engines

All engines side by side.

OpenAI Realtime

The default engine.

Gemini Live

Google’s native-audio realtime API.

Tools

Function calling inside a realtime session.
