Agent Configuration

An Agent defines how your voice AI behaves: what it says, how it sounds, what tools it can use, and what guardrails it follows.

Creating an Agent

Use the phone.agent() factory method. In its simplest form, credentials are read from environment variables and the engine defaults to OpenAIRealtime:

from getpatter import Patter, Twilio

phone = Patter(carrier=Twilio(), phone_number="+15550001234")   # TWILIO_* from env

agent = phone.agent(
    system_prompt="You are a customer support agent for Acme Corp.",
    first_message="Hello! How can I help you today?",
)   # defaults to engine=OpenAIRealtime(), reads OPENAI_API_KEY

To pick the engine explicitly (flat imports):

from getpatter import OpenAIRealtime

agent = phone.agent(
    engine=OpenAIRealtime(voice="nova"),
    system_prompt="You are a customer support agent for Acme Corp.",
)

To use pipeline mode (pick STT, LLM, TTS independently):

from getpatter import DeepgramSTT, AnthropicLLM, ElevenLabsTTS

agent = phone.agent(
    stt=DeepgramSTT(endpointing_ms=80),        # DEEPGRAM_API_KEY from env
    llm=AnthropicLLM(),                        # ANTHROPIC_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),      # ELEVENLABS_API_KEY from env
    system_prompt="You are a helpful assistant.",
    first_message="Hi!",
)

Available LLM providers: OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, GoogleLLM. Tool calling works across all five. See LLM for the full reference.

For fully custom logic (multi-model routing, local models), drop llm= and pass an on_message callback to serve() instead; llm= and on_message are mutually exclusive.

The same pipeline using namespaced imports:

from getpatter.stt import deepgram
from getpatter.llm import anthropic
from getpatter.tts import elevenlabs

agent = phone.agent(
    stt=deepgram.STT(endpointing_ms=80),
    llm=anthropic.LLM(),
    tts=elevenlabs.TTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
)

Agent Parameters

Parameter	Type	Default	Description
`system_prompt`	`str`	required	Instructions that define the agent’s behavior.
`engine`	`OpenAIRealtime \| OpenAIRealtime2 \| ElevenLabsConvAI \| None`	`None` → OpenAI Realtime	End-to-end engine. See Engines. Omit for pipeline mode.
`stt`	`STTProvider \| None`	`None`	STT instance for pipeline mode (`DeepgramSTT()`, `CartesiaSTT()`, …). See STT.
`llm`	`LLMProvider \| None`	`None`	LLM instance for pipeline mode (`AnthropicLLM()`, `GroqLLM()`, …). Mutually exclusive with `on_message` on `serve()`. Ignored when `engine` is set. See LLM.
`tts`	`TTSProvider \| None`	`None`	TTS instance for pipeline mode (`ElevenLabsTTS()`, `RimeTTS()`, …). See TTS.
`voice`	`str`	`"alloy"`	Voice name. Usually inferred from the engine or TTS instance.
`model`	`str`	`"gpt-realtime-mini"`	Model ID for OpenAI Realtime. Usually inferred from the engine.
`language`	`str`	`"en"`	BCP-47 language code.
`first_message`	`str`	`""`	If set, the agent speaks this immediately when a call connects.
`tools`	`list[Tool] \| None`	`None`	`Tool(...)` instances for function calling. See Tools.
`variables`	`dict \| None`	`None`	Dynamic variable substitutions for `{placeholder}` patterns in the system prompt. Values limited to 500 chars.
`guardrails`	`list[Guardrail] \| None`	`None`	`Guardrail(...)` instances applied to LLM output. See Guardrails.
`hooks`	`PipelineHooks \| None`	`None`	Pipeline hooks for intercepting STT/TTS processing. Pipeline mode only. See Events.
`text_transforms`	`list[Callable] \| None`	`None`	Text transformation functions applied to LLM output before TTS. Pipeline mode only.
`vad`	`VADProvider \| None`	`None`	Voice activity detection provider (e.g. Silero). Pipeline mode only.
`audio_filter`	`AudioFilter \| None`	`None`	Pre-STT audio filter (e.g. Krisp noise suppression). Pipeline mode only.
`background_audio`	`BackgroundAudioPlayer \| None`	`None`	Hold music / ambient-cue mixer. Pipeline mode only.
`barge_in_threshold_ms`	`int`	`300`	Sustained-voice window (ms) before treating caller audio as barge-in. Set to `0` to disable.
`aggressive_first_flush`	`bool`	`False`	Opt-in low-latency mode: emits the first clause on a soft punctuation boundary (`,`, em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA on the first sentence at the cost of slightly clipped prosody. Hard-disabled when `language` starts with `"it"` (Italian decimal commas would split mid-number). Pipeline mode only.
`disable_phone_preamble`	`bool`	`False`	When `False` (default), Patter prepends a phone-friendly preamble to `system_prompt` that instructs the LLM to avoid markdown, emojis, bullet lists, and code blocks; spell out numbers and dates; and keep replies short. Set to `True` to ship `system_prompt` verbatim.
`prewarm_first_message`	`bool`	`False`	Pre-render `first_message` to TTS audio bytes during the ringing window and stream the cached buffer the instant the call connects, eliminating the 200–700 ms TTS first-byte latency on the greeting. Pipeline mode only: on Realtime / ConvAI engines the flag is ignored and a `WARN` log is emitted. Trade-off: you pay for the greeting’s TTS even when the call rings out unanswered (about $0.001–$0.005 per ring). Opt in explicitly with `prewarm_first_message=True`; it suits inbound calls and low-noise deployments.
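The `aggressive_first_flush` rule can be pictured with a small stand-alone sketch. This is not Patter's implementation: the function name `first_flush_cut`, the exact 40-character threshold, and the choice to cut after the last soft boundary seen so far are all assumptions made for illustration.

```python
from typing import Optional

SOFT_BOUNDARIES = (",", "\u2014", "\u2013")  # comma, em-dash, en-dash
MIN_CHARS = 40  # assumed value for the documented "~40 chars" threshold


def first_flush_cut(buffer: str, language: str = "en") -> Optional[int]:
    """Return the index at which to cut the first clause, or None to keep buffering."""
    # Italian is excluded: a decimal comma such as "3,5" would split a number.
    if language.lower().startswith("it"):
        return None
    if len(buffer) < MIN_CHARS:
        return None
    # Assumption: cut just after the last soft boundary in the buffer.
    cut = max(buffer.rfind(ch) for ch in SOFT_BOUNDARIES)
    return cut + 1 if cut >= 0 else None


text = "Thanks for calling, I can help with that right away, let me"
cut = first_flush_cut(text)
first_clause = text[:cut] if cut is not None else None
```

With the sample text, `first_clause` ends at the second comma, so the first audio could start before the sentence is complete. The same text with `language="it-IT"` returns `None` and waits for a normal sentence boundary.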

Agent Dataclass

Agent is a frozen (immutable) dataclass. You can construct it directly when you need an instance without going through phone.agent():

from getpatter import Agent

agent = Agent(
    system_prompt="You are a helpful assistant.",
    voice="echo",
    language="es",
)

Prefer phone.agent() over constructing Agent directly — the factory method validates credentials, unpacks the engine/STT/TTS instances, and surfaces clear errors up front.
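What "frozen" means in practice can be shown with the standard library alone. `AgentSketch` below is a hypothetical stand-in that copies three of the documented fields; it is not the real `Agent` class, which should behave the same way only if it is an ordinary frozen dataclass.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentSketch:
    """Hypothetical stand-in with a few of the documented fields."""
    system_prompt: str
    voice: str = "alloy"
    language: str = "en"


base = AgentSketch(system_prompt="You are a helpful assistant.")

try:
    base.voice = "echo"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance and leaves the original untouched.
spanish = replace(base, voice="echo", language="es")
```

Deriving variants with `replace()` keeps a shared base configuration intact, which is useful when one process serves several languages or voices.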

System Prompt

The system_prompt defines the agent’s personality, instructions, and constraints:

agent = phone.agent(
    system_prompt="""You are a scheduling assistant for Dr. Smith's dental office.

Rules:
- Only book appointments Monday through Friday, 9am to 5pm.
- Each appointment is 30 minutes.
- Always confirm the patient's name and phone number.
- If the patient has an emergency, transfer them to the front desk.
""",
)

Dynamic Variables

Use {placeholder} syntax in the system prompt to inject dynamic values at call start. Values are limited to 500 characters each.

agent = phone.agent(
    system_prompt="""You are a support agent for {company_name}.
The customer's name is {customer_name} and their account ID is {account_id}.
Greet them by name and help resolve their issue.""",
    variables={
        "company_name": "Acme Corp",
        "customer_name": "Jane Doe",
        "account_id": "ACC-12345",
    },
)
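The substitution step can be approximated in a few lines of plain Python. This is a sketch of the documented behavior, not the SDK's code: `render_prompt` is a hypothetical helper, and both rejecting oversized values (rather than truncating them) and leaving unknown placeholders untouched are assumptions.

```python
import re

MAX_VALUE_CHARS = 500  # documented per-value limit
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, variables: dict) -> str:
    """Substitute {name} placeholders; unknown names are left untouched."""
    for name, value in variables.items():
        if len(str(value)) > MAX_VALUE_CHARS:
            # Assumption: oversized values are rejected rather than truncated.
            raise ValueError(f"variable {name!r} exceeds {MAX_VALUE_CHARS} characters")

    def _sub(match: "re.Match") -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return _PLACEHOLDER.sub(_sub, template)


prompt = render_prompt(
    "You are a support agent for {company_name}. The customer is {customer_name}.",
    {"company_name": "Acme Corp", "customer_name": "Jane Doe"},
)
```

Running a check like this before a call starts is a cheap way to catch a missing key or an overlong value in your own code.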

First Message

When first_message is set, the agent speaks it immediately when a call connects:

agent = phone.agent(
    system_prompt="You are a restaurant reservation assistant.",
    first_message="Good evening! Thank you for calling Luigi's. Would you like to make a reservation?",
)

Pre-warming the first message

Pipeline-mode agents can pre-render the first_message audio during the ringing window and stream the cached buffer the instant the call connects — eliminating the 200–700 ms TTS first-byte latency on the greeting. Opt in explicitly:

agent = phone.agent(
    system_prompt="...",
    first_message="Hello!",
    prewarm_first_message=True,  # enable pre-rendering
)

The trade-off is paying for the greeting’s TTS even when the call rings out unanswered (typically $0.001–$0.005 per ring, depending on the TTS provider). This suits inbound calls and low-noise deployments; disable it for very high-volume outbound traffic where unanswered TTS spend matters. Realtime / ConvAI engines don’t consume the pre-rendered cache (their first message goes through the engine’s own audio path), so the flag is ignored, with a WARN log, when set on a non-pipeline agent.
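To judge whether the unanswered-call spend matters for a given campaign, a rough estimate is enough. The per-ring costs below are the documented range; the call volume and the 40% unanswered rate are hypothetical numbers chosen only for illustration.

```python
def wasted_prewarm_spend(calls: int, unanswered_rate: float, cost_per_ring: float) -> float:
    """Expected TTS spend on greetings that were rendered but never heard."""
    return calls * unanswered_rate * cost_per_ring


# Hypothetical outbound campaign: 10,000 dials, 40% ring out unanswered.
low = wasted_prewarm_spend(10_000, 0.40, 0.001)
high = wasted_prewarm_spend(10_000, 0.40, 0.005)
print(f"${low:.2f} to ${high:.2f}")  # prints "$4.00 to $20.00"
```

At this scale the cost is small next to the latency saved on answered calls; it grows linearly with volume and unanswered rate.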

Voice Selection

Voice is usually inferred from the engine or TTS instance — e.g. OpenAIRealtime(voice="nova") or ElevenLabsTTS(voice_id="rachel"). Available voices depend on the provider.

OpenAI Realtime: "alloy", "ash", "ballad", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer", "verse"

ElevenLabs: Any ElevenLabs voice ID or name (e.g., "rachel", "adam").

Pipeline: Depends on the TTS provider instance (ElevenLabsTTS, RimeTTS, LMNTTTS, …).

Voice Activity Detection (VAD)

Pipeline-mode agents can plug a VAD provider into the vad= parameter to gate STT around real speech and drive barge-in detection. The SDK ships Silero VAD (an ONNX model, ~1 MB) with a telephony-tuned factory:

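import asyncio
from getpatter import SileroVAD, DeepgramSTT, AnthropicLLM, ElevenLabsTTS

# Recommended for any phone-call deployment.
# Run this inside an async function; `await` is not valid at module top level.
vad = await asyncio.to_thread(SileroVAD.for_phone_call)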

agent = phone.agent(
    stt=DeepgramSTT(),
    llm=AnthropicLLM(),
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
    vad=vad,
)

SileroVAD.for_phone_call(**overrides) is identical to SileroVAD.load(...) but pins sample_rate to 16 000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). Parameters are tuned for telephony-band audio (not the upstream Silero studio defaults):

Field	Default	Upstream equivalent
`activation_threshold`	`0.8`	`threshold` (tuned for telephony, not studio)
`deactivation_threshold`	`0.65`	`neg_threshold = threshold − 0.15` (tuned for telephony)
`min_speech_duration`	`0.25` s	`min_speech_duration_ms = 250`
`min_silence_duration`	`0.1` s	`min_silence_duration_ms = 100`
`prefix_padding_duration`	`0.03` s	`speech_pad_ms = 30`
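How the two thresholds and the two minimum durations interact is easier to see in a toy state machine. The sketch below is a generic hysteresis detector over per-frame speech probabilities, written under assumptions: it is not the Silero or Patter code, the 32 ms frame length is assumed, and `prefix_padding_duration` is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def speech_segments(
    probs: Sequence[float],
    frame_s: float = 0.032,  # assumed frame length: 512 samples at 16 kHz
    activation_threshold: float = 0.8,
    deactivation_threshold: float = 0.65,
    min_speech_duration: float = 0.25,
    min_silence_duration: float = 0.1,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) of detected speech."""
    # Convert durations to whole frames; round() guards against float noise.
    need_speech = math.ceil(round(min_speech_duration / frame_s, 9))
    need_silence = math.ceil(round(min_silence_duration / frame_s, 9))

    segments: List[Tuple[int, int]] = []
    in_speech = False
    seg_start = 0  # first frame of the confirmed speech segment
    run_start = 0  # first frame of the current candidate run
    run_len = 0    # consecutive frames supporting a state change

    for i, p in enumerate(probs):
        if not in_speech:
            if p >= activation_threshold:
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len >= need_speech:
                    in_speech, seg_start, run_len = True, run_start, 0
            else:
                run_len = 0  # a blip shorter than min_speech_duration is dropped
        else:
            if p < deactivation_threshold:
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len >= need_silence:
                    segments.append((seg_start, run_start))
                    in_speech, run_len = False, 0
            else:
                run_len = 0  # a pause shorter than min_silence_duration is bridged

    if in_speech:
        segments.append((seg_start, len(probs)))
    return segments


probs = [0.1] * 3 + [0.9] * 10 + [0.7] * 2 + [0.2] * 5 + [0.9] * 3
segments = speech_segments(probs)
```

In the sample, the frames at 0.7 stay inside the segment because they are above the deactivation threshold, and the final three frames at 0.9 are too short to open a new segment. Raising `min_silence_duration` makes the detector bridge longer pauses, which is the lever for callers who get cut off mid-sentence.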

Override per call site rather than as a global default. A common tweak: deployments that experience truncation on natural pauses raise min_silence_duration to 0.5–1.0 s:

vad = await asyncio.to_thread(
    SileroVAD.for_phone_call, min_silence_duration=0.5
)

SileroVAD.load(...) and SileroVAD.for_phone_call(...) are synchronous (they load the ONNX model). Wrap them in asyncio.to_thread(...) so the event loop stays responsive during process startup.
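The effect of `asyncio.to_thread` can be checked without any model at all. In this sketch `load_model_blocking` is a hypothetical stand-in that sleeps for 200 ms in place of a real synchronous load, and a heartbeat task records whether the event loop kept running in the meantime.

```python
import asyncio
import time


def load_model_blocking() -> str:
    """Hypothetical stand-in for a synchronous model load."""
    time.sleep(0.2)  # simulates blocking work such as reading a model file
    return "model-ready"


async def heartbeat(ticks: list) -> None:
    """Record a tick every 20 ms while the event loop is free to run."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> tuple:
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread; the loop keeps ticking.
    model = await asyncio.to_thread(load_model_blocking)
    beat.cancel()
    return model, len(ticks)


model, tick_count = asyncio.run(main())
```

Calling `load_model_blocking()` directly inside `main()` would leave the heartbeat with no ticks during the load, which is the stall the wrapper avoids. `asyncio.to_thread` requires Python 3.9 or later.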

Engine vs Pipeline Mode

from getpatter import OpenAIRealtime, ElevenLabsConvAI, DeepgramSTT, AnthropicLLM, ElevenLabsTTS

# OpenAI Realtime (default engine) — end-to-end
agent = phone.agent(
    engine=OpenAIRealtime(),
    system_prompt="...",
)

# ElevenLabs Conversational AI — natural voices
agent = phone.agent(
    engine=ElevenLabsConvAI(agent_id="agent_abc123"),
    system_prompt="...",
)

# Pipeline — pick STT, LLM, TTS independently
agent = phone.agent(
    stt=DeepgramSTT(),
    llm=AnthropicLLM(),
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="...",
)

See LLM for a deeper comparison.
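The three call shapes above follow a simple precedence, sketched here from the parameter descriptions. `resolve_mode` is a hypothetical helper, not an SDK function, and treating any one of `stt`, `llm`, or `tts` as enough to select pipeline mode is an assumption.

```python
from typing import Optional


def resolve_mode(
    engine: Optional[object] = None,
    stt: Optional[object] = None,
    llm: Optional[object] = None,
    tts: Optional[object] = None,
) -> str:
    """Return "engine", "pipeline", or "default-engine" for a set of arguments."""
    if engine is not None:
        return "engine"  # llm= is documented as ignored when engine is set
    # Assumption: any pipeline component switches the agent to pipeline mode.
    if stt is not None or llm is not None or tts is not None:
        return "pipeline"
    return "default-engine"  # documented fallback: OpenAI Realtime


modes = [
    resolve_mode(engine="realtime"),
    resolve_mode(stt="stt", llm="llm", tts="tts"),
    resolve_mode(),
]
```

The placeholder strings stand in for provider instances; only whether each argument is set matters to the sketch.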

Complete Example

import os
import asyncio
from dotenv import load_dotenv
from getpatter import Patter, Twilio, OpenAIRealtime, Tool, Guardrail

load_dotenv()

phone = Patter(
    carrier=Twilio(),                               # TWILIO_* from env
    phone_number=os.environ["PHONE_NUMBER"],
    webhook_url=os.environ["WEBHOOK_URL"],
)

async def check_availability(args: dict, ctx: dict) -> dict:
    # Hit your reservation system here.
    return {"available": True, "rooms": 3}

agent = phone.agent(
    engine=OpenAIRealtime(voice="nova"),            # OPENAI_API_KEY from env
    system_prompt="""You are a booking assistant for {hotel_name}.
Help guests check availability and make reservations.
Be warm, professional, and concise.""",
    language="en",
    first_message="Welcome to {hotel_name}! How can I assist you with your stay?",
    variables={"hotel_name": "The Grand Hotel"},
    tools=[
        Tool(
            name="check_availability",
            description="Check room availability for given dates",
            parameters={
                "type": "object",
                "properties": {
                    "check_in": {"type": "string", "description": "Check-in date (YYYY-MM-DD)"},
                    "check_out": {"type": "string", "description": "Check-out date (YYYY-MM-DD)"},
                    "guests": {"type": "integer", "description": "Number of guests"},
                },
                "required": ["check_in", "check_out"],
            },
            handler=check_availability,
        ),
    ],
    guardrails=[
        Guardrail(
            name="No pricing promises",
            blocked_terms=["discount", "free upgrade", "complimentary"],
            replacement="I'd be happy to check our current rates for you.",
        ),
    ],
)

async def main():
    await phone.serve(agent, port=8000)

asyncio.run(main())
