Agent Configuration
An Agent defines how your voice AI behaves: what it says, how it sounds, what tools it can use, and what guardrails it follows.
Creating an Agent
Use the phone.agent() factory method. The simplest form relies on environment-variable fallback and the default engine (OpenAIRealtime).
For pipeline mode, llm= accepts any of five providers: OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, and GoogleLLM. Tool calling works across all five. See LLM for the full reference. For fully custom logic (multi-model routing, local models), drop llm= and pass an on_message callback to serve() instead; llm= and on_message are mutually exclusive.
The same pipeline using namespaced imports:
Agent Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| system_prompt | str | required | Instructions that define the agent’s behavior. |
| engine | OpenAIRealtime \| OpenAIRealtime2 \| ElevenLabsConvAI \| None | None → OpenAI Realtime | End-to-end engine. See Engines. Omit for pipeline mode. |
| stt | STTProvider \| None | None | STT instance for pipeline mode (DeepgramSTT(), CartesiaSTT(), …). See STT. |
| llm | LLMProvider \| None | None | LLM instance for pipeline mode (AnthropicLLM(), GroqLLM(), …). Mutually exclusive with on_message on serve(). Ignored when engine is set. See LLM. |
| tts | TTSProvider \| None | None | TTS instance for pipeline mode (ElevenLabsTTS(), RimeTTS(), …). See TTS. |
| voice | str | "alloy" | Voice name. Usually inferred from the engine or TTS instance. |
| model | str | "gpt-realtime-mini" | Model ID for OpenAI Realtime. Usually inferred from the engine. |
| language | str | "en" | BCP-47 language code. |
| first_message | str | "" | If set, the agent speaks this immediately when a call connects. |
| tools | list[Tool] \| None | None | Tool(...) instances for function calling. See Tools. |
| variables | dict \| None | None | Dynamic variable substitutions for {placeholder} patterns in the system prompt. Values limited to 500 chars. |
| guardrails | list[Guardrail] \| None | None | Guardrail(...) instances applied to LLM output. See Guardrails. |
| hooks | PipelineHooks \| None | None | Pipeline hooks for intercepting STT/TTS processing. Pipeline mode only. See Events. |
| text_transforms | list[Callable] \| None | None | Text transformation functions applied to LLM output before TTS. Pipeline mode only. |
| vad | VADProvider \| None | None | Voice activity detection provider (e.g. Silero). Pipeline mode only. |
| audio_filter | AudioFilter \| None | None | Pre-STT audio filter (e.g. Krisp noise suppression). Pipeline mode only. |
| background_audio | BackgroundAudioPlayer \| None | None | Hold music / ambient-cue mixer. Pipeline mode only. |
| barge_in_threshold_ms | int | 300 | Sustained-voice window (ms) before treating caller audio as barge-in. Set to 0 to disable. |
| aggressive_first_flush | bool | False | Opt-in low-latency mode: emits the first clause on a soft punctuation boundary (comma, em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA on the first sentence at the cost of slightly clipped prosody. Hard-disabled when language starts with "it" (Italian decimal commas would split mid-number). Pipeline mode only. |
| disable_phone_preamble | bool | False | When False (default), Patter prepends a phone-friendly preamble to system_prompt that instructs the LLM to avoid markdown, emojis, bullet lists, and code blocks; spell out numbers and dates; and keep replies short. Set to True to ship system_prompt verbatim. |
| prewarm_first_message | bool | False | Pre-render first_message to TTS audio bytes during the ringing window and stream the cached buffer the instant the call connects, eliminating the 200–700 ms TTS first-byte latency on the greeting. Pipeline mode only: the flag is ignored (with a WARN log) on Realtime / ConvAI engines. Trade-off: pays for the greeting’s TTS even when the call rings out unanswered (~0.005 per ring). Opt in explicitly for inbound calls and low-noise deployments: prewarm_first_message=True. |
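The aggressive_first_flush row describes a small splitting rule, and a rough sketch may make it easier to picture. The helper below is hypothetical and is not part of the Patter SDK. It assumes the cut happens at the last soft boundary seen once the buffer holds at least 40 characters; the SDK may pick the boundary differently.

```python
from typing import Optional, Tuple

# Soft boundaries named in the table: comma, em-dash (U+2014), en-dash (U+2013).
SOFT_BOUNDARIES = (",", "\u2014", "\u2013")


def split_first_clause(
    buffer: str, language: str = "en", min_chars: int = 40
) -> Optional[Tuple[str, str]]:
    """Return (clause, remainder) when an early flush is allowed, else None.

    Hypothetical helper for illustration only; not the SDK's implementation.
    """
    # Italian uses decimal commas, so the early flush is disabled outright.
    if language.lower().startswith("it"):
        return None
    # Wait until enough text has accumulated.
    if len(buffer) < min_chars:
        return None
    # Assumption: cut at the last soft boundary currently in the buffer.
    cut = max(buffer.rfind(mark) for mark in SOFT_BOUNDARIES)
    if cut == -1:
        return None
    return buffer[: cut + 1], buffer[cut + 1 :].lstrip()
```

With the buffer "Thanks for calling Acme Dental today, how can I help you", the sketch returns the text up to and including the comma as the first clause, so TTS could start on it before the sentence is complete.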
Agent Dataclass
Agent is a frozen (immutable) dataclass. You can construct it directly when you need an instance outside of phone.agent().
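To show what "frozen" means in practice, here is a minimal sketch using a stand-in dataclass. AgentSketch is hypothetical and carries only three fields from the parameter table; it is not the SDK's Agent class. The standard-library behavior shown (assignment raises, dataclasses.replace builds a copy) applies to any frozen dataclass, though it has not been checked against the real Agent.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentSketch:
    """Stand-in with three fields from the parameter table; not the real Agent."""

    system_prompt: str
    voice: str = "alloy"
    language: str = "en"


base = AgentSketch(system_prompt="You are a helpful receptionist.")

# Assigning to a field on a frozen dataclass raises FrozenInstanceError.
try:
    base.voice = "nova"
    mutated = True
except FrozenInstanceError:
    mutated = False

# To change a field, build a modified copy; the original is left untouched.
spanish = replace(base, language="es")
```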
Prefer phone.agent() over constructing Agent directly: the factory method validates credentials, unpacks the engine/STT/TTS instances, and surfaces clear errors up front.
System Prompt
The system_prompt defines the agent’s personality, instructions, and constraints.
Dynamic Variables
Use {placeholder} syntax in the system prompt to inject dynamic values at call start. Values are limited to 500 characters each.
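The substitution rule can be pictured with a short sketch. render_prompt is a hypothetical helper, not an SDK function. It assumes that placeholders without a matching variable are left as written and that an over-long value raises an error; the SDK may truncate or handle these cases differently.

```python
import re

MAX_VALUE_CHARS = 500  # per-value limit stated above


def render_prompt(template: str, variables: dict) -> str:
    """Fill {placeholder} patterns in a prompt template (illustration only)."""
    for name, value in variables.items():
        if len(str(value)) > MAX_VALUE_CHARS:
            raise ValueError(f"variable {name!r} exceeds {MAX_VALUE_CHARS} chars")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: unknown placeholders are left exactly as written.
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\{(\w+)\}", substitute, template)


prompt = render_prompt(
    "You are helping {customer_name} with order {order_id}.",
    {"customer_name": "Ada", "order_id": 1042},
)
```

Here prompt ends up as "You are helping Ada with order 1042."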
First Message
When first_message is set, the agent speaks it as soon as a call connects.
Pre-warming the first message
Pipeline-mode agents can pre-render the first_message audio during the ringing window and stream the cached buffer the instant the call connects, eliminating the 200–700 ms TTS first-byte latency on the greeting. Opt in explicitly with prewarm_first_message=True.
The flag is ignored, with a WARN log, when set on a non-pipeline agent.
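The idea behind pre-warming can be sketched with plain asyncio. Everything below is a stand-in: fake_tts and handle_call are hypothetical, the sleep durations are arbitrary, and none of it reflects how the SDK schedules the work internally. The sketch only shows why starting synthesis during the ring removes the wait after connect.

```python
import asyncio


async def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS request; the sleep mimics first-byte latency."""
    await asyncio.sleep(0.05)
    return text.encode("utf-8")


async def handle_call(first_message: str, prewarm: bool, ring_seconds: float = 0.1) -> float:
    """Return the seconds between call connect and the greeting audio being ready."""
    loop = asyncio.get_running_loop()
    # With prewarm, synthesis starts as soon as the phone begins ringing.
    task = asyncio.create_task(fake_tts(first_message)) if prewarm else None
    await asyncio.sleep(ring_seconds)  # ringing window
    connected_at = loop.time()
    if task is not None:
        audio = await task  # usually already finished by now
    else:
        audio = await fake_tts(first_message)  # synthesis starts only after connect
    assert audio  # greeting bytes are available at this point
    return loop.time() - connected_at


cold_wait = asyncio.run(handle_call("Hello, thanks for calling.", prewarm=False))
warm_wait = asyncio.run(handle_call("Hello, thanks for calling.", prewarm=True))
```

In this sketch warm_wait is close to zero, while cold_wait is roughly the simulated TTS latency. The synthesis cost is paid either way, which matches the trade-off noted in the parameter table for calls that ring out unanswered.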
Voice Selection
Voice is usually inferred from the engine or TTS instance, e.g. OpenAIRealtime(voice="nova") or ElevenLabsTTS(voice_id="rachel"). Available voices depend on the provider.
OpenAI Realtime voices: "alloy", "ash", "ballad", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer", "verse"
Voice Activity Detection (VAD)
Pipeline-mode agents can plug a VAD provider into the vad= parameter to gate STT around real speech and drive barge-in detection. The SDK ships Silero VAD (an ONNX model, ~1 MB) with a telephony-tuned factory, SileroVAD.for_phone_call().
SileroVAD.for_phone_call(**overrides) is identical to SileroVAD.load(...) but pins sample_rate to 16 000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). Parameters are tuned for telephony-band audio (not the upstream Silero studio defaults):
| Field | Default | Upstream equivalent |
|---|---|---|
| activation_threshold | 0.8 | threshold (tuned for telephony, not studio) |
| deactivation_threshold | 0.65 | neg_threshold = threshold − 0.15 (tuned for telephony) |
| min_speech_duration | 0.25 s | min_speech_duration_ms = 250 |
| min_silence_duration | 0.1 s | min_silence_duration_ms = 100 |
| prefix_padding_duration | 0.03 s | speech_pad_ms = 30 |
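The two thresholds form a hysteresis band: speech has to climb above the higher value to switch the gate on, and fall below the lower value to switch it off. The sketch below illustrates only that part. gate_speech is a hypothetical helper, the exact comparison operators are an assumption, and the duration fields in the table are deliberately left out, so this is a simplification rather than what Silero or Patter actually runs.

```python
from typing import Iterable, List


def gate_speech(
    probabilities: Iterable[float],
    activation: float = 0.8,
    deactivation: float = 0.65,
) -> List[bool]:
    """Return one flag per frame: True while the gate treats audio as speech."""
    active = False
    states = []
    for p in probabilities:
        if not active and p >= activation:
            active = True  # probability crossed the activation threshold
        elif active and p < deactivation:
            active = False  # probability dropped below the deactivation threshold
        states.append(active)
    return states


frames = [0.5, 0.7, 0.85, 0.7, 0.66, 0.6, 0.7]
states = gate_speech(frames)
```

Frames at 0.7 count as speech only when the gate is already open, which keeps brief dips in confidence from chopping an utterance in two.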
To override a default, pass it as a keyword argument; for example, raise min_silence_duration to 0.5–1.0 s.
SileroVAD.load(...) and SileroVAD.for_phone_call(...) are synchronous (they load the ONNX model). Wrap them in asyncio.to_thread(...) so the event loop stays responsive during process startup.
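A minimal sketch of that pattern, using a stand-in loader so it runs without the SDK installed. load_model_blocking and heartbeat are hypothetical; in real code the blocking callable would be SileroVAD.load or SileroVAD.for_phone_call. asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time


def load_model_blocking() -> str:
    """Stand-in for a synchronous loader such as SileroVAD.for_phone_call()."""
    time.sleep(0.1)  # simulates reading and initialising the ONNX model
    return "vad-ready"


async def heartbeat(ticks: list) -> None:
    """Other startup work that should keep running while the model loads."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def startup():
    ticks: list = []
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so heartbeat() keeps being scheduled on the event loop meanwhile.
    vad, _ = await asyncio.gather(
        asyncio.to_thread(load_model_blocking),
        heartbeat(ticks),
    )
    return vad, len(ticks)


vad, tick_count = asyncio.run(startup())
```

Calling load_model_blocking() directly inside startup() would instead stall every other coroutine for the full load time.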
