LLM (Voice Mode)
Patter supports two voice architectures:

| Mode | How to enable | When to use |
|---|---|---|
| Engine (speech-to-speech) | `phone.agent(engine=OpenAIRealtime(...))` or `engine=ElevenLabsConvAI(...)` | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | `phone.agent(stt=..., llm=..., tts=...)` (omit `engine=`) | Full control. Mix and match providers per stage. |
This page covers the `llm=` selector in pipeline mode.
Pipeline mode
Compose the three stages independently. Each provider reads its credentials from the environment by default. Every LLM provider streams the same `{type: "text" | "tool_call" | "done"}` chunk protocol, so your tools are defined once and run everywhere.
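As a self-contained sketch of that chunk protocol: the `"type"` values come from the docs above, but the other field names (`"text"`, `"name"`, `"arguments"`) and the stream/consumer shape are assumptions for illustration, not Patter's actual API.

```python
import asyncio

# Stand-in for a provider's streaming output. Real providers would yield
# chunks like these over the wire; only the "type" values are documented.
async def fake_llm_stream(prompt):
    yield {"type": "text", "text": "One moment, "}
    yield {"type": "text", "text": "booking now."}
    yield {"type": "tool_call", "name": "book_table", "arguments": {"guests": 2}}
    yield {"type": "done"}

# A consumer written against the chunk protocol works unchanged no matter
# which provider produced the stream.
async def consume(stream):
    text_parts, tool_calls = [], []
    async for chunk in stream:
        if chunk["type"] == "text":
            text_parts.append(chunk["text"])
        elif chunk["type"] == "tool_call":
            tool_calls.append((chunk["name"], chunk["arguments"]))
        elif chunk["type"] == "done":
            break
    return "".join(text_parts), tool_calls

text, calls = asyncio.run(consume(fake_llm_stream("Book a table for two")))
print(text)   # -> One moment, booking now.
print(calls)  # -> [('book_table', {'guests': 2})]
```

Because tool calls arrive as ordinary chunks, the same tool definitions can back any provider in the table below.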
`llm=` and `on_message` are mutually exclusive. Pass one or the other to `serve()`; passing both raises a clear error at `serve()` time. When `engine=` is set, `llm=` is ignored (with a one-time warning in the logs). If neither `llm=` nor `on_message` is passed and `OPENAI_API_KEY` is set, Patter auto-constructs the default OpenAI LLM loop, so existing 0.5.0 code still works.

Supported LLM providers
| Flat import | Namespaced import | Env var | Install extra |
|---|---|---|---|
| `OpenAILLM` | `getpatter.llm.openai.LLM` | `OPENAI_API_KEY` | included |
| `AnthropicLLM` | `getpatter.llm.anthropic.LLM` | `ANTHROPIC_API_KEY` | `getpatter[anthropic]` |
| `GroqLLM` | `getpatter.llm.groq.LLM` | `GROQ_API_KEY` | `getpatter[groq]` |
| `CerebrasLLM` | `getpatter.llm.cerebras.LLM` | `CEREBRAS_API_KEY` | `getpatter[cerebras]` |
| `GoogleLLM` | `getpatter.llm.google.LLM` | `GEMINI_API_KEY` (falls back to `GOOGLE_API_KEY`) | `getpatter[google]` |
All providers accept `api_key: str | None = None` and fall back to the listed env var when it is omitted.
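That fallback can be sketched as below; this is an illustrative stand-in, not Patter's source, and the function name and error message are invented. The demo env var is hypothetical too.

```python
import os
from typing import Optional

# Sketch of the api_key fallback each provider presumably applies:
# an explicit argument wins, otherwise read the provider's env var.
def resolve_api_key(api_key: Optional[str], env_var: str) -> str:
    if api_key is not None:
        return api_key
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"pass api_key= or set {env_var}")
    return value

os.environ["DEMO_API_KEY"] = "sk-demo"
print(resolve_api_key(None, "DEMO_API_KEY"))           # -> sk-demo
print(resolve_api_key("sk-explicit", "DEMO_API_KEY"))  # -> sk-explicit
```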
OpenAILLM
OpenAI Chat Completions with streaming + tool calling. Default model `"gpt-4o-mini"`. For other OpenAI-compatible endpoints use the dedicated wrappers (`GroqLLM`, `CerebrasLLM`), which subclass `OpenAILLMProvider` with the right `base_url`.
AnthropicLLM
Anthropic Messages API with native streaming and `tool_use` blocks, normalised to Patter’s chunk protocol. Default model `"claude-3-5-sonnet-20241022"`, default `max_tokens=1024` (Anthropic requires an explicit cap on every request).
Install with `pip install 'getpatter[anthropic]'`.
GroqLLM
Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API at `https://api.groq.com/openai/v1`. Default model `"llama-3.3-70b-versatile"`.
Install with `pip install 'getpatter[groq]'`.
CerebrasLLM
Cerebras Inference API (OpenAI-compatible) at `https://api.cerebras.ai/v1`. Default model `"llama3.1-8b"`. Supports optional msgpack + gzip payload compression (enabled by default) to reduce time-to-first-token on large prompts; see Cerebras payload optimization.
Install with `pip install 'getpatter[cerebras]'`.
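To illustrate why compressing the request body helps on large prompts, here is a dependency-free sketch: stdlib `json` + `gzip` stand in for the msgpack + gzip pair Patter uses, and the payload shape is a generic chat-completions body, not Patter's exact wire format.

```python
import gzip
import json

# A large, repetitive prompt -- the case where compression pays off most.
payload = {
    "model": "llama3.1-8b",
    "messages": [{"role": "user", "content": "transcript line " * 400}],
}
raw = json.dumps(payload).encode("utf-8")
body = gzip.compress(raw)

ratio = len(body) / len(raw)
print(f"{len(raw)} -> {len(body)} bytes ({ratio:.0%})")

# Compression is lossless: the server recovers the identical payload.
assert json.loads(gzip.decompress(body)) == payload
```

Fewer bytes on the wire means the request reaches the inference API sooner, which is where the time-to-first-token saving comes from.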
GoogleLLM
Google Gemini via the `google-genai` SDK. Supports the Gemini Developer API (API key) and Vertex AI (GCP project + location). Default model `"gemini-2.5-flash"`.
Install with `pip install 'getpatter[google]'`.
Custom LLM via on_message
For cases the five built-in providers don’t cover (multi-model routing, local llama.cpp, an internal gateway, caching layers), drop `llm=` and plug in an async `on_message` callback instead:
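A minimal sketch of such a callback, assuming `on_message` receives the conversation as a list of role/content dicts and yields the same chunk protocol as the built-in providers; the routing rule and model names are invented for illustration.

```python
import asyncio

# Hypothetical on_message callback: route short turns to a small local
# model and longer ones to a gateway, emitting Patter-style chunks.
async def on_message(messages):
    last = messages[-1]["content"]
    model = "local-small" if len(last) < 40 else "gateway-large"
    reply = f"[{model}] you said: {last}"  # call your real backend here
    yield {"type": "text", "text": reply}
    yield {"type": "done"}

# Drive the callback directly to see the chunks it emits.
async def demo():
    stream = on_message([{"role": "user", "content": "hi"}])
    return [chunk async for chunk in stream]

chunks = asyncio.run(demo())
print(chunks[0]["text"])  # -> [local-small] you said: hi
```

You would then pass the callback via `on_message=` (instead of `llm=`) on `serve()`, per the precedence rules above.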
What’s next
- STT: STT providers for pipeline mode.
- TTS: TTS providers for pipeline mode.
- Tools: Function calling (works across every LLM).
- Engines: Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).

