# LLM (Voice Mode)
Patter supports two voice architectures:

| Mode | How to enable | When to use |
|---|---|---|
| Engine (speech-to-speech) | `phone.agent({ engine: new OpenAIRealtime(...) })` or `engine: new ElevenLabsConvAI(...)` | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | `phone.agent({ stt, llm, tts })` (omit `engine`) | Full control. Mix and match providers per stage. |
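A minimal sketch of the two modes side by side, assuming the `getpatter` import names used elsewhere on this page (the `stt`/`tts` placeholders stand in for providers documented on the STT and TTS pages):

```typescript
import { phone, OpenAIRealtime, OpenAILLM } from "getpatter";

// Engine mode: one provider handles STT + LLM + TTS end to end.
phone.agent({ engine: new OpenAIRealtime() });

// Pipeline mode: omit `engine` and compose the stages yourself.
// `stt` and `tts` here are placeholders for real provider instances.
declare const stt: unknown;
declare const tts: unknown;
phone.agent({ stt, llm: new OpenAILLM(), tts });
```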
This page covers the `llm` selector in pipeline mode.
## Pipeline mode
Compose the three stages independently. Each provider reads its credentials from the environment by default. Every LLM provider streams the same `{ type: "text" | "tool_call" | "done" }` chunk protocol, so your tools are defined once and run everywhere.
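The shared chunk protocol can be sketched as a TypeScript union. Only the `type` discriminant is stated on this page; the other fields are assumptions for illustration:

```typescript
// Shape of the shared chunk protocol. Fields beyond `type` are assumed.
type LLMChunk =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> }
  | { type: "done" };

// A provider-agnostic consumer: because every LLM emits the same chunks,
// one function handles output from any of them.
function collectText(chunks: LLMChunk[]): string {
  let out = "";
  for (const c of chunks) {
    if (c.type === "text") out += c.text;
    if (c.type === "done") break;
  }
  return out;
}
```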
`llm` and `onMessage` are mutually exclusive: pass one or the other to `serve()`; passing both raises a clear error at `serve()` time. When `engine` is set, `llm` is ignored (with a one-time warning in the logs). If neither `llm` nor `onMessage` is passed and `OPENAI_API_KEY` is set, Patter auto-constructs the default OpenAI LLM loop, so existing 0.5.0 code still works.

## Supported LLM providers
| Flat import | Namespaced import | Env var | Install |
|---|---|---|---|
| `OpenAILLM` | `getpatter/llm/openai` → `LLM` | `OPENAI_API_KEY` | included |
| `AnthropicLLM` | `getpatter/llm/anthropic` → `LLM` | `ANTHROPIC_API_KEY` | included |
| `GroqLLM` | `getpatter/llm/groq` → `LLM` | `GROQ_API_KEY` | included |
| `CerebrasLLM` | `getpatter/llm/cerebras` → `LLM` | `CEREBRAS_API_KEY` | included |
| `GoogleLLM` | `getpatter/llm/google` → `LLM` | `GEMINI_API_KEY` (falls back to `GOOGLE_API_KEY`) | included |
Every provider accepts `apiKey?: string` and falls back to the listed env var when it is omitted.
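For example, the fallback behaviour looks like this (a sketch; the flat-import path from `getpatter` follows the table above):

```typescript
import { AnthropicLLM } from "getpatter";

// Explicit key: useful when credentials come from a vault or per-tenant store.
const explicit = new AnthropicLLM({ apiKey: loadKeyFromVault() });

// Omit apiKey: the provider reads ANTHROPIC_API_KEY from the environment.
const fromEnv = new AnthropicLLM();
```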
### OpenAILLM

OpenAI Chat Completions with streaming + tool calling. Default model: `"gpt-4o-mini"`.
### AnthropicLLM

Anthropic Messages API with native streaming and `tool_use` blocks, normalised to Patter’s chunk protocol. Default model: `"claude-3-5-sonnet-20241022"`. Pass `maxTokens` to override the default token cap.
### GroqLLM

Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API at `https://api.groq.com/openai/v1`. Default model: `"llama-3.3-70b-versatile"`.
### CerebrasLLM

Cerebras Inference API (OpenAI-compatible) at `https://api.cerebras.ai/v1`. Default model: `"llama3.1-8b"`. Supports optional gzip request-body compression via `gzipCompression: true` to reduce time-to-first-token on large prompts; see Cerebras payload optimization.
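Enabling the compression option might look like this (a sketch; only the `gzipCompression` flag is documented here, and the `model` option name is an assumption):

```typescript
import { CerebrasLLM } from "getpatter";

// gzip-compress large request bodies to cut time-to-first-token.
const llm = new CerebrasLLM({
  model: "llama3.1-8b",
  gzipCompression: true,
});
```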
### GoogleLLM

Google Gemini via the Developer API (streaming SSE). Default model: `"gemini-2.5-flash"`.
## Custom LLM via `onMessage`

For cases the five built-in providers don’t cover (multi-model routing, local inference, an internal gateway, caching layers), drop `llm` and plug in an async `onMessage` callback instead:
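A minimal sketch, assuming `onMessage` receives the user’s transcribed text and returns the reply text; the exact callback signature and the gateway URL are assumptions, not confirmed by this page:

```typescript
import { phone } from "getpatter";

// `stt` and `tts` are placeholders for providers from the STT/TTS pages.
declare const stt: unknown;
declare const tts: unknown;

phone.agent({
  stt,
  tts,
  // Hypothetical signature: user text in, reply text out.
  onMessage: async (text: string): Promise<string> => {
    // Route to an internal gateway instead of a built-in provider.
    const res = await fetch("https://llm-gateway.internal/v1/chat", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ prompt: text }),
    });
    const { reply } = await res.json();
    return reply;
  },
});
```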
## What’s next
- **STT**: STT providers for pipeline mode.
- **TTS**: TTS providers for pipeline mode.
- **Tools**: Function calling (works across every LLM).
- **Engines**: Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).

