
LLM (Voice Mode)

Patter supports two voice architectures:
| Mode | How to enable | When to use |
| --- | --- | --- |
| Engine (speech-to-speech) | phone.agent(engine=OpenAIRealtime(...)) or engine=ElevenLabsConvAI(...) | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | phone.agent(stt=..., llm=..., tts=...) (omit engine=) | Full control. Mix and match providers per stage. |
See Engines for the engine-mode reference. This page focuses on the llm= selector in pipeline mode.

Pipeline mode

Compose the three stages independently. Each provider reads its credentials from the environment by default.
import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=DeepgramSTT(),                    # DEEPGRAM_API_KEY
    llm=AnthropicLLM(),                   # ANTHROPIC_API_KEY
    tts=ElevenLabsTTS(voice_id="rachel"), # ELEVENLABS_API_KEY
    system_prompt="You are a helpful assistant.",
    first_message="Hi!",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())
Tool calling works across every provider — each adapter normalizes its vendor-specific streaming format to Patter’s unified {type: "text" | "tool_call" | "done"} chunk protocol, so your tools are defined once and run everywhere.
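To make that contract concrete, here is a sketch of how a stream of normalized chunks folds into a reply (an illustration only, not Patter's actual adapter code; the text and name field names are assumptions beyond the documented type key):

```python
from typing import Any

def consume_chunks(chunks: list[dict[str, Any]]) -> tuple[str, list[dict[str, Any]]]:
    """Fold a normalized chunk stream into (assistant_text, tool_calls).

    Every adapter emits the same three chunk types, so this loop works
    no matter which provider produced the stream.
    """
    text_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    for chunk in chunks:
        if chunk["type"] == "text":
            text_parts.append(chunk["text"])  # incremental token text
        elif chunk["type"] == "tool_call":
            tool_calls.append(chunk)          # execute the tool, feed the result back
        elif chunk["type"] == "done":
            break                             # end of the model turn
    return "".join(text_parts), tool_calls

# A stream as any adapter might normalize it:
stream = [
    {"type": "text", "text": "Checking the "},
    {"type": "text", "text": "weather."},
    {"type": "tool_call", "name": "get_weather", "arguments": {"city": "Oslo"}},
    {"type": "done"},
]
reply, calls = consume_chunks(stream)
```

Because every provider is reduced to this shape, switching llm= from AnthropicLLM to GroqLLM changes nothing downstream.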
llm= and on_message are mutually exclusive; passing both raises a clear error at serve() time. When engine= is set, llm= is ignored (with a one-time warning in the logs). If neither llm= nor on_message is passed and OPENAI_API_KEY is set, Patter auto-constructs the default OpenAI LLM loop, so existing 0.5.0 code keeps working.
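The selection order above can be sketched as a plain function (an illustration of the documented precedence, not Patter's internals; the return labels are invented):

```python
import os

def resolve_llm_path(engine=None, llm=None, on_message=None, env=os.environ) -> str:
    """Mirror the documented selection order for the model loop."""
    if llm is not None and on_message is not None:
        raise ValueError("llm= and on_message are mutually exclusive")
    if engine is not None:
        return "engine"          # llm= would be ignored (one-time warning)
    if on_message is not None:
        return "on_message"
    if llm is not None:
        return "llm"
    if "OPENAI_API_KEY" in env:
        return "default-openai"  # 0.5.0 behavior preserved
    raise ValueError("no model loop configured")
```

The key property is that the error for conflicting arguments fires before any path is chosen, so a misconfigured agent fails fast rather than silently preferring one option.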

Supported LLM providers

| Flat import | Namespaced import | Env var | Install extra |
| --- | --- | --- | --- |
| OpenAILLM | getpatter.llm.openai.LLM | OPENAI_API_KEY | included |
| AnthropicLLM | getpatter.llm.anthropic.LLM | ANTHROPIC_API_KEY | getpatter[anthropic] |
| GroqLLM | getpatter.llm.groq.LLM | GROQ_API_KEY | getpatter[groq] |
| CerebrasLLM | getpatter.llm.cerebras.LLM | CEREBRAS_API_KEY | getpatter[cerebras] |
| GoogleLLM | getpatter.llm.google.LLM | GEMINI_API_KEY (falls back to GOOGLE_API_KEY) | getpatter[google] |
All classes accept api_key: str | None = None and fall back to the listed env var when it is omitted.
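That fallback convention (explicit argument first, then one or more env vars, as in GoogleLLM's GEMINI_API_KEY then GOOGLE_API_KEY chain) can be sketched like this; the helper name is invented for illustration and is not part of the library:

```python
import os

def resolve_api_key(explicit, *env_vars, env=os.environ) -> str:
    """Return the explicit key if given, else the first listed env var that is set."""
    if explicit is not None:
        return explicit
    for var in env_vars:
        if var in env:
            return env[var]
    raise RuntimeError(f"set one of: {', '.join(env_vars)}")

# GoogleLLM-style chain: GEMINI_API_KEY is preferred, GOOGLE_API_KEY is the fallback.
key = resolve_api_key(None, "GEMINI_API_KEY", "GOOGLE_API_KEY",
                      env={"GOOGLE_API_KEY": "AIza-example"})
```

An explicit api_key= always wins, which is what lets the providers above be constructed either with no arguments or with a key passed directly.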

OpenAILLM

OpenAI Chat Completions with streaming + tool calling. Default model "gpt-4o-mini". For other OpenAI-compatible endpoints use the dedicated wrappers (GroqLLM, CerebrasLLM) — they subclass OpenAILLMProvider with the right base_url.
from getpatter import OpenAILLM                    # flat
from getpatter.llm import openai                   # namespaced

llm = OpenAILLM()                                   # reads OPENAI_API_KEY
llm = openai.LLM(api_key="sk-...", model="gpt-4o-mini")

AnthropicLLM

Anthropic Messages API with native streaming and tool_use blocks, normalized to Patter’s chunk protocol. Default model "claude-3-5-sonnet-20241022"; default max_tokens=1024 (Anthropic requires an explicit cap on every request).
from getpatter import AnthropicLLM                 # flat
from getpatter.llm import anthropic                # namespaced

llm = AnthropicLLM()                                # reads ANTHROPIC_API_KEY
llm = anthropic.LLM(
    api_key="sk-ant-...",
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
)
Install: pip install 'getpatter[anthropic]'.

GroqLLM

Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1. Default model "llama-3.3-70b-versatile".
from getpatter import GroqLLM                      # flat
from getpatter.llm import groq                     # namespaced

llm = GroqLLM()                                     # reads GROQ_API_KEY
llm = groq.LLM(api_key="gsk_...", model="llama-3.3-70b-versatile")
Install: pip install 'getpatter[groq]'.

CerebrasLLM

Cerebras Inference API (OpenAI-compatible) at https://api.cerebras.ai/v1. Default model "llama3.1-8b". Supports optional msgpack + gzip payload compression (enabled by default) to reduce time-to-first-token on large prompts — see Cerebras payload optimization.
from getpatter import CerebrasLLM                  # flat
from getpatter.llm import cerebras                 # namespaced

llm = CerebrasLLM()                                 # reads CEREBRAS_API_KEY
llm = cerebras.LLM(
    api_key="csk-...",
    model="llama3.1-8b",
    gzip_compression=True,                          # defaults to True
    msgpack_encoding=True,                          # defaults to True
)
Install: pip install 'getpatter[cerebras]'.

GoogleLLM

Google Gemini via the google-genai SDK. Supports the Gemini Developer API (API key) and Vertex AI (GCP project + location). Default model "gemini-2.5-flash".
from getpatter import GoogleLLM                    # flat
from getpatter.llm import google                   # namespaced

llm = GoogleLLM()                                   # reads GEMINI_API_KEY, falls back to GOOGLE_API_KEY
llm = google.LLM(api_key="AIza...", model="gemini-2.5-flash")

# Vertex AI
llm = google.LLM(vertexai=True, project="my-gcp-project", location="us-central1")
Install: pip install 'getpatter[google]'.

Custom LLM via on_message

For cases the five built-in providers don’t cover — multi-model routing, local llama.cpp, an internal gateway, caching layers — drop llm= and plug an async on_message callback instead:
import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=DeepgramSTT(),
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
)

async def handle_message(event) -> str:
    # Route to any model you like — local llama.cpp, a private gateway, etc.
    return f"You said: {event['text']}. How can I help?"

async def main():
    await phone.serve(agent, on_message=handle_message)

asyncio.run(main())
on_message and llm= cannot be used together. Combining them raises a clear error at serve() time — pick one.
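A minimal multi-model routing sketch along the lines suggested above (the event dict shape follows the earlier example; the backend functions and the length-based routing rule are invented for illustration):

```python
import asyncio

# Hypothetical backends. In practice these would call llama.cpp,
# an internal gateway, a cached model, etc.
async def fast_model(text: str) -> str:
    return f"[fast] {text}"

async def careful_model(text: str) -> str:
    return f"[careful] {text}"

async def handle_message(event) -> str:
    """Route short small-talk to the cheap model, everything else to the big one."""
    text = event["text"]
    if len(text.split()) <= 4:
        return await fast_model(text)
    return await careful_model(text)

reply = asyncio.run(handle_message({"text": "hi there"}))
```

Because on_message only has to return a string, any routing, caching, or retry logic can live inside the callback without Patter needing to know about it.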

What’s next

STT

STT providers for pipeline mode.

TTS

TTS providers for pipeline mode.

Tools

Function calling (works across every LLM).

Engines

Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).