LLM (Voice Mode)

Patter supports two voice architectures:
| Mode | How to enable | When to use |
|---|---|---|
| Engine (speech-to-speech) | `phone.agent({ engine: new OpenAIRealtime(...) })` or `engine: new ElevenLabsConvAI(...)` | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | `phone.agent({ stt, llm, tts })` (omit `engine`) | Full control. Mix and match providers per stage. |
See Engines for engine-mode reference. This page focuses on the llm selector in pipeline mode.

Pipeline mode

Compose the three stages independently. Each provider reads its credentials from the environment by default.
// npx tsx example.ts
import { Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT(),                       // DEEPGRAM_API_KEY
  llm: new AnthropicLLM(),                      // ANTHROPIC_API_KEY
  tts: new ElevenLabsTTS({ voiceId: "rachel" }), // ELEVENLABS_API_KEY
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi!",
});

await phone.serve({ agent });
Tool calling works across every provider: each adapter normalizes its vendor-specific streaming format to Patter's unified `{ type: "text" | "tool_call" | "done" }` chunk protocol, so your tools are defined once and run everywhere.
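To make that concrete, here is a sketch of what a provider-agnostic tool might look like. The exact option name and tool shape (`name`/`description`/`parameters`/`handler`) are assumptions for illustration, not confirmed by this page; the point is that the handler is plain TypeScript, independent of which adapter emits the `tool_call` chunk:

```typescript
type LookupArgs = { orderId: string };

// Pure handler: the provider-agnostic part of a tool. Whichever adapter
// parses the vendor's tool-call stream, this function receives the
// already-parsed arguments. The lookup here is a stand-in for a real API.
async function lookupOrder({ orderId }: LookupArgs): Promise<string> {
  const fakeDb: Record<string, string> = {
    "A-1001": "shipped",
    "A-1002": "processing",
  };
  return fakeDb[orderId] ?? "not found";
}

// Hypothetical tool descriptor (JSON-Schema parameters plus a handler);
// in this sketch it would be passed to the agent alongside `llm`.
const lookupOrderTool = {
  name: "lookup_order",
  description: "Look up the shipping status of an order by its ID.",
  parameters: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
  handler: lookupOrder,
};
```

Because every adapter normalizes to the same chunk protocol, swapping `AnthropicLLM` for `GroqLLM` would not require touching this definition.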
`llm` and `onMessage` are mutually exclusive: pass one or the other, and passing both raises a clear error at `serve()` time. When `engine` is set, `llm` is ignored (with a one-time warning in the logs). If neither `llm` nor `onMessage` is passed and `OPENAI_API_KEY` is set, Patter auto-constructs the default OpenAI LLM loop, so existing 0.5.0 code still works.
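The precedence above can be sketched as a small pure function. `resolveResponder` is a hypothetical helper written for this page, not part of Patter's API, and the behavior when nothing at all is configured is an assumption:

```typescript
type ResponderChoice = "engine" | "llm" | "onMessage" | "default-openai" | "error";

// Illustrative decision logic for which response path serves a call.
function resolveResponder(opts: {
  engine?: unknown;
  llm?: unknown;
  onMessage?: unknown;
  openaiKeySet?: boolean;
}): ResponderChoice {
  if (opts.llm && opts.onMessage) return "error"; // mutually exclusive: raised at serve() time
  if (opts.engine) return "engine";               // engine wins; llm is ignored with a warning
  if (opts.onMessage) return "onMessage";
  if (opts.llm) return "llm";
  if (opts.openaiKeySet) return "default-openai"; // 0.5.0 fallback: auto-constructed OpenAI loop
  return "error";                                 // assumed: nothing configured, no key available
}
```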

Supported LLM providers

| Flat import | Namespaced import | Env var | Install |
|---|---|---|---|
| `OpenAILLM` | `getpatter/llm/openai` (`LLM`) | `OPENAI_API_KEY` | included |
| `AnthropicLLM` | `getpatter/llm/anthropic` (`LLM`) | `ANTHROPIC_API_KEY` | included |
| `GroqLLM` | `getpatter/llm/groq` (`LLM`) | `GROQ_API_KEY` | included |
| `CerebrasLLM` | `getpatter/llm/cerebras` (`LLM`) | `CEREBRAS_API_KEY` | included |
| `GoogleLLM` | `getpatter/llm/google` (`LLM`) | `GEMINI_API_KEY` (falls back to `GOOGLE_API_KEY`) | included |
All classes accept an options object with `apiKey?: string` and fall back to the listed env var when it is omitted.
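The resolution order each class follows (explicit `apiKey`, then the listed env var or vars) can be sketched as below. `resolveApiKey` is an illustrative helper written for this page, not a Patter export:

```typescript
// Return the first available key: the explicit option wins, then each
// env var in order. Throws when nothing is set, mirroring a missing-key error.
function resolveApiKey(explicit: string | undefined, envVars: string[]): string {
  if (explicit) return explicit;
  for (const name of envVars) {
    const value = process.env[name];
    if (value) return value;
  }
  throw new Error(`Missing API key: set ${envVars.join(" or ")} or pass apiKey`);
}

// GoogleLLM is the one provider with two candidates, checked in order:
// resolveApiKey(options.apiKey, ["GEMINI_API_KEY", "GOOGLE_API_KEY"]);
```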

OpenAILLM

OpenAI Chat Completions with streaming + tool calling. Default model "gpt-4o-mini".
import { OpenAILLM } from "getpatter";                    // flat
import * as openai from "getpatter/llm/openai";           // namespaced

const llm = new OpenAILLM();                              // reads OPENAI_API_KEY
const llm2 = new openai.LLM({ apiKey: "sk-...", model: "gpt-4o-mini" });

AnthropicLLM

Anthropic Messages API with native streaming and tool_use blocks, normalized to Patter's chunk protocol. Default model "claude-3-5-sonnet-20241022". Pass `maxTokens` to override the default token cap.
import { AnthropicLLM } from "getpatter";                 // flat
import * as anthropic from "getpatter/llm/anthropic";     // namespaced

const llm = new AnthropicLLM();                           // reads ANTHROPIC_API_KEY
const llm2 = new anthropic.LLM({
  apiKey: "sk-ant-...",
  model: "claude-3-5-sonnet-20241022",
  maxTokens: 2048,
});

GroqLLM

Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1. Default model "llama-3.3-70b-versatile".
import { GroqLLM } from "getpatter";                      // flat
import * as groq from "getpatter/llm/groq";               // namespaced

const llm = new GroqLLM();                                // reads GROQ_API_KEY
const llm2 = new groq.LLM({ apiKey: "gsk_...", model: "llama-3.3-70b-versatile" });

CerebrasLLM

Cerebras Inference API (OpenAI-compatible) at https://api.cerebras.ai/v1. Default model "llama3.1-8b". Supports optional gzip request-body compression via gzipCompression: true to reduce time-to-first-token on large prompts — see Cerebras payload optimization.
import { CerebrasLLM } from "getpatter";                  // flat
import * as cerebras from "getpatter/llm/cerebras";       // namespaced

const llm = new CerebrasLLM();                            // reads CEREBRAS_API_KEY
const llm2 = new cerebras.LLM({
  apiKey: "csk-...",
  model: "llama3.1-8b",
  gzipCompression: true,
});

GoogleLLM

Google Gemini via the Developer API (streaming SSE). Default model "gemini-2.5-flash".
import { GoogleLLM } from "getpatter";                    // flat
import * as google from "getpatter/llm/google";           // namespaced

const llm = new GoogleLLM();                              // reads GEMINI_API_KEY, falls back to GOOGLE_API_KEY
const llm2 = new google.LLM({ apiKey: "AIza...", model: "gemini-2.5-flash" });

Custom LLM via onMessage

For cases the five built-in providers don't cover (multi-model routing, local inference, an internal gateway, caching layers), drop `llm` and plug in an async `onMessage` callback instead:
// npx tsx example.ts
import { Patter, Twilio, DeepgramSTT, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT(),
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
});

await phone.serve({
  agent,
  onMessage: async ({ text }) => {
    // Route to any model you like — local inference, a private gateway, etc.
    return `You said: ${text}. How can I help?`;
  },
});
`onMessage` and `llm` cannot be used together; combining them raises a clear error at `serve()` time. Pick one.

What’s next

STT

STT providers for pipeline mode.

TTS

TTS providers for pipeline mode.

Tools

Function calling (works across every LLM).

Engines

Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).