Anthropic LLM

AnthropicLLM plugs Anthropic’s Claude models into Patter’s pipeline mode. It speaks the Messages API natively (streaming + tool_use blocks) and normalises every event into Patter’s unified {type: "text" | "tool_call" | "done"} chunk protocol, so tools defined once run across every LLM provider. Prompt caching is enabled by default: the system prompt and the last tool block are tagged with cache_control: { type: "ephemeral" }, which cuts time-to-first-token by ~100-400 ms and input-token cost by ~90% on every cached turn.
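
Downstream of any provider adapter, that chunk stream is handled the same way. As a hypothetical sketch (stream() and the payload fields other than type are assumptions for illustration, not documented API):

# Hypothetical consumer of the unified chunk protocol.
messages = [{"role": "user", "content": "Hi"}]

async for chunk in llm.stream(messages):
    if chunk["type"] == "text":
        print(chunk["text"], end="")    # incremental assistant text
    elif chunk["type"] == "tool_call":
        ...                             # dispatch the tool; same shape for every provider
    elif chunk["type"] == "done":
        break                           # turn complete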

Install

pip install "getpatter[anthropic]"
npm install getpatter

Usage

# Namespaced import
from getpatter.llm import anthropic

llm = anthropic.LLM()                                       # reads ANTHROPIC_API_KEY
llm = anthropic.LLM(api_key="sk-ant-...", model="claude-haiku-4-5-20251001")
llm = anthropic.LLM(prompt_caching=False)                   # opt out of caching

# Flat alias (equivalent)
from getpatter import AnthropicLLM

llm = AnthropicLLM()

The namespaced import (from getpatter.llm import anthropic / import * as anthropic from "getpatter/llm/anthropic") auto-resolves the API key from ANTHROPIC_API_KEY and exposes a uniform LLM class — the same pattern Patter uses for STT and TTS namespaces.
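
Assuming the STT and TTS namespaces follow the same shape, the analogous imports would look like this (the module paths and class names below are inferred from that pattern, not confirmed by this page):

from getpatter.stt import deepgram
from getpatter.tts import elevenlabs

stt = deepgram.STT()                     # by analogy, reads DEEPGRAM_API_KEY
tts = elevenlabs.TTS(voice_id="rachel")  # by analogy, reads ELEVENLABS_API_KEY
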
Plug it into an agent:
import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=DeepgramSTT(),
    llm=AnthropicLLM(),                                     # ANTHROPIC_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
    first_message="Hi, how can I help?",
)

asyncio.run(phone.serve(agent))
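
Because the adapter normalises Anthropic tool_use blocks into the unified tool_call chunk, a tool registered once on the agent works unchanged across providers. A hypothetical sketch (this page does not show Patter's actual tool-registration API; the decorator name and signature are illustrative only):

# Hypothetical tool registration; not a documented Patter API.
@agent.tool
async def lookup_order(order_id: str) -> str:
    """Fetch the status of an order by id."""
    return f"Order {order_id}: shipped"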

Supported models

Pricing in USD per 1M tokens. cache_read is billed at ~10% of full input; cache_write at ~125%. Versioned snapshots (e.g. claude-haiku-4-5-20251001) resolve against the base entry via longest-prefix match in pricing.py.
Model                        Input    Output   Cache read   Cache write
claude-opus-4-7              $15.00   $75.00   $1.50        $18.75
claude-sonnet-4-6            $3.00    $15.00   $0.30        $3.75
claude-haiku-4-5 (default)   $1.00    $5.00    $0.10        $1.25
Aliases that route to the latest snapshot: claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7, claude-3-5-sonnet-latest, claude-3-5-haiku-latest. Pinned snapshots include claude-haiku-4-5-20251001, claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022.
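
The longest-prefix resolution can be sketched in a few lines (the table contents are illustrative; pricing.py's actual structure is not shown on this page):

# Illustrative sketch of longest-prefix model resolution, not pricing.py itself.
PRICING = {
    "claude-opus-4-7":   {"input": 15.00, "output": 75.00},
    "claude-sonnet-4-6": {"input": 3.00,  "output": 15.00},
    "claude-haiku-4-5":  {"input": 1.00,  "output": 5.00},
}

def resolve_pricing(model: str) -> dict:
    # Longest matching prefix wins: "claude-haiku-4-5-20251001"
    # resolves to the "claude-haiku-4-5" base entry.
    matches = [base for base in PRICING if model.startswith(base)]
    if not matches:
        raise KeyError(f"no pricing entry for {model}")
    return PRICING[max(matches, key=len)]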

Environment variables

Variable            Required   Notes
ANTHROPIC_API_KEY   yes        Auto-loaded when api_key / apiKey is omitted.

Options

Option                           Default                       Notes
api_key / apiKey                 None                          Reads from ANTHROPIC_API_KEY when omitted.
model                            "claude-haiku-4-5-20251001"   Any Anthropic Claude model id or alias.
max_tokens / maxTokens           1024                          Required by the Messages API on every request.
temperature                      unset                         Optional sampling temperature.
base_url / baseUrl               unset                         Override the Messages API endpoint (rarely needed).
prompt_caching / promptCaching   True                          Tags the system prompt and last tool block with cache_control: ephemeral. Disable when system prompt + tools are below Anthropic’s minimum cacheable size (~1024 tokens for Sonnet/Opus, ~2048 for Haiku) — caching has no effect below that threshold.
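
Putting the options together (values below are placeholders):

llm = anthropic.LLM(
    api_key="sk-ant-...",         # omit to read ANTHROPIC_API_KEY from the env
    model="claude-sonnet-4-6",    # alias; routes to the latest snapshot
    max_tokens=512,               # the Messages API requires a value (default 1024)
    temperature=0.3,              # optional sampling temperature
    prompt_caching=True,          # default; see the next section
)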

Prompt caching

For voice agents with long instruction-dense system prompts and large tool catalogs, prompt caching is the single biggest TTFT win Anthropic ships. Patter applies the recommended pattern automatically:
  • The system prompt becomes a single text block tagged cache_control: ephemeral.
  • The last tool definition is tagged cache_control: ephemeral, which caches the entire tool array (Anthropic caches everything up to and including a marked block).
  • The anthropic-beta: prompt-caching-2024-07-31 header is sent on every request for consistent behaviour across model snapshots.
The cache lives ~5 minutes — the first request writes it, subsequent requests within that window hit it for ~90% input-token savings on the cached portion.
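
For reference, the request this produces can be reproduced with Anthropic’s own SDK (a sketch: the prompt and tool contents are placeholders, and Patter’s internal construction may differ in detail):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

system_prompt = "You are a helpful assistant."  # imagine 1k+ tokens of instructions
tools = [{
    "name": "lookup_order",
    "description": "Fetch the status of an order by id.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]
tools[-1]["cache_control"] = {"type": "ephemeral"}  # marking the last tool caches the whole array

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},  # caches the system prompt
    }],
    tools=tools,
    messages=[{"role": "user", "content": "Hi"}],
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)

# response.usage.cache_creation_input_tokens and cache_read_input_tokens report
# how many input tokens were written to, and served from, the cache.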