Metrics & Cost Tracking

Patter automatically tracks cost and latency for every call, broken down by provider component (STT, TTS, LLM, telephony).

How It Works

Metrics are collected automatically during calls. When a call ends, the onCallEnd callback receives a CallMetrics object with the full breakdown:

await phone.serve({
  agent,
  port: 8000,
  onCallEnd: async (event) => {
    const metrics = event.metrics;
    if (metrics) {
      console.log(`Duration: ${metrics.duration_seconds}s`);
      console.log(`Total cost: $${metrics.cost.total.toFixed(4)}`);
      console.log(`  STT: $${metrics.cost.stt.toFixed(4)}`);
      console.log(`  TTS: $${metrics.cost.tts.toFixed(4)}`);
      console.log(`  LLM: $${metrics.cost.llm.toFixed(4)}`);
      console.log(`  Telephony: $${metrics.cost.telephony.toFixed(4)}`);
      console.log(`Avg latency: ${metrics.latency_avg.total_ms}ms`);
      console.log(`P95 latency: ${metrics.latency_p95.total_ms}ms`);
    }
  },
});

Cost Breakdown

The CostBreakdown object provides per-component costs in USD:

Field	Description
`stt`	Speech-to-text cost (Deepgram, Whisper).
`tts`	Text-to-speech cost (ElevenLabs, OpenAI TTS).
`llm`	LLM cost (OpenAI Realtime tokens).
`telephony`	Telephony cost (Twilio, Telnyx per-minute).
`total`	Sum of all components.
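As a rough illustration of how the breakdown can be used, the sketch below computes each component's share of the total. `CostBreakdownSketch` and `costShares` are hypothetical local names that only mirror the fields in the table above; they are not part of the SDK.

```typescript
// Local mirror of the documented CostBreakdown fields (assumed to be plain numbers in USD).
interface CostBreakdownSketch {
  stt: number;
  tts: number;
  llm: number;
  telephony: number;
  total: number;
}

// Hypothetical helper: percentage of the total attributable to each component.
function costShares(cost: CostBreakdownSketch): Record<string, number> {
  const components: Record<string, number> = {
    stt: cost.stt,
    tts: cost.tts,
    llm: cost.llm,
    telephony: cost.telephony,
  };
  const shares: Record<string, number> = {};
  for (const [name, value] of Object.entries(components)) {
    // Guard against a zero total (for example, a call that ended immediately).
    shares[name] = cost.total > 0 ? (value / cost.total) * 100 : 0;
  }
  return shares;
}

// Made-up sample values for illustration only.
const sampleCost: CostBreakdownSketch = { stt: 0.01, tts: 0.03, llm: 0.05, telephony: 0.01, total: 0.1 };
const sampleShares = costShares(sampleCost);
console.log(sampleShares);
```

In an onCallEnd handler, the same helper could be applied to `event.metrics.cost` to see which provider dominates the bill.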

Latency Breakdown

The LatencyBreakdown object provides per-component latency in milliseconds:

Field	Description
`stt_ms`	Time from user speech to transcript.
`endpoint_ms`	Time the endpointer waited after the last word before declaring end-of-utterance.
`llm_ttft_ms`	Time from end-of-utterance to the first LLM token.
`llm_total_ms`	Time from end-of-utterance to the last LLM token (full response).
`llm_ms`	Alias for `llm_ttft_ms` (kept for back-compat).
`tts_ms`	Time from first LLM token to first TTS audio byte.
`tts_total_ms`	Time from first LLM token to last TTS audio byte.
`bargein_ms`	Time from caller voice detected to TTS playback cancelled (only set on barge-in turns).
`total_ms`	End-to-end latency (user speech to first audio).

CallMetrics exposes the full distribution: latency_avg, latency_p50 (median / typical UX), latency_p90 (steady-state outliers), latency_p95 (SLA), and latency_p99 (cold-start outliers).
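To make the percentile fields concrete, here is a minimal sketch that derives p50, p90, p95, and p99 from a list of per-turn `total_ms` values using the nearest-rank method. This is an assumption for illustration; the source does not say which percentile method Patter uses internally, and `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of samples are <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation exact for whole-number inputs.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? 0;
}

// Made-up per-turn end-to-end latencies in milliseconds.
const turnTotalsMs = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000];

const p50 = percentile(turnTotalsMs, 50);
const p90 = percentile(turnTotalsMs, 90);
const p95 = percentile(turnTotalsMs, 95);
const p99 = percentile(turnTotalsMs, 99);
console.log({ p50, p90, p95, p99 });
```

With only a handful of turns, p95 and p99 collapse onto the slowest turn, which is worth keeping in mind when reading tail latencies for short calls.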

Per-Turn Metrics

Each conversation turn is tracked individually:

await phone.serve({
  agent,
  port: 8000,
  onCallEnd: async (event) => {
    const metrics = event.metrics;
    if (metrics) {
      for (const turn of metrics.turns) {
        console.log(`Turn ${turn.turn_index}:`);
        console.log(`  User: ${turn.user_text}`);
        console.log(`  Agent: ${turn.agent_text}`);
        console.log(`  Latency: ${turn.latency.total_ms}ms`);
      }
    }
  },
});
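A per-turn list makes it easy to locate the turn that hurt the caller experience most. The sketch below picks the slowest turn by `latency.total_ms`; `TurnSketch` and `slowestTurn` are hypothetical names that mirror only the fields used here.

```typescript
// Minimal local mirror of the per-turn fields used in this example.
interface TurnSketch {
  turn_index: number;
  latency: { total_ms: number };
}

// Hypothetical helper: returns the turn with the highest end-to-end latency, if any.
function slowestTurn(turns: TurnSketch[]): TurnSketch | undefined {
  return turns.reduce<TurnSketch | undefined>(
    (worst, turn) =>
      worst === undefined || turn.latency.total_ms > worst.latency.total_ms ? turn : worst,
    undefined,
  );
}

// Made-up sample turns for illustration only.
const sampleTurns: TurnSketch[] = [
  { turn_index: 0, latency: { total_ms: 420 } },
  { turn_index: 1, latency: { total_ms: 910 } },
  { turn_index: 2, latency: { total_ms: 650 } },
];
const worstTurn = slowestTurn(sampleTurns);
console.log(worstTurn?.turn_index);
```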

Custom Pricing

Override default provider pricing estimates:

await phone.serve({
  agent,
  port: 8000,
  pricing: {
    deepgram: { price: 0.005 },      // Override STT price per minute
    elevenlabs: { price: 0.15 },      // Override TTS price per 1k chars
    twilio: { price: 0.015 },         // Override telephony price per minute
  },
});
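The arithmetic behind these overrides can be sketched as follows, assuming cost scales linearly with usage and no rounding is applied. Both helper names are hypothetical, and the SDK's own calculation may differ (for example, telephony may be rounded to whole minutes).

```typescript
// Hypothetical helper: cost of audio billed per minute (STT, telephony).
function perMinuteCost(audioSeconds: number, pricePerMinute: number): number {
  return (audioSeconds / 60) * pricePerMinute;
}

// Hypothetical helper: cost of synthesised text billed per thousand characters (TTS).
function perThousandCharsCost(characters: number, pricePer1k: number): number {
  return (characters / 1000) * pricePer1k;
}

// Two minutes of transcribed audio at the overridden $0.005/min rate.
const sttCostExample = perMinuteCost(120, 0.005);
// 2,000 synthesised characters at the overridden $0.15 per 1k chars rate.
const ttsCostExample = perThousandCharsCost(2000, 0.15);
console.log(sttCostExample.toFixed(4), ttsCostExample.toFixed(4));
```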

PricingUnit

The pricing tables expose a PricingUnit constant so overrides don’t depend on raw strings:

import { PricingUnit } from "getpatter";

PricingUnit.MINUTE;          // "minute" — per minute of audio (STT, telephony)
PricingUnit.THOUSAND_CHARS;  // "1k_chars" — per thousand characters synthesised (TTS)
PricingUnit.TOKEN;           // "token" — per token (LLM / Realtime)

PricingUnit ships as a const object plus a value-union type, so it is tree-shakeable. The Python PricingUnit StrEnum mirrors the same string values byte-for-byte.
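The const-object-plus-union pattern looks roughly like the sketch below. `PricingUnitSketch` is a local stand-in that reuses the documented string values; it is not the exported constant, and `describeUnit` is a hypothetical helper.

```typescript
// A const object whose values are narrowed to string literals by `as const`.
const PricingUnitSketch = {
  MINUTE: "minute",
  THOUSAND_CHARS: "1k_chars",
  TOKEN: "token",
} as const;

// A union of the object's values: "minute" | "1k_chars" | "token".
type PricingUnitSketch = (typeof PricingUnitSketch)[keyof typeof PricingUnitSketch];

// Hypothetical helper showing that the union type gives exhaustive switch checking.
function describeUnit(unit: PricingUnitSketch): string {
  switch (unit) {
    case PricingUnitSketch.MINUTE:
      return "per minute of audio";
    case PricingUnitSketch.THOUSAND_CHARS:
      return "per thousand characters";
    case PricingUnitSketch.TOKEN:
      return "per token";
  }
}

console.log(describeUnit(PricingUnitSketch.MINUTE));
```

Unlike a TypeScript `enum`, this form compiles to a plain object, and a raw string such as `"minute"` is still assignable to the union type.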

Model-Aware Pricing

Patter’s pricing tables are model-aware: every entry in DEFAULT_PRICING carries provider-level defaults plus an optional models map keyed by model identifier. When the agent’s adapter exposes a model field, the metrics layer threads it through the cost-calc functions, so the dashboard bills at the model-specific rate out of the box, with no manual override required.

import { PRICING_VERSION, PRICING_LAST_UPDATED } from "getpatter";

PRICING_VERSION;       // "2026.3"
PRICING_LAST_UPDATED;  // "2026-05-08"

How resolution works

The cost-calc helpers (calculateSttCost, calculateTtsCost, calculateRealtimeCost, calculateRealtimeCachedSavings) accept an optional final model parameter. The exported resolveProviderRates(config, model) helper merges per-model overrides on top of provider defaults using:

1. Exact match in the provider’s models map.
2. Longest-prefix match: gpt-realtime-2-2026-05-08 resolves against gpt-realtime-2.
3. Provider defaults: the fallback when the model is unknown or omitted.
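The three-step lookup above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the SDK's resolveProviderRates: the `RatesSketch` and `ProviderConfigSketch` shapes and the `resolveRatesSketch` name are hypothetical, and only a single `price` field is modelled.

```typescript
interface RatesSketch {
  price: number;
}

interface ProviderConfigSketch extends RatesSketch {
  models?: Record<string, Partial<RatesSketch>>;
}

function resolveRatesSketch(config: ProviderConfigSketch, model?: string): RatesSketch {
  const defaults: RatesSketch = { price: config.price };
  const models = config.models;
  // Step 3 (fallback): no model given, or the provider has no per-model rates.
  if (!model || !models) return defaults;

  // Step 1: exact match in the models map.
  if (Object.prototype.hasOwnProperty.call(models, model)) {
    return { ...defaults, ...models[model] };
  }

  // Step 2: longest key that is a prefix of the requested model.
  let bestKey: string | undefined;
  for (const key of Object.keys(models)) {
    if (model.startsWith(key) && (bestKey === undefined || key.length > bestKey.length)) {
      bestKey = key;
    }
  }
  return bestKey !== undefined ? { ...defaults, ...models[bestKey] } : defaults;
}

// Made-up rates for illustration only.
const realtimeSketch: ProviderConfigSketch = {
  price: 0.5,
  models: { "gpt-realtime": { price: 1 }, "gpt-realtime-2": { price: 2 } },
};
console.log(resolveRatesSketch(realtimeSketch, "gpt-realtime-2-2026-05-08").price);
```

Choosing the longest prefix matters here: both `gpt-realtime` and `gpt-realtime-2` are prefixes of the dated identifier, and the more specific key should win.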

CallMetricsAccumulator auto-tracks sttModel, ttsModel, and realtimeModel from the agent’s adapter model field (agent.stt.model, agent.tts.model, agent.model for Realtime). On every recordRealtimeUsage(usage) call the realtime model is also pulled from the response.done payload itself, overriding the call-level default — so mid-call model switches are billed correctly.

The optional model argument defaults to undefined, which preserves the legacy provider-rate behaviour. Existing callers compile and run unchanged.

Example A — Just select a model

The most common case: pick a model on your adapter, and Patter bills the right rate automatically.

import { Patter, Twilio, OpenAIRealtime } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  systemPrompt: "You are a helpful assistant.",
  engine: new OpenAIRealtime({ model: "gpt-realtime-2" }),
});
// Billing auto-uses the gpt-realtime-2 rate ($32/M audio in, $64/M audio out).

Example B — Override one model, keep siblings intact

mergePricing overlays the nested models map one model key at a time (a shallow merge of that map), so overriding a single model leaves the other rates inside the same provider untouched.

const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15550001234",
  pricing: {
    // Negotiated a discount on Nova-2 only — Nova-3 / Whisper rates stay default.
    deepgram: { models: { "nova-2": { price: 0.004 } } },
  },
});
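The overlay behaviour described for Example B can be sketched like this. `mergePricingSketch` and its types are hypothetical stand-ins written for illustration, and the real mergePricing may handle more fields.

```typescript
interface ProviderPricingSketch {
  price?: number;
  models?: Record<string, { price: number }>;
}
type PricingTableSketch = Record<string, ProviderPricingSketch>;

// Overlay overrides on defaults: provider fields are spread, and the models map is merged per key.
function mergePricingSketch(
  defaults: PricingTableSketch,
  overrides: PricingTableSketch,
): PricingTableSketch {
  const merged: PricingTableSketch = { ...defaults };
  for (const [provider, override] of Object.entries(overrides)) {
    const base = defaults[provider] ?? {};
    merged[provider] = {
      ...base,
      ...override,
      // Sibling models from the defaults survive unless the override names them explicitly.
      models: { ...(base.models ?? {}), ...(override.models ?? {}) },
    };
  }
  return merged;
}

// Default rates taken from the Deepgram rows of the pricing tables in this document.
const defaultsSketch: PricingTableSketch = {
  deepgram: { price: 0.0077, models: { "nova-3": { price: 0.0077 }, "nova-2": { price: 0.0058 } } },
};
const mergedSketch = mergePricingSketch(defaultsSketch, {
  deepgram: { models: { "nova-2": { price: 0.004 } } },
});
console.log(mergedSketch.deepgram?.models);
```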

Example C — Register a brand-new model rate

Add a model that isn’t in the built-in table without touching SDK source.

const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15550001234",
  pricing: {
    elevenlabs: {
      models: { my_custom_voice: { price: 0.075 } },
    },
  },
});
// When agent.tts.model === "my_custom_voice", calculateTtsCost picks up $0.075/1k.

Default Pricing (2026.3)

Provider-level defaults are listed below. Per-model rates live under DEFAULT_PRICING[provider].models and are auto-resolved when the adapter exposes its model identifier.

Provider	Unit	Default Price (default model)
Deepgram (`nova-3` streaming mono)	per minute	$0.0077
OpenAI Whisper (`whisper-1`)	per minute	$0.006
OpenAI Transcribe (`gpt-4o-transcribe`)	per minute	$0.006
AssemblyAI	per minute	$0.0025
Cartesia STT (ink-whisper)	per minute	$0.0025
Soniox	per minute	$0.002
Speechmatics (Pro)	per minute	$0.004
ElevenLabs (`eleven_flash_v2_5`)	per 1k chars	$0.06
OpenAI TTS (`tts-1`)	per 1k chars	$0.015
Cartesia TTS (`sonic-2`)	per 1k chars	$0.030
Rime (`mistv2`)	per 1k chars	$0.030
LMNT (`aurora`)	per 1k chars	$0.050
Inworld (`inworld-tts-2`)	per 1k chars	$0.020
OpenAI Realtime (`gpt-realtime-mini` / `gpt-4o-mini-realtime-preview`)	per token	$10/M audio in · $20/M audio out · $0.60/M text in · $2.40/M text out (cached: $0.30/M audio · $0.06/M text)
Twilio (US inbound local)	per minute	$0.0085 (rounded up to whole minute, per Twilio)
Telnyx	per minute	$0.007
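The whole-minute rounding noted for Twilio changes the arithmetic for short calls. A minimal sketch, assuming each call is rounded up to the next whole minute and billed at the default $0.0085 rate (`twilioCostSketch` is a hypothetical helper, not an SDK function):

```typescript
// Hypothetical helper: per-minute telephony cost with the call duration rounded up to whole minutes.
function twilioCostSketch(durationSeconds: number, pricePerMinute = 0.0085): number {
  const billedMinutes = Math.ceil(durationSeconds / 60);
  return billedMinutes * pricePerMinute;
}

// A 61-second call is billed as 2 minutes; a 60-second call as 1 minute.
const cost61s = twilioCostSketch(61);
const cost60s = twilioCostSketch(60);
console.log(cost61s.toFixed(4), cost60s.toFixed(4));
```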

STT — per-model rates

Provider	Model	Price
Deepgram	`nova-3` (default)	$0.0077/min
Deepgram	`nova-3-multilingual`	$0.0092/min
Deepgram	`nova-2`	$0.0058/min
Deepgram	`nova`	$0.0043/min
Deepgram	`whisper-large` / `whisper-medium`	$0.0048/min
OpenAI Whisper	`whisper-1` (default)	$0.006/min
OpenAI Whisper	`gpt-4o-transcribe`	$0.006/min
OpenAI Whisper	`gpt-4o-mini-transcribe`	$0.003/min
OpenAI Whisper	`gpt-realtime-whisper`	$0.017/min
OpenAI Transcribe (`openai_transcribe`)	`gpt-4o-transcribe` (default)	$0.006/min
OpenAI Transcribe	`gpt-4o-mini-transcribe`	$0.003/min
OpenAI Transcribe	`whisper-1`	$0.006/min

TTS — per-model rates

Provider	Model	Price
ElevenLabs (REST + WebSocket)	`eleven_flash_v2_5` (default)	$0.06/1k
ElevenLabs	`eleven_turbo_v2_5`	$0.05/1k
ElevenLabs	`eleven_multilingual_v2` / `eleven_monolingual_v1`	$0.18/1k
ElevenLabs	`eleven_v3`	$0.30/1k
OpenAI TTS	`tts-1` (default)	$0.015/1k
OpenAI TTS	`tts-1-hd`	$0.030/1k
OpenAI TTS	`gpt-4o-mini-tts`	$0.012/1k
Cartesia	`sonic-1` / `sonic-2` / `sonic-english` / `sonic-multilingual`	$0.030/1k
Rime	`mistv2` (default) / `mist`	$0.030/1k
Rime	`arcana`	$0.040/1k
LMNT	`aurora` (default) / `blizzard`	$0.050/1k
Inworld	`inworld-tts-2` (default)	$0.020/1k
Inworld	`inworld-tts-1.5-max` / `inworld-tts-1.5`	$0.025/1k

OpenAI Realtime — per-model rates

Model	Audio in / out (per token)	Text in / out (per token)	Cached audio / text (per token)
`gpt-realtime-mini` (default) / `gpt-4o-mini-realtime-preview`	$0.00001 / $0.00002	$0.0000006 / $0.0000024	$0.0000003 / $0.00000006
`gpt-realtime`	$0.000032 / $0.000064	$0.000004 / $0.000016	$0.0000004 / $0.0000004
`gpt-realtime-2`	$0.000032 / $0.000064	$0.000004 / $0.000024	$0.0000004 / $0.0000004
`gpt-4o-realtime-preview`	$0.0001 / $0.0002	$0.000005 / $0.000020	$0.0000020 / $0.0000025

gpt-4o-realtime-preview is roughly 10x the cost of gpt-realtime-mini for audio. Switching realtime models has direct billing impact — confirm the model on agent.realtime.model matches the rate you expect.
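To see how per-token rates translate into a bill, the sketch below prices a made-up usage sample at the `gpt-realtime-mini` rates from the table. The `RealtimeRatesSketch` and `RealtimeUsageSketch` shapes and their field names are assumptions for illustration, and cached-token discounts are ignored.

```typescript
interface RealtimeRatesSketch {
  audioIn: number;
  audioOut: number;
  textIn: number;
  textOut: number;
}

interface RealtimeUsageSketch {
  audioInTokens: number;
  audioOutTokens: number;
  textInTokens: number;
  textOutTokens: number;
}

// Per-token USD rates for gpt-realtime-mini, copied from the table above.
const miniRates: RealtimeRatesSketch = {
  audioIn: 0.00001,
  audioOut: 0.00002,
  textIn: 0.0000006,
  textOut: 0.0000024,
};

// Hypothetical helper: linear cost, ignoring cached-token pricing.
function realtimeCostSketch(usage: RealtimeUsageSketch, rates: RealtimeRatesSketch): number {
  return (
    usage.audioInTokens * rates.audioIn +
    usage.audioOutTokens * rates.audioOut +
    usage.textInTokens * rates.textIn +
    usage.textOutTokens * rates.textOut
  );
}

// Made-up usage: 1,000 audio in, 2,000 audio out, 500 text in, 100 text out.
const sampleRealtimeCost = realtimeCostSketch(
  { audioInTokens: 1000, audioOutTokens: 2000, textInTokens: 500, textOutTokens: 100 },
  miniRates,
);
console.log(sampleRealtimeCost.toFixed(5));
```

Audio output dominates in this sample (2,000 tokens at $0.00002 is $0.04 of the roughly $0.0505 total), which is typical of why the audio rates matter most when comparing realtime models.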

Twilio defaults match US inbound local. Override pricing.twilio.price for US toll-free inbound (~$0.022/min) or US outbound local (~$0.014/min). Default pricing is based on publicly listed provider rates and may become stale; check the provider’s pricing page or pass your own overrides for authoritative numbers.

Real-Time Metrics

Use the onMetrics callback for live cost updates during a call:

await phone.serve({
  agent,
  port: 8000,
  onMetrics: async (data) => {
    const turn = data.turn as Record<string, unknown>;
    const latency = turn.latency as Record<string, number>;
    console.log(`Call ${data.call_id} — turn ${turn.turn_index}`);
    console.log(`  Latency: ${latency.total_ms}ms`);
  },
});
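Because onMetrics fires during the call, it can feed a small running aggregate. The sketch below keeps a running average and worst-case latency; `LatencyTracker` is a hypothetical helper, and calling `record(latency.total_ms)` from the callback is an assumed usage rather than a built-in feature.

```typescript
// Hypothetical running aggregate for per-turn end-to-end latency.
class LatencyTracker {
  private count = 0;
  private sum = 0;
  private max = 0;

  // Record one turn's total latency in milliseconds.
  record(totalMs: number): void {
    this.count += 1;
    this.sum += totalMs;
    this.max = Math.max(this.max, totalMs);
  }

  // Mean latency so far, or 0 before any turn has been recorded.
  get average(): number {
    return this.count === 0 ? 0 : this.sum / this.count;
  }

  // Slowest turn seen so far.
  get worst(): number {
    return this.max;
  }
}

const tracker = new LatencyTracker();
tracker.record(100);
tracker.record(300);
console.log(`avg ${tracker.average}ms, worst ${tracker.worst}ms`);
```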

Data Types

import type {
  CallMetrics,
  CostBreakdown,
  LatencyBreakdown,
  TurnMetrics,
} from "getpatter";

CallMetrics

Field	Type	Description
`call_id`	`string`	Unique call identifier.
`duration_seconds`	`number`	Total call duration.
`turns`	`TurnMetrics[]`	Per-turn metrics.
`cost`	`CostBreakdown`	Cost breakdown.
`latency_avg`	`LatencyBreakdown`	Average latency.
`latency_p50`	`LatencyBreakdown`	Median (50th percentile) latency.
`latency_p90`	`LatencyBreakdown`	90th percentile latency (steady-state outliers).
`latency_p95`	`LatencyBreakdown`	95th percentile latency.
`latency_p99`	`LatencyBreakdown`	99th percentile latency (cold-start outliers).
`provider_mode`	`string`	Voice mode used.
`stt_provider`	`string`	STT provider name.
`tts_provider`	`string`	TTS provider name.
`llm_provider`	`string`	LLM provider name.
`telephony_provider`	`string`	Telephony provider name.

TurnMetrics

Field	Type	Description
`turn_index`	`number`	Zero-based turn index.
`user_text`	`string`	What the user said.
`agent_text`	`string`	What the agent replied.
`latency`	`LatencyBreakdown`	Latency for this turn.
`stt_audio_seconds`	`number`	Audio duration processed by STT.
`tts_characters`	`number`	Characters synthesized by TTS.
`timestamp`	`number`	Unix timestamp.
