
Whisper STT

WhisperSTT is a buffered HTTP transcription adapter for OpenAI’s POST /v1/audio/transcriptions endpoint. It buffers ~1 s of incoming PCM audio (16 kHz, 16-bit mono), wraps it as a WAV blob, and submits it to Whisper for transcription. It is drop-in compatible with the streaming STTProvider interface, so it can be swapped for Deepgram or Soniox without changes to the calling code. For roughly 10x lower latency, see the GPT-4o transcribe family below: it is a strict subclass that hits the same endpoint with gpt-4o-transcribe or gpt-4o-mini-transcribe.
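To make the "wraps it as a WAV blob" step concrete, here is a minimal sketch of prepending a standard 44-byte RIFF/WAVE header to raw PCM. The helper name `pcmToWav` is hypothetical and is not part of getpatter; the adapter's actual internals may differ.

```typescript
// Sketch only: `pcmToWav` is an illustrative helper, not a getpatter API.
// Wraps raw little-endian PCM in a canonical 44-byte RIFF/WAVE header.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 16000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) out[offset + i] = text.charCodeAt(i);
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the PCM payload
  out.set(pcm, 44);
  return out;
}

// One second of silence at 16 kHz, 16-bit mono is 32,000 bytes of PCM.
const oneSecond = new Uint8Array(16000 * 2);
console.log(pcmToWav(oneSecond).length); // 32044
```

The resulting bytes are what a client would typically upload as the `file` part of a multipart request to the transcription endpoint.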

Install

whisper ships in the base install.
npm install getpatter

Usage

getpatter/stt/whisper and getpatter/stt/openai-transcribe both auto-resolve OPENAI_API_KEY from the environment when apiKey is omitted.
// Whisper-1 (REST)
import * as whisper from "getpatter/stt/whisper";

const stt = new whisper.STT();                                    // reads OPENAI_API_KEY
const stt2 = new whisper.STT({ apiKey: "sk-...", language: "it" });

// GPT-4o transcribe family — ~10x faster than whisper-1
import * as openaiTranscribe from "getpatter/stt/openai-transcribe";

const fast = new openaiTranscribe.STT({ model: "gpt-4o-transcribe" });   // default
const mini = new openaiTranscribe.STT({ model: "gpt-4o-mini-transcribe" });
Plug it into an agent:
// npx tsx example.ts
import { Patter, Twilio, ElevenLabsTTS } from "getpatter";
import * as openaiTranscribe from "getpatter/stt/openai-transcribe";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new openaiTranscribe.STT(),                    // OPENAI_API_KEY from env
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
});

await phone.serve({ agent });

Models and rates

Per minute of audio (defaults from getpatter/pricing):
| Provider key | Model | Rate / min |
| --- | --- | --- |
| whisper | whisper-1 (default) | $0.006 |
| whisper | gpt-4o-transcribe | $0.006 |
| whisper | gpt-4o-mini-transcribe | $0.003 |
| openai_transcribe | gpt-4o-transcribe (default) | $0.006 |
| openai_transcribe | gpt-4o-mini-transcribe | $0.003 |
| openai_transcribe | whisper-1 | $0.006 |
The two provider keys hit the same endpoint but are tracked separately in the dashboard so cost attribution stays clean.
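As a rough illustration of how these per-minute rates translate into spend, the sketch below multiplies audio duration by the listed rate. `RATES_PER_MINUTE` and `estimateCost` are hypothetical names, not exports of getpatter/pricing, and the linear per-second proration is an assumption; actual billing granularity may differ.

```typescript
// Sketch only: rates copied from the table above; names are illustrative.
const RATES_PER_MINUTE: Record<string, number> = {
  "whisper-1": 0.006,
  "gpt-4o-transcribe": 0.006,
  "gpt-4o-mini-transcribe": 0.003,
};

// Assumes cost scales linearly with audio duration (no rounding to whole minutes).
function estimateCost(model: string, audioSeconds: number): number {
  const rate = RATES_PER_MINUTE[model];
  if (rate === undefined) throw new Error(`Unknown model: ${model}`);
  return (audioSeconds / 60) * rate;
}

// Ten minutes of audio on the mini model.
console.log(estimateCost("gpt-4o-mini-transcribe", 600).toFixed(3)); // "0.030"
```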

Languages

The language option defaults to "en". Whisper-1 and the GPT-4o transcribe family can auto-detect the spoken language, but they also accept an explicit BCP-47 hint (e.g. "it", "fr", "es", "de", "pt", "ja", "zh"), which improves accuracy on short utterances. See the OpenAI language coverage list.

Options

| Option | Default | Notes |
| --- | --- | --- |
| apiKey | | Reads from OPENAI_API_KEY when omitted. |
| model | "whisper-1" (Whisper) / "gpt-4o-transcribe" (Transcribe) | Restricted to the family’s allowed model set; misconfigured calls throw. |
| language | "en" | BCP-47 code. |
| bufferSize | ~1 s of 16 kHz PCM | Bytes buffered before each transcription request. |
| responseFormat | "json" | Pass "verbose_json" to surface per-segment confidence and timestamps. |
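Since bufferSize is expressed in bytes, it can help to derive it from a target duration. The sketch below does that arithmetic for the 16 kHz, 16-bit mono format described above. `pcmBufferBytes` is a hypothetical helper, not part of getpatter.

```typescript
// Sketch only: `pcmBufferBytes` is an illustrative helper, not a getpatter API.
// Converts a buffer duration into a byte count for uncompressed PCM.
function pcmBufferBytes(
  durationMs: number,
  sampleRate = 16000,
  bitsPerSample = 16,
  channels = 1,
): number {
  const bytesPerFrame = (bitsPerSample / 8) * channels;
  // Round to whole sample frames so the buffer never splits a sample.
  const frames = Math.round((durationMs / 1000) * sampleRate);
  return frames * bytesPerFrame;
}

console.log(pcmBufferBytes(1000)); // 32000
console.log(pcmBufferBytes(500));  // 16000
```

Under these assumptions, the default of roughly 1 s corresponds to about 32,000 bytes. A smaller buffer sends more frequent requests with less audio context each, so the trade-off between latency and accuracy is worth testing on real calls.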