

Whisper STT

WhisperSTT is a buffered HTTP transcription adapter for OpenAI’s POST /v1/audio/transcriptions endpoint. It buffers ~1 s of incoming PCM audio (16 kHz, 16-bit mono), wraps it as a WAV blob, and submits it to Whisper for transcription. Drop-in compatible with the streaming STTProvider interface so it can be swapped for Deepgram / Soniox / Speechmatics without changes to the calling code. For ~10x lower latency see the GPT-4o transcribe family below — it’s a strict subclass that hits the same endpoint with gpt-4o-transcribe / gpt-4o-mini-transcribe.
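The buffer-and-wrap step described above can be sketched with the standard library. This is an illustrative helper, not getpatter's internal code: `pcm_to_wav` is a hypothetical name, and the constants mirror the 16 kHz, 16-bit mono format stated above.

```python
import io
import wave

SAMPLE_RATE = 16_000   # 16 kHz
SAMPLE_WIDTH = 2       # 16-bit samples
CHANNELS = 1           # mono

def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw s16le mono PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()

# ~1 s of audio = 16_000 frames x 2 bytes; silence here for illustration
one_second = bytes(SAMPLE_RATE * SAMPLE_WIDTH)
wav_blob = pcm_to_wav(one_second)
```

The resulting blob is what a buffered adapter like this would submit as the `file` field of the multipart request.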

Install

Whisper support ships in the base install — no extra package is needed.
pip install getpatter

Usage

getpatter.stt.whisper.STT() and getpatter.stt.openai_transcribe.STT() both auto-resolve OPENAI_API_KEY from the environment when api_key= is omitted.
# Whisper-1 (REST)
from getpatter.stt import whisper

stt = whisper.STT()                                       # reads OPENAI_API_KEY
stt = whisper.STT(api_key="sk-...", language="it")

# GPT-4o transcribe family — ~10x faster than whisper-1
from getpatter.stt import openai_transcribe

stt = openai_transcribe.STT(model="gpt-4o-transcribe")    # default
stt = openai_transcribe.STT(model="gpt-4o-mini-transcribe")
Plug it into an agent:
import asyncio
from getpatter import Patter, Twilio, ElevenLabsTTS
from getpatter.stt import openai_transcribe

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=openai_transcribe.STT(),                          # OPENAI_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
)

asyncio.run(phone.serve(agent))

Models and rates

Per minute of audio (defaults from getpatter.pricing):
Provider key        Model                          Rate / min
whisper             whisper-1 (default)            $0.006
whisper             gpt-4o-transcribe              $0.006
whisper             gpt-4o-mini-transcribe         $0.003
openai_transcribe   gpt-4o-transcribe (default)    $0.006
openai_transcribe   gpt-4o-mini-transcribe         $0.003
openai_transcribe   whisper-1                      $0.006
The two provider keys hit the same endpoint but are tracked separately in the dashboard so cost attribution stays clean.
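The per-minute rates above reduce to simple arithmetic per call. A minimal sketch, with the rate table copied inline rather than imported from getpatter.pricing, and transcription_cost as an illustrative helper:

```python
# USD per minute of audio, copied from the table above
RATES_PER_MIN = {
    "whisper-1": 0.006,
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
}

def transcription_cost(model: str, seconds: float) -> float:
    """Estimated USD cost for `seconds` of audio on `model`."""
    return RATES_PER_MIN[model] * seconds / 60

# A 5-minute call on the mini model: 0.003 * 5 minutes
cost = transcription_cost("gpt-4o-mini-transcribe", 300)
```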

Languages

language="en" by default. Whisper-1 and the GPT-4o transcribe family auto-detect the spoken language but accept an explicit BCP-47 hint (e.g. "it", "fr", "es", "de", "pt", "ja", "zh") for higher accuracy on short utterances. See the OpenAI language coverage list.

Options

Option            Default                                                    Notes
api_key           None                                                       Reads from OPENAI_API_KEY when omitted.
model             "whisper-1" (Whisper) / "gpt-4o-transcribe" (Transcribe)   Restricted to the family's allowed model set; misconfigured calls raise ValueError.
language          "en"                                                       BCP-47 code.
response_format   "json"                                                     Pass "verbose_json" to surface per-segment confidence and timestamps.
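The model restriction noted in the table can be pictured as a guard like the one below. This is a sketch of the behavior, not getpatter's actual implementation; ALLOWED_MODELS and validate_model are illustrative names, with the set drawn from the models documented above.

```python
# Models this family accepts (illustrative; mirrors the table above)
ALLOWED_MODELS = {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"}

def validate_model(model: str) -> str:
    """Raise ValueError for a model outside the family's allowed set."""
    if model not in ALLOWED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {sorted(ALLOWED_MODELS)}"
        )
    return model
```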

Low-level usage

from getpatter.providers.whisper_stt import WhisperSTT
from getpatter.providers.openai_transcribe_stt import OpenAITranscribeSTT

stt = WhisperSTT(api_key="sk-...", model="whisper-1", language="en")
fast = OpenAITranscribeSTT(api_key="sk-...", model="gpt-4o-mini-transcribe")

await stt.connect()
await stt.send_audio(pcm_chunk)                           # 16 kHz PCM s16le
async for t in stt.receive_transcripts():
    print(t.text, t.is_final, t.confidence)
await stt.close()