

Whisper STT

WhisperSTT is a buffered HTTP transcription adapter for OpenAI’s POST /v1/audio/transcriptions endpoint. It buffers ~1 s of incoming PCM audio (16 kHz, 16-bit mono), wraps it as a WAV blob, and submits it to Whisper for transcription. Drop-in compatible with the streaming STTProvider interface so it can be swapped for Deepgram / Soniox / Speechmatics without changes to the calling code. For ~10x lower latency see the GPT-4o transcribe family below — it’s a strict subclass that hits the same endpoint with gpt-4o-transcribe / gpt-4o-mini-transcribe.
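The buffer-and-wrap step described above can be sketched with the standard library. This is an illustrative helper, not getpatter's internal code: `pcm_to_wav` is a hypothetical name, and the constants mirror the 16 kHz, 16-bit mono format stated above.

```python
import io
import wave

SAMPLE_RATE = 16_000   # 16 kHz
SAMPLE_WIDTH = 2       # 16-bit samples
CHANNELS = 1           # mono

def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw s16le mono PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()

# ~1 s of audio = 16_000 frames x 2 bytes; silence here for illustration
one_second = bytes(SAMPLE_RATE * SAMPLE_WIDTH)
wav_blob = pcm_to_wav(one_second)
```

The resulting blob is what a buffered adapter like this would submit as the `file` field of the multipart request.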

Install

Whisper support ships in the base install — no extra package is needed.
pip install getpatter

Usage

getpatter.stt.whisper.STT() and getpatter.stt.openai_transcribe.STT() both auto-resolve OPENAI_API_KEY from the environment when api_key= is omitted.
# Whisper-1 (REST)
from getpatter.stt import whisper

stt = whisper.STT()                                       # reads OPENAI_API_KEY
stt = whisper.STT(api_key="sk-...", language="it")

# GPT-4o transcribe family — ~10x faster than whisper-1
from getpatter.stt import openai_transcribe

stt = openai_transcribe.STT(model="gpt-4o-transcribe")    # default
stt = openai_transcribe.STT(model="gpt-4o-mini-transcribe")
Plug it into an agent:
import asyncio
from getpatter import Patter, Twilio, ElevenLabsTTS
from getpatter.stt import openai_transcribe

phone = Patter(carrier=Twilio(), phone_number="+15550001234")

agent = phone.agent(
    stt=openai_transcribe.STT(),                          # OPENAI_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
)

asyncio.run(phone.serve(agent))

Models and rates

Per minute of audio (defaults from getpatter.pricing):
Provider key        Model                          Rate / min
whisper             whisper-1 (default)            $0.006
whisper             gpt-4o-transcribe              $0.006
whisper             gpt-4o-mini-transcribe         $0.003
openai_transcribe   gpt-4o-transcribe (default)    $0.006
openai_transcribe   gpt-4o-mini-transcribe         $0.003
openai_transcribe   whisper-1                      $0.006
The two provider keys hit the same endpoint but are tracked separately in the dashboard so cost attribution stays clean.
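The per-minute rates above reduce to simple arithmetic per call. A minimal sketch, with the rate table copied inline rather than imported from getpatter.pricing, and transcription_cost as an illustrative helper:

```python
# USD per minute of audio, copied from the table above
RATES_PER_MIN = {
    "whisper-1": 0.006,
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
}

def transcription_cost(model: str, seconds: float) -> float:
    """Estimated USD cost for `seconds` of audio on `model`."""
    return RATES_PER_MIN[model] * seconds / 60

# A 5-minute call on the mini model: 0.003 * 5 minutes
cost = transcription_cost("gpt-4o-mini-transcribe", 300)
```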

Languages

language="en" by default. Whisper-1 and the GPT-4o transcribe family auto-detect the spoken language but accept an explicit BCP-47 hint (e.g. "it", "fr", "es", "de", "pt", "ja", "zh") for higher accuracy on short utterances. See the OpenAI language coverage list.

Options

Option            Default                                                    Notes
api_key           None                                                       Reads from OPENAI_API_KEY when omitted.
model             "whisper-1" (Whisper) / "gpt-4o-transcribe" (Transcribe)   Restricted to the family's allowed model set; misconfigured calls raise ValueError.
language          "en"                                                       BCP-47 code.
response_format   "json"                                                     Pass "verbose_json" to surface per-segment confidence and timestamps.
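The model restriction noted in the table can be pictured as a guard like the one below. This is a sketch of the behavior, not getpatter's actual implementation; ALLOWED_MODELS and validate_model are illustrative names, with the set drawn from the models documented above.

```python
# Models this family accepts (illustrative; mirrors the table above)
ALLOWED_MODELS = {"whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"}

def validate_model(model: str) -> str:
    """Raise ValueError for a model outside the family's allowed set."""
    if model not in ALLOWED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {sorted(ALLOWED_MODELS)}"
        )
    return model
```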

Low-level usage

from getpatter.providers.whisper_stt import WhisperSTT
from getpatter.providers.openai_transcribe_stt import OpenAITranscribeSTT

stt = WhisperSTT(api_key="sk-...", model="whisper-1", language="en")
fast = OpenAITranscribeSTT(api_key="sk-...", model="gpt-4o-mini-transcribe")

await stt.connect()
await stt.send_audio(pcm_chunk)                           # 16 kHz PCM s16le
async for t in stt.receive_transcripts():
    print(t.text, t.is_final, t.confidence)
await stt.close()