STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech recognition is handled internally by the engine and you do not configure STT separately.

Each STT ships as both a namespaced class (from getpatter.stt import deepgram → deepgram.STT()) and a flat alias (from getpatter import DeepgramSTT). They are equivalent, so pick whichever reads best. The flat aliases are convenient for short examples; the namespaced form avoids name collisions when you import several STTs together.
Each provider exports a typed StrEnum of valid model IDs alongside the provider class. They keep model= arguments tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility:
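The pattern described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the enum name TranscribeModel, its members, and the resolve_model helper are hypothetical, and the sketch uses the portable (str, Enum) mixin rather than Python 3.11's enum.StrEnum. Here a typo in an enum member fails at attribute lookup, while a raw string passes through unchanged.

```python
from enum import Enum
from typing import Union


class TranscribeModel(str, Enum):
    """Hypothetical enum of model IDs; the real getpatter enum names may differ."""

    GPT_4O_TRANSCRIBE = "gpt-4o-transcribe"
    GPT_4O_MINI_TRANSCRIBE = "gpt-4o-mini-transcribe"


def resolve_model(model: Union[TranscribeModel, str]) -> str:
    """Return the plain model ID for an enum member or a raw string."""
    if isinstance(model, TranscribeModel):
        # Enum members carry the canonical ID as their value.
        return model.value
    if not isinstance(model, str) or not model:
        raise TypeError("model must be a TranscribeModel member or a non-empty string")
    # Raw strings are passed through so newly released models keep working.
    return model
```

Writing TranscribeModel.GPT_4O_TRANSCRIB (a typo) raises AttributeError immediately, whereas resolve_model("some-future-model") is accepted as-is.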
smart_format enables smart formatting (numbers, dates, punctuation). It defaults to False because telephony agents feed transcripts straight back into an LLM, where smart-format rewrites can confuse downstream tool-call argument parsing. Pass smart_format=True to opt back in.
Whisper on mulaw 8 kHz audio routinely hallucinates short fillers ("you", ".", "thank you") and emits is_final=true on every chunk, whether or not it contains speech. By default the pipeline drops these fillers, as well as duplicate finals and back-to-back finals less than 500 ms apart. For production, prefer OpenAITranscribeSTT (gpt-4o-transcribe): it uses the same OPENAI_API_KEY, is ~10× faster, and has no hallucination floor.
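To make the default filtering concrete, here is a rough sketch of how such a filter could work. It is one interpretation of the behavior described above, not the pipeline's actual code: the FinalFilter class, its accept method, and the punctuation normalization are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Fillers named in the text above; "." is covered by the empty-after-strip check.
FILLERS = {"you", "thank you"}
MIN_GAP_MS = 500  # finals closer together than this are treated as back-to-back


@dataclass
class FinalFilter:
    """Hypothetical filter for final transcripts (illustrative only)."""

    last_text: Optional[str] = None
    last_ts_ms: Optional[int] = None

    def accept(self, text: str, ts_ms: int) -> bool:
        """Return True if this final transcript should reach the LLM."""
        core = text.strip().lower().strip(" .!?,")
        # Drop empty results and known hallucinated fillers.
        if not core or core in FILLERS:
            return False
        if self.last_ts_ms is not None:
            # Drop an exact repeat of the previously accepted final.
            if core == self.last_text:
                return False
            # Drop finals that arrive too soon after the previous one.
            if ts_ms - self.last_ts_ms < MIN_GAP_MS:
                return False
        self.last_text = core
        self.last_ts_ms = ts_ms
        return True
```

A caller would keep one FinalFilter per call and pass each final transcript with its arrival time in milliseconds; only transcripts for which accept returns True are forwarded.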
First-class STT for OpenAI’s gpt-4o-transcribe and gpt-4o-mini-transcribe models — drop-in replacement for WhisperSTT with stronger multilingual quality and significantly lower latency. Reuses OPENAI_API_KEY.
```python
from getpatter import OpenAITranscribeSTT

stt = OpenAITranscribeSTT()  # reads OPENAI_API_KEY, defaults to gpt-4o-transcribe
stt = OpenAITranscribeSTT(model="gpt-4o-mini-transcribe")  # cheaper variant
stt = OpenAITranscribeSTT(api_key="sk-...", language="es")
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str \| None | None | API key. Read from OPENAI_API_KEY if omitted. |
| language | str \| None | None | BCP-47 language code. Auto-detected when omitted. |
| model | str | "gpt-4o-transcribe" | Either "gpt-4o-transcribe" or "gpt-4o-mini-transcribe". |
| response_format | str | "json" | Pass "verbose_json" to expose segment-level confidence and timestamps. |
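The defaults in the parameter list can be mirrored in a small standalone sketch that shows how the api_key fallback to the environment might be resolved. The TranscribeOptions dataclass and its resolved_api_key method are hypothetical names used only for illustration; they are not part of getpatter.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscribeOptions:
    """Hypothetical mirror of the documented parameters and their defaults."""

    api_key: Optional[str] = None
    language: Optional[str] = None  # None means auto-detect
    model: str = "gpt-4o-transcribe"
    response_format: str = "json"

    def resolved_api_key(self) -> str:
        """Prefer the explicit key, then fall back to OPENAI_API_KEY."""
        key = self.api_key or os.environ.get("OPENAI_API_KEY")
        if not key:
            raise ValueError("no API key given and OPENAI_API_KEY is not set")
        return key
```

An explicit api_key always wins over the environment variable, so per-tenant keys can be supplied without touching process-wide settings.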