STT (Speech-to-Text)
STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech recognition is handled internally by the engine and you do not configure STT separately.
Each STT ships as both a namespaced class (from getpatter.stt import deepgram → deepgram.STT()) and a flat alias (from getpatter import DeepgramSTT). They are equivalent — pick whichever reads best. The flat aliases are convenient for short examples; the namespaced form avoids name collisions when you import several STTs together.
Quickstart
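A minimal sketch of constructing an STT with either import style, assuming the `getpatter` package is installed and that `api_key` is accepted as a keyword argument, as the parameter tables on this page indicate. The `build_stt` helper and the import guard are illustrative only and not part of the SDK. How the instance is then wired into a pipeline is not shown here.

```python
import os

try:
    # Flat alias and namespaced module, as described above.
    from getpatter import DeepgramSTT
    from getpatter.stt import deepgram
except ImportError:  # getpatter is not installed in this environment
    DeepgramSTT = None
    deepgram = None


def build_stt(use_namespaced: bool = False):
    """Return a Deepgram STT instance, or None when getpatter is unavailable."""
    if DeepgramSTT is None:
        return None
    # Read the key explicitly; per this page, omitting api_key falls back to
    # DEEPGRAM_API_KEY, and construction fails if neither is set.
    key = os.environ.get("DEEPGRAM_API_KEY")
    if use_namespaced:
        return deepgram.STT(api_key=key)
    return DeepgramSTT(api_key=key)
```

Both branches are described as equivalent, so the `use_namespaced` flag only changes which import path is exercised.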
Supported providers
| Flat import | Namespaced import | Env var | Install extra |
|---|---|---|---|
| DeepgramSTT | getpatter.stt.deepgram.STT | DEEPGRAM_API_KEY | included |
| WhisperSTT | getpatter.stt.whisper.STT | OPENAI_API_KEY | included |
| CartesiaSTT | getpatter.stt.cartesia.STT | CARTESIA_API_KEY | getpatter[cartesia] |
| AssemblyAISTT | getpatter.stt.assemblyai.STT | ASSEMBLYAI_API_KEY | getpatter[assemblyai] |
| SonioxSTT | getpatter.stt.soniox.STT | SONIOX_API_KEY | getpatter[soniox] |
| SpeechmaticsSTT | getpatter.stt.speechmatics.STT | SPEECHMATICS_API_KEY | getpatter[speechmatics] |
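Because each provider reads its key from a different environment variable, it can help to check the environment before constructing anything. The sketch below copies the variable names from the table. The `STT_ENV_VARS` mapping and the `missing_env_vars` helper are hypothetical names for illustration, not part of the SDK.

```python
import os
from typing import Iterable, List, Mapping

# Environment variable expected by each provider, copied from the table above.
STT_ENV_VARS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "whisper": "OPENAI_API_KEY",
    "cartesia": "CARTESIA_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "soniox": "SONIOX_API_KEY",
    "speechmatics": "SPEECHMATICS_API_KEY",
}


def missing_env_vars(providers: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the env var names that are unset or empty for the given providers."""
    return [STT_ENV_VARS[p] for p in providers if not env.get(STT_ENV_VARS[p])]
```

Calling `missing_env_vars(["deepgram", "whisper"])` at startup gives a single list of variables to set, rather than one construction error per provider.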
Deepgram
Streaming STT backed by Deepgram’s nova-3 model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key; reads from DEEPGRAM_API_KEY if omitted. |
| language | str | "en" | BCP-47 language code. |
| model | str | "nova-3" | Deepgram model ID. |
| encoding | str | "linear16" | Audio encoding sent to Deepgram. |
| sample_rate | int | 16000 | Sample rate in Hz. |
| endpointing_ms | int | 150 | Utterance endpointing in milliseconds. |
| utterance_end_ms | int \| None | 1000 | Grace period after speech ends, in milliseconds. |
| smart_format | bool | True | Enable smart formatting (numbers, dates, punctuation). |
| interim_results | bool | True | Stream interim transcripts. |
| vad_events | bool | True | Emit VAD start/end markers. |
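The defaults above can be captured as a plain dictionary so that overrides are checked against the documented parameter names before they reach the constructor. This is a sketch only: `DEEPGRAM_DEFAULTS` and `deepgram_kwargs` are hypothetical helpers, and the values are copied from the table rather than read from the SDK.

```python
# Defaults copied from the parameter table above.
DEEPGRAM_DEFAULTS = {
    "api_key": None,
    "language": "en",
    "model": "nova-3",
    "encoding": "linear16",
    "sample_rate": 16000,
    "endpointing_ms": 150,
    "utterance_end_ms": 1000,
    "smart_format": True,
    "interim_results": True,
    "vad_events": True,
}


def deepgram_kwargs(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEEPGRAM_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown Deepgram STT parameter(s): {sorted(unknown)}")
    return {**DEEPGRAM_DEFAULTS, **overrides}


# Example: Spanish transcription with a longer endpointing window.
kwargs = deepgram_kwargs(language="es", endpointing_ms=300)
```

Assuming the constructor accepts these names as keyword arguments, the result could be passed as `DeepgramSTT(**kwargs)`.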
Whisper (OpenAI)
HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key; reads from OPENAI_API_KEY if omitted. |
| language | str | "en" | BCP-47 language code. |
| model | str | "whisper-1" | Whisper model ID. |
Cartesia
Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
AssemblyAI
Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
Soniox
Real-time STT via Soniox.
Speechmatics
Real-time STT via Speechmatics (Python SDK only; not yet ported to TypeScript).
Missing credentials
Each class raises ValueError at construction time if no API key is resolved from either api_key= or the matching env var.
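The resolution order described here (explicit argument first, then the environment variable, otherwise an error) can be sketched as a standalone function. This is not the SDK's implementation: `resolve_api_key` is a hypothetical helper, and the error message wording is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(api_key: Optional[str], env_var: str,
                    env: Mapping[str, str] = os.environ) -> str:
    """Prefer the explicit argument, then the env var; fail fast if neither is set."""
    key = api_key or env.get(env_var)
    if not key:
        raise ValueError(f"No API key: pass api_key= or set {env_var}")
    return key
```

Failing at construction time rather than on the first audio frame means a misconfigured deployment surfaces before any call is answered.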
What’s Next
LLM
Configure the language model.
TTS
Configure speech synthesis.

