STT (Speech-to-Text)
STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech recognition is handled internally by the engine.
Each STT ships as both a namespaced class (import * as deepgram from "getpatter/stt/deepgram" → new deepgram.STT()) and a flat alias (import { DeepgramSTT } from "getpatter"). The two are equivalent: the flat aliases are convenient for short examples, while the namespaced form avoids name collisions when mixing providers.
Quickstart
Supported providers
| Flat import | Namespaced import | Env var |
|---|---|---|
| DeepgramSTT | getpatter/stt/deepgram → STT | DEEPGRAM_API_KEY |
| WhisperSTT | getpatter/stt/whisper → STT | OPENAI_API_KEY |
| CartesiaSTT | getpatter/stt/cartesia → STT | CARTESIA_API_KEY |
| AssemblyAISTT | getpatter/stt/assemblyai → STT | ASSEMBLYAI_API_KEY |
| SonioxSTT | getpatter/stt/soniox → STT | SONIOX_API_KEY |
Speechmatics is supported by the Python SDK but not yet by the TypeScript SDK. Use the Python SDK if you need Speechmatics.
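The Env var column above, together with the construction-time check described under Missing credentials, can be pictured with a small sketch. The helper below is hypothetical and is not part of getpatter: `resolveApiKey`, `STT_ENV_VARS`, and the error message are illustrative names, and the environment is passed in explicitly so the snippet stays self-contained. It only assumes the documented order: an explicitly passed key first, then the provider's env var, otherwise an error.

```typescript
// Hypothetical helper, NOT part of getpatter. It mirrors the documented
// behavior: explicit key first, then the provider's env var, else throw.
type EnvMap = Record<string, string | undefined>;

// Env var names copied from the provider table.
const STT_ENV_VARS = {
  deepgram: "DEEPGRAM_API_KEY",
  whisper: "OPENAI_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
  soniox: "SONIOX_API_KEY",
} as const;

type SttProvider = keyof typeof STT_ENV_VARS;

function resolveApiKey(
  provider: SttProvider,
  explicitKey: string | undefined,
  env: EnvMap,
): string {
  const envVar = STT_ENV_VARS[provider];
  // An explicitly passed key takes precedence over the environment.
  const key = explicitKey ?? env[envVar];
  if (!key) {
    // Fail early, at "construction time", rather than on the first audio frame.
    throw new Error(`Missing API key for ${provider}: pass apiKey or set ${envVar}`);
  }
  return key;
}

// Explicit key wins over the environment.
console.log(resolveApiKey("deepgram", "dg_explicit", { DEEPGRAM_API_KEY: "dg_env" }));

// Falls back to the env var when no key is passed.
console.log(resolveApiKey("whisper", undefined, { OPENAI_API_KEY: "sk_env" }));

// Nothing resolved: the error surfaces immediately.
try {
  resolveApiKey("cartesia", undefined, {});
} catch (err) {
  console.log((err as Error).message);
}
```

In a real application you would most likely pass `process.env` as the environment map. Failing at construction means a misconfigured deployment is caught at startup instead of mid-call.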
Deepgram
Streaming STT backed by Deepgram’s nova-3 model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key. Reads from DEEPGRAM_API_KEY if omitted. |
| language | string | "en" | BCP-47 language code. |
| model | string | "nova-3" | Deepgram model ID. |
| encoding | string | "linear16" | Audio encoding sent to Deepgram. |
| sampleRate | number | 16000 | Sample rate in Hz. |
| endpointingMs | number | 150 | Utterance endpointing in milliseconds. |
| utteranceEndMs | number \| null | 1000 | Grace period after speech ends, in milliseconds. |
| smartFormat | boolean | true | Smart formatting (numbers, dates, punctuation). |
| interimResults | boolean | true | Stream interim transcripts. |
| vadEvents | boolean | true | Emit VAD start/end markers. |
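To make the defaults in the table concrete, here is a minimal sketch of how overrides could be merged with them and then mapped onto Deepgram's live-streaming query parameters. Everything named here (`DeepgramOptions`, `DEEPGRAM_DEFAULTS`, `withDefaults`, `toListenQuery`) is hypothetical and not taken from getpatter. The camelCase-to-query-parameter mapping, and the choice to omit `utterance_end_ms` when `utteranceEndMs` is `null`, are assumptions about how such an adapter might work rather than a description of the SDK's internals.

```typescript
// Hypothetical option shape mirroring the parameter table; the real
// getpatter types may differ.
interface DeepgramOptions {
  apiKey?: string;
  language: string;
  model: string;
  encoding: string;
  sampleRate: number;
  endpointingMs: number;
  utteranceEndMs: number | null;
  smartFormat: boolean;
  interimResults: boolean;
  vadEvents: boolean;
}

// Default values copied from the table.
const DEEPGRAM_DEFAULTS: DeepgramOptions = {
  language: "en",
  model: "nova-3",
  encoding: "linear16",
  sampleRate: 16000,
  endpointingMs: 150,
  utteranceEndMs: 1000,
  smartFormat: true,
  interimResults: true,
  vadEvents: true,
};

// Caller-supplied fields override the defaults; everything else is kept.
function withDefaults(overrides: Partial<DeepgramOptions> = {}): DeepgramOptions {
  return { ...DEEPGRAM_DEFAULTS, ...overrides };
}

// Assumed mapping onto Deepgram's streaming query parameters.
// The API key is deliberately left out: it belongs in a header, not the URL.
function toListenQuery(o: DeepgramOptions): string {
  const pairs: Array<[string, string]> = [
    ["model", o.model],
    ["language", o.language],
    ["encoding", o.encoding],
    ["sample_rate", String(o.sampleRate)],
    ["endpointing", String(o.endpointingMs)],
    ["smart_format", String(o.smartFormat)],
    ["interim_results", String(o.interimResults)],
    ["vad_events", String(o.vadEvents)],
  ];
  // Assumption: null means "do not send utterance_end_ms at all".
  if (o.utteranceEndMs !== null) {
    pairs.push(["utterance_end_ms", String(o.utteranceEndMs)]);
  }
  return pairs
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}

// Spanish transcription with slightly more patient endpointing.
const spanishOptions = withDefaults({ language: "es", endpointingMs: 300 });
console.log(toListenQuery(spanishOptions));
```

Keeping the defaults in one object makes it easy to see which settings a call site actually changes, which is useful when tuning `endpointingMs` and `utteranceEndMs` for turn-taking latency.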
Whisper (OpenAI)
HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
Cartesia
Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
AssemblyAI
Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
Soniox
Real-time STT via Soniox.
Missing credentials
Each class throws at construction time if no API key is resolved.
What’s Next
LLM
Configure the language model.
TTS
Configure speech synthesis.

