STT (Speech-to-Text)

STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech recognition is handled internally by the engine. Each STT ships as both a namespaced class (import * as deepgram from "getpatter/stt/deepgram", then new deepgram.STT()) and a flat alias (import { DeepgramSTT } from "getpatter"). The two are equivalent: the flat aliases are convenient for short examples, while the namespaced form avoids name collisions when mixing providers.

Quickstart

// npx tsx example.ts
import { Patter, Twilio, DeepgramSTT, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });  // TWILIO_* from env

const agent = phone.agent({
  stt: new DeepgramSTT({ endpointingMs: 80 }),      // DEEPGRAM_API_KEY from env
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),    // ELEVENLABS_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi!",
});

await phone.serve({ agent });
The same agent using namespaced imports:
import * as deepgram from "getpatter/stt/deepgram";
import * as elevenlabs from "getpatter/tts/elevenlabs";

const agent = phone.agent({
  stt: new deepgram.STT({ endpointingMs: 80 }),
  tts: new elevenlabs.TTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
});

Supported providers

| Flat import | Namespaced import (class) | Env var |
| --- | --- | --- |
| DeepgramSTT | getpatter/stt/deepgram (STT) | DEEPGRAM_API_KEY |
| WhisperSTT | getpatter/stt/whisper (STT) | OPENAI_API_KEY |
| CartesiaSTT | getpatter/stt/cartesia (STT) | CARTESIA_API_KEY |
| AssemblyAISTT | getpatter/stt/assemblyai (STT) | ASSEMBLYAI_API_KEY |
| SonioxSTT | getpatter/stt/soniox (STT) | SONIOX_API_KEY |

Speechmatics is supported by the Python SDK but not yet by the TypeScript SDK. Use the Python SDK if you need Speechmatics.
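
As a quick preflight check, the table above can be turned into a lookup that reports which providers have credentials set. This is a minimal sketch: the STT_ENV_VARS map and the configuredProviders helper are hypothetical names for illustration and are not part of getpatter. Only the class names and environment variables come from the table.

```typescript
// Hypothetical helper, not part of getpatter: maps each flat STT class name
// to the environment variable listed in the table above.
const STT_ENV_VARS: Record<string, string> = {
  DeepgramSTT: "DEEPGRAM_API_KEY",
  WhisperSTT: "OPENAI_API_KEY",
  CartesiaSTT: "CARTESIA_API_KEY",
  AssemblyAISTT: "ASSEMBLYAI_API_KEY",
  SonioxSTT: "SONIOX_API_KEY",
};

type Env = Record<string, string | undefined>;

// Returns the class names whose API key is present and non-blank in `env`.
function configuredProviders(env: Env): string[] {
  return Object.keys(STT_ENV_VARS).filter((className) => {
    const value = env[STT_ENV_VARS[className]];
    return value !== undefined && value.trim() !== "";
  });
}

// Usage in Node: pass process.env.
// console.log(configuredProviders(process.env));
```

Running a check like this at startup makes it obvious which providers can be constructed without passing apiKey explicitly.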

Deepgram

Streaming STT backed by Deepgram’s nova-3 model.
import { DeepgramSTT } from "getpatter";

const stt = new DeepgramSTT();                                    // reads DEEPGRAM_API_KEY
// Or pass the key and options explicitly:
const sttExplicit = new DeepgramSTT({ apiKey: "dg_...", endpointingMs: 80 });

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | API key. Reads from DEEPGRAM_API_KEY if omitted. |
| language | string | "en" | BCP-47 language code. |
| model | string | "nova-3" | Deepgram model ID. |
| encoding | string | "linear16" | Audio encoding sent to Deepgram. |
| sampleRate | number | 16000 | Sample rate in Hz. |
| endpointingMs | number | 150 | Utterance endpointing in milliseconds. |
| utteranceEndMs | number \| null | 1000 | Grace period after speech ends, in milliseconds. |
| smartFormat | boolean | true | Smart formatting (numbers, dates, punctuation). |
| interimResults | boolean | true | Stream interim transcripts. |
| vadEvents | boolean | true | Emit VAD start/end markers. |
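
To see how a partial options object combines with the defaults listed above, here is a small sketch. The DeepgramOptions interface, the DEEPGRAM_DEFAULTS constant, and the withDefaults helper are assumptions written for illustration that mirror the parameter list. They are not getpatter's actual types, and the library may resolve options differently.

```typescript
// Illustrative mirror of the documented parameters; not getpatter's real type.
interface DeepgramOptions {
  apiKey?: string;
  language: string;
  model: string;
  encoding: string;
  sampleRate: number;
  endpointingMs: number;
  utteranceEndMs: number | null;
  smartFormat: boolean;
  interimResults: boolean;
  vadEvents: boolean;
}

// Defaults copied from the documented parameter list.
const DEEPGRAM_DEFAULTS: DeepgramOptions = {
  language: "en",
  model: "nova-3",
  encoding: "linear16",
  sampleRate: 16000,
  endpointingMs: 150,
  utteranceEndMs: 1000,
  smartFormat: true,
  interimResults: true,
  vadEvents: true,
};

// Caller-supplied fields override the defaults; everything else is kept.
function withDefaults(overrides: Partial<DeepgramOptions> = {}): DeepgramOptions {
  return { ...DEEPGRAM_DEFAULTS, ...overrides };
}

const effective = withDefaults({ endpointingMs: 80, utteranceEndMs: null });
console.log(effective.endpointingMs, effective.utteranceEndMs, effective.model);
// prints: 80 null nova-3
```

Under these assumptions, the Quickstart's endpointingMs: 80 changes only the endpointing value and leaves the model, encoding, and sample rate at their defaults.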

Whisper (OpenAI)

HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
import { WhisperSTT } from "getpatter";

const stt = new WhisperSTT();                                     // reads OPENAI_API_KEY
// Or pass the key and options explicitly:
const sttExplicit = new WhisperSTT({ apiKey: "sk-...", language: "es" });

Cartesia

Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
import { CartesiaSTT } from "getpatter";

const stt = new CartesiaSTT();                                    // reads CARTESIA_API_KEY
// Or pass the key and options explicitly:
const sttExplicit = new CartesiaSTT({ apiKey: "csk_...", language: "en" });

AssemblyAI

Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
import { AssemblyAISTT } from "getpatter";

const stt = new AssemblyAISTT();                                  // reads ASSEMBLYAI_API_KEY

Soniox

Real-time STT via Soniox.
import { SonioxSTT } from "getpatter";

const stt = new SonioxSTT();                                      // reads SONIOX_API_KEY

Missing credentials

Each class throws at construction time if no API key is resolved:
Error: Deepgram STT requires an apiKey. Pass { apiKey: 'dg_...' } or
set DEEPGRAM_API_KEY in the environment.
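
The resolution order implied above (an explicit apiKey first, then the environment variable, otherwise an error) can be sketched as follows. resolveApiKey is a hypothetical helper written for illustration and is not getpatter's internal code. Only the wording of the error follows the message shown above.

```typescript
// Hypothetical sketch of the documented behaviour; not getpatter's source.
function resolveApiKey(
  provider: string,
  envVar: string,
  explicitKey: string | undefined,
  env: Record<string, string | undefined>
): string {
  // An explicitly passed key wins; otherwise fall back to the environment.
  const key = explicitKey ?? env[envVar];
  if (!key) {
    throw new Error(
      `${provider} STT requires an apiKey. Pass { apiKey: '...' } or ` +
        `set ${envVar} in the environment.`
    );
  }
  return key;
}

// Failing at construction time surfaces a missing key at startup,
// before any call is placed.
try {
  resolveApiKey("Deepgram", "DEEPGRAM_API_KEY", undefined, {});
} catch (err) {
  console.error((err as Error).message);
}
```

Because the real classes throw in their constructors, wrapping agent setup in a try/catch at startup is one way to report a missing key clearly.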

What’s Next

LLM

Configure the language model.

TTS

Configure speech synthesis.