STT (Speech-to-Text)

STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, the engine handles speech recognition internally and you do not configure STT separately.

Each STT ships as both a namespaced class (from getpatter.stt import deepgram, then deepgram.STT()) and a flat alias (from getpatter import DeepgramSTT). The two are equivalent, so pick whichever reads best. The flat aliases are convenient for short examples; the namespaced form avoids name collisions when you import several STTs together.

Quickstart

import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")  # TWILIO_* from env

agent = phone.agent(
    stt=DeepgramSTT(endpointing_ms=80),   # DEEPGRAM_API_KEY from env
    tts=ElevenLabsTTS(voice="rachel"),     # ELEVENLABS_API_KEY from env
    system_prompt="You are a helpful assistant.",
    first_message="Hi!",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())

The same agent using namespaced imports:

from getpatter.stt import deepgram
from getpatter.tts import elevenlabs

agent = phone.agent(
    stt=deepgram.STT(endpointing_ms=80),      # DEEPGRAM_API_KEY from env
    tts=elevenlabs.TTS(voice_id="rachel"),    # ELEVENLABS_API_KEY from env
    system_prompt="You are a helpful assistant.",
    first_message="Hi!",
)
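
Because each namespaced module exposes its class under the same name, STT, a provider can also be picked by name at runtime. The following is a minimal sketch that assumes only the getpatter.stt.<provider>.STT layout used on this page; the helpers stt_module_path and load_stt_class are hypothetical and not part of getpatter.

```python
import importlib


def stt_module_path(provider: str) -> str:
    """Return the namespaced module path, e.g. 'getpatter.stt.deepgram'."""
    return f"getpatter.stt.{provider.lower()}"


def load_stt_class(provider: str):
    """Import getpatter.stt.<provider> and return its STT class.

    Raises ImportError if the module cannot be imported.
    (Hypothetical helper, not part of getpatter.)
    """
    module = importlib.import_module(stt_module_path(provider))
    return module.STT


if __name__ == "__main__":
    try:
        stt_cls = load_stt_class("deepgram")
        print("Loaded", stt_cls)
    except ImportError as exc:
        print("STT provider unavailable:", exc)
```

The class is only looked up here, not constructed, so no API key is needed until you call it.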

Supported providers

Flat import      Namespaced import               Env var               Install extra
DeepgramSTT      getpatter.stt.deepgram.STT      DEEPGRAM_API_KEY      included
WhisperSTT       getpatter.stt.whisper.STT       OPENAI_API_KEY        included
CartesiaSTT      getpatter.stt.cartesia.STT      CARTESIA_API_KEY      getpatter[cartesia]
AssemblyAISTT    getpatter.stt.assemblyai.STT    ASSEMBLYAI_API_KEY    getpatter[assemblyai]
SonioxSTT        getpatter.stt.soniox.STT        SONIOX_API_KEY        getpatter[soniox]
SpeechmaticsSTT  getpatter.stt.speechmatics.STT  SPEECHMATICS_API_KEY  getpatter[speechmatics]
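
The env vars and install extras in the table above can be checked before a call is ever placed. This is a rough pre-flight sketch: the mapping is copied from the table, while the helper names missing_env_var and install_hint, and the pip command format, are assumptions for illustration rather than anything getpatter provides.

```python
import os

# provider -> (env var, install extra or None when included), from the table above
STT_PROVIDERS = {
    "deepgram": ("DEEPGRAM_API_KEY", None),
    "whisper": ("OPENAI_API_KEY", None),
    "cartesia": ("CARTESIA_API_KEY", "getpatter[cartesia]"),
    "assemblyai": ("ASSEMBLYAI_API_KEY", "getpatter[assemblyai]"),
    "soniox": ("SONIOX_API_KEY", "getpatter[soniox]"),
    "speechmatics": ("SPEECHMATICS_API_KEY", "getpatter[speechmatics]"),
}


def missing_env_var(provider, environ=None):
    """Return the env var name if it is unset or empty, else None."""
    environ = os.environ if environ is None else environ
    env_var, _extra = STT_PROVIDERS[provider]
    return None if environ.get(env_var) else env_var


def install_hint(provider):
    """Return a pip command for the provider's extra, or None if included."""
    _env_var, extra = STT_PROVIDERS[provider]
    return None if extra is None else f'pip install "{extra}"'


if __name__ == "__main__":
    for name in STT_PROVIDERS:
        print(name, "| missing:", missing_env_var(name), "| install:", install_hint(name))
```

Running the check at startup surfaces a missing key or extra in one place, rather than when the first STT object is constructed.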

Deepgram

Streaming STT backed by Deepgram’s nova-3 model.
from getpatter import DeepgramSTT

stt = DeepgramSTT()                                      # reads DEEPGRAM_API_KEY
stt = DeepgramSTT(api_key="dg_...", endpointing_ms=80)   # explicit

Parameter         Type        Default     Description
api_key           str | None  None        API key; reads from DEEPGRAM_API_KEY if omitted.
language          str         "en"        BCP-47 language code.
model             str         "nova-3"    Deepgram model ID.
encoding          str         "linear16"  Audio encoding sent to Deepgram.
sample_rate       int         16000       Sample rate in Hz.
endpointing_ms    int         150         Utterance endpointing in milliseconds.
utterance_end_ms  int | None  1000        Grace period after speech ends.
smart_format      bool        True        Enable smart formatting (numbers, dates, punctuation).
interim_results   bool        True        Stream interim transcripts.
vad_events        bool        True        Emit VAD start/end markers.
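
With this many keyword arguments, a misspelled name is easy to miss. The sketch below checks settings against the documented defaults and keeps only the values that differ from them. The defaults are copied from the table above; deepgram_overrides is a hypothetical helper, not a getpatter function.

```python
# Documented defaults from the parameter table (api_key omitted on purpose:
# it is a credential, not a tuning setting).
DEEPGRAM_DEFAULTS = {
    "language": "en",
    "model": "nova-3",
    "encoding": "linear16",
    "sample_rate": 16000,
    "endpointing_ms": 150,
    "utterance_end_ms": 1000,
    "smart_format": True,
    "interim_results": True,
    "vad_events": True,
}


def deepgram_overrides(**settings):
    """Validate parameter names and return only the non-default values."""
    unknown = set(settings) - set(DEEPGRAM_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown Deepgram STT parameter(s): {sorted(unknown)}")
    return {k: v for k, v in settings.items() if v != DEEPGRAM_DEFAULTS[k]}


if __name__ == "__main__":
    kwargs = deepgram_overrides(endpointing_ms=80, language="en")
    print(kwargs)
    # The result could then be passed on, e.g. DeepgramSTT(**kwargs).
```

Keeping only the overrides makes it obvious in logs or config which settings depart from the defaults.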

Whisper (OpenAI)

HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
from getpatter import WhisperSTT

stt = WhisperSTT()                           # reads OPENAI_API_KEY
stt = WhisperSTT(api_key="sk-...", language="es")

Parameter  Type        Default      Description
api_key    str | None  None         API key; reads from OPENAI_API_KEY if omitted.
language   str         "en"         BCP-47 language code.
model      str         "whisper-1"  Whisper model ID.

Cartesia

Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
from getpatter import CartesiaSTT

stt = CartesiaSTT()                          # reads CARTESIA_API_KEY
stt = CartesiaSTT(api_key="csk_...", language="en", sample_rate=16000)

AssemblyAI

Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
from getpatter import AssemblyAISTT

stt = AssemblyAISTT()                        # reads ASSEMBLYAI_API_KEY
stt = AssemblyAISTT(api_key="aa_...")

Soniox

Real-time STT via Soniox.
from getpatter import SonioxSTT

stt = SonioxSTT()                            # reads SONIOX_API_KEY

Speechmatics

Real-time STT via Speechmatics (Python SDK only — not yet ported to TypeScript).
from getpatter.stt import speechmatics

stt = speechmatics.STT()                     # reads SPEECHMATICS_API_KEY

Missing credentials

Each class raises ValueError at construction time if no API key is resolved from either api_key= or the matching env var:
ValueError: Deepgram STT requires an api_key. Pass api_key='dg_...' or
set DEEPGRAM_API_KEY in the environment.
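
The lookup order described above (explicit api_key= first, then the matching env var, otherwise ValueError) can be sketched as follows. This illustrates the documented behavior and is not getpatter's actual implementation; resolve_api_key and its signature are hypothetical.

```python
import os


def resolve_api_key(api_key, env_var, provider, environ=None):
    """Return the explicit key if given, else the env var value.

    Raises ValueError when neither source yields a non-empty key.
    (Hypothetical helper mirroring the documented behavior.)
    """
    environ = os.environ if environ is None else environ
    key = api_key or environ.get(env_var)
    if not key:
        raise ValueError(
            f"{provider} STT requires an api_key. Pass api_key='...' or "
            f"set {env_var} in the environment."
        )
    return key


if __name__ == "__main__":
    try:
        resolve_api_key(None, "DEEPGRAM_API_KEY", "Deepgram", environ={})
    except ValueError as exc:
        # Fail fast at startup instead of on the first inbound call.
        print("Configuration error:", exc)
```

Since the error is raised at construction time, wrapping STT construction in a similar try/except at startup lets a misconfigured deployment fail before it accepts calls.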

What’s Next

LLM: Configure the language model.
TTS: Configure speech synthesis.