Cartesia STT

Streaming speech-to-text using Cartesia’s ink-whisper model. The provider is ported from LiveKit Agents (Apache 2.0) and uses a pure-aiohttp transport, so no vendor SDK is required.

Quickstart

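import asyncio

from patter.providers.cartesia_stt import CartesiaSTT

async def main(pcm_chunk: bytes) -> None:
    stt = CartesiaSTT(api_key="csk_...", language="en")
    await stt.connect()
    await stt.send_audio(pcm_chunk)  # 16 kHz PCM s16le
    async for t in stt.receive_transcripts():
        print(t.text, t.is_final, t.confidence)
    await stt.close()

# pcm_chunk: bytes of 16 kHz PCM s16le audio from your own source
asyncio.run(main(pcm_chunk))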
Install the extra with pip install "getpatter[cartesia]" (the quotes keep shells such as zsh from expanding the brackets). Supported sample rates are 8000, 16000, 24000, 44100, and 48000 Hz.
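Because send_audio takes PCM chunks, a longer recording generally has to be split before streaming. The following is a minimal sketch, not part of Patter: pcm_frames is a hypothetical helper, and the mono layout and 20 ms frame size are assumptions that the text above does not specify.

```python
# Sample rates listed as supported in this section.
SUPPORTED_RATES = (8000, 16000, 24000, 44100, 48000)


def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20):
    """Yield fixed-duration frames of 16-bit PCM (assumed mono, 2 bytes per sample)."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    # Samples per frame, times 2 bytes per s16le sample.
    frame_bytes = sample_rate * frame_ms // 1000 * 2
    for start in range(0, len(pcm), frame_bytes):
        # The final frame may be shorter than frame_bytes.
        yield pcm[start:start + frame_bytes]


# One second of silence at 16 kHz splits into 50 frames of 640 bytes.
frames = list(pcm_frames(b"\x00" * 32000))
print(len(frames), len(frames[0]))
```

Each frame could then be passed to stt.send_audio(frame) in a loop. Whether the service prefers a particular frame size is not stated here, so treat 20 ms as a starting point.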

Cartesia TTS

CartesiaTTS is a Patter TTSProvider backed by Cartesia’s bytes endpoint. It streams raw PCM_S16LE chunks that drop directly into Patter’s pipeline with no transcoding.

Install

pip install "patter[cartesia]"

Usage

import asyncio

from patter.providers.cartesia_tts import CartesiaTTS

async def main() -> None:
    tts = CartesiaTTS(
        # Falls back to CARTESIA_API_KEY env var if api_key is None.
        api_key="...",
        model="sonic-2",
        voice="f786b574-daa5-4673-aa0c-cbe3e8534c02",  # Katie
        language="en",
        sample_rate=16000,
    )

    async for chunk in tts.synthesize("Hello from Patter."):
        # chunk is raw PCM_S16LE at sample_rate
        ...

    await tts.close()

asyncio.run(main())
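Raw PCM_S16LE has no header, so most audio players cannot open it directly. As a rough sketch, the chunks can be wrapped in a WAV container with Python's standard wave module. pcm_to_wav is a hypothetical helper, not part of Patter, and the mono channel count is an assumption.

```python
import io
import wave


def pcm_to_wav(chunks, sample_rate: int = 16000) -> bytes:
    """Wrap raw PCM_S16LE chunks in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # assumption: mono output
        wav.setsampwidth(2)            # 16-bit samples (s16le)
        wav.setframerate(sample_rate)  # must match the TTS sample_rate
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()


# Two 10 ms chunks of silence at 16 kHz stand in for synthesized audio.
wav_bytes = pcm_to_wav([b"\x00\x00" * 160, b"\x00\x00" * 160])
print(len(wav_bytes))
```

Inside an async function, the chunks could be gathered first with `chunks = [c async for c in tts.synthesize("Hello from Patter.")]` and then passed to pcm_to_wav with the same sample_rate given to CartesiaTTS.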

Options

| Option | Default | Notes |
| --- | --- | --- |
| model | "sonic-2" | Any Cartesia TTS model id (e.g. "sonic-3"). |
| voice | "f786b574-..." | Cartesia voice id. |
| language | "en" | ISO 639-1 code. |
| sample_rate | 16000 | Sample rate in Hz. |
| speed | None | "fastest" ... "slowest" or float in [0.6, 2.0]. |
| emotion | None | See Cartesia’s emotion list. |
| volume | None | Float in [0.5, 2.0] for sonic-3. |
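The numeric ranges for speed and volume can be checked before a request is made. This is a minimal sketch under assumptions: check_tts_options is a hypothetical helper, and it is not known from this page whether CartesiaTTS performs the same validation itself. String presets for speed are passed through unchecked because the full preset list is not given here.

```python
def check_tts_options(speed=None, volume=None) -> None:
    """Raise ValueError if a numeric speed or volume is outside the documented range."""
    # Named presets such as "fastest" are strings and are not validated here.
    if isinstance(speed, (int, float)) and not 0.6 <= speed <= 2.0:
        raise ValueError(f"speed must be in [0.6, 2.0], got {speed}")
    if volume is not None and not 0.5 <= volume <= 2.0:
        raise ValueError(f"volume must be in [0.5, 2.0], got {volume}")


check_tts_options(speed=1.2, volume=1.0)  # within range, no error
check_tts_options(speed="fastest")        # preset string, passed through

try:
    check_tts_options(volume=3.0)
except ValueError as exc:
    print(exc)
```

Failing early this way gives a local error message instead of a rejected request, which can be easier to debug in a voice pipeline.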

Attribution

Ported from LiveKit Agents (livekit-plugins-cartesia, Apache License 2.0).