
Providers

Patter supports three provider modes, selected with the provider option, that determine how call audio is processed and how the AI's responses are generated.

OpenAI Realtime (Default)

The default provider uses OpenAI’s Realtime API for speech-to-speech processing. Audio streams directly between the phone and OpenAI with minimal latency.
const agent = phone.agent({
  systemPrompt: "You are a helpful assistant.",
  provider: "openai_realtime",
  model: "gpt-4o-mini-realtime-preview",
  voice: "alloy",
});
Supported voices: alloy, ash, ballad, coral, echo, sage, shimmer, verse
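If the voice name comes from configuration or user input, you can check it against the list above before creating the agent. The following is a minimal sketch. REALTIME_VOICES, isRealtimeVoice, and pickVoice are illustrative names, not part of the Patter SDK. The list mirrors the voices documented here, and OpenAI may add voices over time.

```typescript
// Voices documented above for the openai_realtime provider.
// Illustrative constant; not exported by the Patter SDK.
const REALTIME_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse",
] as const;

type RealtimeVoice = (typeof REALTIME_VOICES)[number];

// Type guard: narrows an arbitrary string to one of the documented voices.
function isRealtimeVoice(name: string): name is RealtimeVoice {
  return (REALTIME_VOICES as readonly string[]).indexOf(name) !== -1;
}

// Returns the requested voice if it is in the documented list, otherwise the fallback.
function pickVoice(
  requested: string | undefined,
  fallback: RealtimeVoice = "alloy",
): RealtimeVoice {
  return requested !== undefined && isRealtimeVoice(requested) ? requested : fallback;
}
```

For example, you could pass `voice: pickVoice(process.env.AGENT_VOICE)` to phone.agent(). AGENT_VOICE is a hypothetical variable name.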
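OpenAI Realtime is a speech-to-speech model. It handles speech recognition, reasoning, and speech synthesis within a single model session. Audio is never handed off between separate services, which gives this mode the lowest latency.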

ElevenLabs Conversational AI

This provider uses ElevenLabs’ Conversational AI platform. It requires an agent that you have already created in the ElevenLabs dashboard.
const agent = phone.agent({
  systemPrompt: "You are a helpful assistant.",
  provider: "elevenlabs_convai",
  elevenlabsKey: process.env.ELEVENLABS_KEY,
  elevenlabsAgentId: process.env.ELEVENLABS_AGENT_ID,
  voice: "21m00Tcm4TlvDq8ikWAM",
  language: "en",
});
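| Parameter | Description |
|---|---|
| elevenlabsKey | Your ElevenLabs API key. |
| elevenlabsAgentId | The agent ID from the ElevenLabs dashboard. |
| voice | ElevenLabs voice ID. |
| language | Language code for the conversation (for example, "en"). |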
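The example above reads the key and agent ID from ELEVENLABS_KEY and ELEVENLABS_AGENT_ID. A missing variable would otherwise surface only when a call connects. The sketch below fails fast at startup instead. loadElevenLabsSettings and its types are hypothetical helpers, not part of the Patter SDK. The helper takes the environment as a parameter so it can be tested without a real process.

```typescript
// Hypothetical helper: gathers the settings the elevenlabs_convai provider needs
// and throws a single error naming every missing variable.
interface ElevenLabsSettings {
  elevenlabsKey: string;
  elevenlabsAgentId: string;
}

type Env = Record<string, string | undefined>;

function loadElevenLabsSettings(env: Env): ElevenLabsSettings {
  const key = env["ELEVENLABS_KEY"];
  const agentId = env["ELEVENLABS_AGENT_ID"];
  const missing: string[] = [];
  if (!key) missing.push("ELEVENLABS_KEY");
  if (!agentId) missing.push("ELEVENLABS_AGENT_ID");
  if (missing.length > 0 || !key || !agentId) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Both values are non-empty strings here.
  return { elevenlabsKey: key, elevenlabsAgentId: agentId };
}
```

You could then spread the result into the agent options: `phone.agent({ provider: "elevenlabs_convai", ...loadElevenLabsSettings(process.env), ... })`.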

Pipeline Mode

Pipeline mode runs speech-to-text (STT), the language model (LLM), and text-to-speech (TTS) as separate stages that you configure independently. Use it when you want to mix providers or add custom processing between stages.
const agent = phone.agent({
  systemPrompt: "You are a helpful assistant.",
  provider: "pipeline",
  stt: Patter.deepgram({ apiKey: process.env.DEEPGRAM_KEY! }),
  tts: Patter.elevenlabs({ apiKey: process.env.ELEVENLABS_KEY!, voice: "rachel" }),
});

await phone.serve({
  agent,
  onMessage: async (data) => {
    // Custom LLM logic: return the text to be spoken.
    // myCustomLLM is a placeholder for your own model call.
    const transcript = data.text as string;
    const response = await myCustomLLM(transcript);
    return response;
  },
});
In pipeline mode, the onMessage callback receives the user’s transcript and must return the text for TTS.
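To exercise the pipeline wiring before connecting a real model, you can substitute a stand-in for myCustomLLM. The sketch below is hypothetical. placeholderLlm and MessageData are illustrative names. Following the example above, it assumes the transcript arrives in data.text. It only demonstrates the contract of the callback, which takes a transcript and returns the text to speak. It also guards against an empty or non-string transcript.

```typescript
// Shape assumed from the example above: the transcript is in `text`.
interface MessageData {
  text?: unknown;
}

// Hypothetical stand-in for a real LLM call; replace with your own model client.
async function placeholderLlm(transcript: string): Promise<string> {
  if (/\b(bye|goodbye)\b/i.test(transcript)) return "Thanks for calling. Goodbye!";
  return `I heard: "${transcript}". A real model would answer here.`;
}

// onMessage-style handler: validates the transcript, then returns text for TTS.
async function onMessage(data: MessageData): Promise<string> {
  const transcript = typeof data.text === "string" ? data.text.trim() : "";
  if (transcript.length === 0) {
    // Nothing usable was transcribed, so ask the caller to repeat.
    return "Sorry, I didn't catch that. Could you repeat it?";
  }
  return placeholderLlm(transcript);
}
```

You would pass this function as the onMessage option of phone.serve(), alongside agent.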

STT Factory Functions

Create STT configurations using static factory methods:

Patter.deepgram()

const stt = Patter.deepgram({
  apiKey: process.env.DEEPGRAM_KEY!,
  language: "en", // optional, defaults to "en"
});

Patter.whisper()

const stt = Patter.whisper({
  apiKey: process.env.OPENAI_KEY!,
  language: "en", // optional, defaults to "en"
});
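Both factories take an API key and an optional language. You may want to choose the STT service based on which credentials are present. A small resolver keeps that decision in one place. The sketch below is hypothetical: resolveStt and SttChoice are illustrative names, and the preference for Deepgram is an arbitrary choice. It returns a plain description instead of calling Patter.deepgram() or Patter.whisper(), so it runs without the SDK.

```typescript
// Illustrative description of an STT choice; not a Patter SDK type.
interface SttChoice {
  provider: "deepgram" | "whisper";
  apiKey: string;
  language: string;
}

type Env = Record<string, string | undefined>;

// Prefers Deepgram when DEEPGRAM_KEY is set, otherwise uses Whisper with OPENAI_KEY.
// The language defaults to "en", matching the factory defaults documented above.
function resolveStt(env: Env, language = "en"): SttChoice {
  const deepgramKey = env["DEEPGRAM_KEY"];
  if (deepgramKey) return { provider: "deepgram", apiKey: deepgramKey, language };
  const openaiKey = env["OPENAI_KEY"];
  if (openaiKey) return { provider: "whisper", apiKey: openaiKey, language };
  throw new Error("Set DEEPGRAM_KEY or OPENAI_KEY to enable speech-to-text.");
}
```

Map the result to the matching factory. When provider is "deepgram", pass apiKey and language to Patter.deepgram(). Otherwise, pass them to Patter.whisper().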

TTS Factory Functions

Create TTS configurations using static factory methods:

Patter.elevenlabs()

const tts = Patter.elevenlabs({
  apiKey: process.env.ELEVENLABS_KEY!,
  voice: "rachel", // optional, defaults to "rachel"
});

Patter.openaiTts()

const tts = Patter.openaiTts({
  apiKey: process.env.OPENAI_KEY!,
  voice: "alloy", // optional, defaults to "alloy"
});
OpenAI TTS returns 24 kHz PCM audio, which the SDK automatically resamples to 16 kHz for telephony.
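The SDK performs this conversion for you. To show what going from 24 kHz to 16 kHz involves, here is a minimal linear-interpolation resampler for 16-bit PCM samples. The rates have a 3:2 ratio, so every three input samples become two output samples. This is not the SDK's implementation. Production resamplers usually apply a low-pass filter before downsampling to avoid aliasing, and this sketch omits that step.

```typescript
// Resamples 16-bit PCM samples from one rate to another using linear interpolation.
// No anti-aliasing filter is applied, so this is for illustration only.
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (input.length === 0) return new Int16Array(0);
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample (1.5 for 24k -> 16k)
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Weighted average of the two neighbouring input samples.
    output[i] = Math.round(input[left] * (1 - frac) + input[right] * frac);
  }
  return output;
}
```

For example, 20 ms of audio is 480 samples at 24 kHz and 320 samples at 16 kHz.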

Provider Comparison

| Feature | OpenAI Realtime | ElevenLabs ConvAI | Pipeline |
|---|---|---|---|
| Latency | Lowest | Low | Variable |
| Voice quality | High | Very high | Depends on TTS |
| Custom LLM | No | No | Yes |
| Function calling | Yes | Limited | Via onMessage |
| Languages | Multiple | Multiple | Depends on STT/TTS |
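One way to apply the comparison table is to encode its rows as simple rules. The sketch below does that. The Requirements fields and chooseProvider are hypothetical. The rules are one reasonable reading of the table, not official guidance: a custom LLM forces pipeline mode; voice quality without a need for full function calling favours ElevenLabs; otherwise the lowest-latency default applies.

```typescript
// The three provider values used on this page.
type ProviderMode = "openai_realtime" | "elevenlabs_convai" | "pipeline";

// Hypothetical requirement flags; names are illustrative.
interface Requirements {
  customLlm: boolean;           // must run your own model or logic between STT and TTS
  fullFunctionCalling: boolean; // needs native function calling
  preferVoiceQuality: boolean;  // voice quality matters more than the lowest latency
}

function chooseProvider(req: Requirements): ProviderMode {
  // Only pipeline mode supports a custom LLM.
  if (req.customLlm) return "pipeline";
  // ElevenLabs ConvAI: "Very high" voice quality, but only "Limited" function calling.
  if (req.preferVoiceQuality && !req.fullFunctionCalling) return "elevenlabs_convai";
  // Default: lowest latency, with native function calling.
  return "openai_realtime";
}
```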