Silero VAD

SileroVAD is Patter’s bundled VADProvider: voice activity detection backed by the Silero ONNX model. It buffers incoming PCM frames, runs inference on fixed-size windows (256 samples at 8 kHz, 512 at 16 kHz; 32 ms of audio in both cases), smooths the resulting speech probability with an exponential filter, and emits speech_start / speech_end transitions. Patter uses these transitions for two things: detecting when the caller starts speaking so the agent can stop talking immediately (clean barge-in), and gating STT activity on real speech instead of background noise.
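The flow described above (buffer, fixed-size windows, exponential smoothing, start/end transitions) can be sketched in plain Python. This is an illustration only, not Patter's implementation: the class name VadSketch, the feed method, the alpha smoothing factor, and the infer callback (a stand-in for the ONNX model) are all hypothetical. The sketch also ignores the minimum speech and silence durations covered later on this page.

```python
from dataclasses import dataclass, field

# Window sizes quoted in the text: 256 samples at 8 kHz, 512 at 16 kHz.
WINDOW_SAMPLES = {8000: 256, 16000: 512}


@dataclass
class VadSketch:
    """Hypothetical, simplified VAD loop; not the real SileroVAD class."""

    sample_rate: int = 16000
    activation_threshold: float = 0.5
    deactivation_threshold: float = 0.35
    alpha: float = 0.35  # assumed smoothing factor, chosen for illustration
    _buffer: list = field(default_factory=list, init=False)
    _smoothed: float = field(default=0.0, init=False)
    _speaking: bool = field(default=False, init=False)
    _window: int = field(default=0, init=False)

    def __post_init__(self):
        if self.sample_rate not in WINDOW_SAMPLES:
            raise ValueError("sample_rate must be 8000 or 16000")
        self._window = WINDOW_SAMPLES[self.sample_rate]

    def feed(self, samples, infer):
        """Buffer samples and return the transitions they trigger.

        `infer` maps one full window to a speech probability in [0, 1].
        """
        events = []
        self._buffer.extend(samples)
        # Only complete windows are scored; leftovers wait for more audio.
        while len(self._buffer) >= self._window:
            window = self._buffer[: self._window]
            del self._buffer[: self._window]
            prob = infer(window)
            # Exponential filter over the per-window probability.
            self._smoothed = self.alpha * prob + (1 - self.alpha) * self._smoothed
            # Two thresholds give hysteresis between the two states.
            if not self._speaking and self._smoothed >= self.activation_threshold:
                self._speaking = True
                events.append("speech_start")
            elif self._speaking and self._smoothed < self.deactivation_threshold:
                self._speaking = False
                events.append("speech_end")
        return events
```

Using a lower threshold to leave the speaking state than to enter it keeps the detector from flapping when the probability hovers near a single cut-off.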

Install

Silero VAD ships as an optional extra (the ONNX runtime is ~210 MB):
pip install "getpatter[silero]"
The bundled silero_vad.onnx model file is included with the package.

Auto-loading

When you build a pipeline-mode agent and leave vad=None (Python) / vad: undefined (TypeScript), Patter auto-loads SileroVAD.for_phone_call() / SileroVAD.forPhoneCall() for you on the first call. If the optional extra is not installed, Patter logs a single warning and continues without VAD — barge-in latency is higher but the call still works. To pick your own VAD or override the defaults, pass vad= explicitly. See the vad parameter on Agents (pipeline mode only).

Constructor

The recommended entrypoint is the for_phone_call / forPhoneCall factory — it pins the sample rate to 16 kHz (what Patter’s pipeline-mode audio bus uses) and applies the upstream Silero defaults.
import asyncio
from getpatter.providers.silero_vad import SileroVAD

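# Recommended for telephony pipelines. The factory runs in a worker thread
# so it does not block the event loop; `await` needs an async context.
vad = await asyncio.to_thread(SileroVAD.for_phone_call)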

# Or, full control:
vad = SileroVAD.load(
    activation_threshold=0.5,        # Silero `threshold`
    deactivation_threshold=0.35,     # `neg_threshold = threshold - 0.15`
    min_speech_duration=0.25,        # seconds, `min_speech_duration_ms = 250`
    min_silence_duration=0.1,        # seconds, `min_silence_duration_ms = 100`
    prefix_padding_duration=0.03,    # seconds, `speech_pad_ms = 30`
    sample_rate=16000,               # 8000 or 16000 only
    force_cpu=True,
)

Phone-call preset (for_phone_call / forPhoneCall)

Identical to load() but pins sample_rate to 16000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). All other parameters mirror the upstream Silero defaults from snakers4/silero-vad:
  • activation_threshold = 0.5 — upstream threshold
  • deactivation_threshold = 0.35 — upstream neg_threshold = threshold - 0.15
  • min_speech_duration = 0.25 — upstream min_speech_duration_ms = 250
  • min_silence_duration = 0.1 — upstream min_silence_duration_ms = 100
  • prefix_padding_duration = 0.03 — upstream speech_pad_ms = 30
Override any field via keyword arguments. Deployments that experience truncation on natural pauses can raise min_silence_duration (e.g. 0.5–1.0 s):
vad = await asyncio.to_thread(
    SileroVAD.for_phone_call,
    min_silence_duration=0.5,
)
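Because inference runs on 512-sample windows at 16 kHz, each decision covers 32 ms of audio, so durations are effectively resolved in 32 ms steps. The arithmetic below shows how the durations above translate into whole windows. Rounding up to the next full window is an assumption made for illustration; the helper windows_for is hypothetical and Patter may round differently.

```python
import math

SAMPLE_RATE = 16000   # Hz, the rate pinned by the phone-call preset
WINDOW_SAMPLES = 512  # samples per inference window at 16 kHz

# Duration of one window in milliseconds: 512 / 16000 s = 32 ms.
WINDOW_MS = 1000 * WINDOW_SAMPLES / SAMPLE_RATE


def windows_for(duration_s):
    """Whole windows needed to cover duration_s (assumes rounding up)."""
    return math.ceil(duration_s * SAMPLE_RATE / WINDOW_SAMPLES)


default_silence = windows_for(0.1)   # 1600 samples -> 4 windows (128 ms)
default_speech = windows_for(0.25)   # 4000 samples -> 8 windows (256 ms)
raised_silence = windows_for(0.5)    # 8000 samples -> 16 windows (512 ms)
```

Under that assumption, raising min_silence_duration from 0.1 to 0.5 means waiting for 16 quiet windows instead of 4 before speech_end fires, which is why it reduces truncation on natural pauses at the cost of slower end-of-turn detection.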

Usage in a pipeline agent

import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS
from getpatter.providers.silero_vad import SileroVAD


async def main():
    phone = Patter(carrier=Twilio(), phone_number="+15550001234")

    # Run the factory in a worker thread so the event loop stays free.
    vad = await asyncio.to_thread(SileroVAD.for_phone_call)

    agent = phone.agent(
        stt=DeepgramSTT(),
        llm=AnthropicLLM(),
        tts=ElevenLabsTTS(voice_id="rachel"),
        vad=vad,                                # explicit, or omit to auto-load
        system_prompt="You are a helpful assistant.",
    )

    await phone.serve(agent)


asyncio.run(main())

When to use Silero VAD vs alternatives

  • Use Silero VAD for any pipeline-mode agent that needs sub-300 ms barge-in. It’s the production default.
  • Skip VAD (pass vad=None and don’t install the extra) only when you’re prototyping locally on a system where the ONNX runtime is awkward to install. Barge-in falls back to a sustained-voice heuristic that is slower and noisier.
  • Realtime engines (OpenAIRealtime, GeminiLive, UltravoxRealtime) run server-side VAD inside the provider — agent.vad is ignored in engine mode.

Notes

  • The model only supports inference at 8000 or 16000 Hz. Any other sample rate raises an error in process_frame / processFrame.
  • Inference runs in a thread executor (Python) or asynchronously (TS) so the event loop stays responsive. Patter logs a warning if a single window takes more than 200 ms.
  • The TS resolver probes multiple paths to find silero_vad.onnx, including under bundlers (Vite SSR, Next webpack, Bun). If you see a “model file not found” error, install the runtime with npm install onnxruntime-node@~1.18.0 and make sure getpatter is fully installed in your node_modules.
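The second note above (inference off the event loop, with a warning for slow windows) can be sketched with the standard library alone. The function infer_off_loop, the logger name, and the infer callback are hypothetical stand-ins; only the 200 ms threshold comes from the note. This is one way to get the described behaviour, not necessarily how Patter does it.

```python
import asyncio
import logging
import time

logger = logging.getLogger("vad_executor_sketch")
SLOW_WINDOW_SECONDS = 0.2  # the 200 ms warning threshold from the note


async def infer_off_loop(infer, window):
    """Run a blocking inference call in a worker thread and time it.

    `infer` stands in for the ONNX session call. Returns (probability,
    elapsed_seconds).
    """
    start = time.perf_counter()
    # asyncio.to_thread hands the blocking call to the default executor,
    # so other coroutines keep running while the model works.
    prob = await asyncio.to_thread(infer, window)
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_WINDOW_SECONDS:
        logger.warning("VAD inference took %.0f ms for one window", elapsed * 1000)
    return prob, elapsed
```

For example, asyncio.run(infer_off_loop(lambda w: 0.9, [0] * 512)) returns the probability together with the measured latency. A window holds only 32 ms of audio, so a 200 ms inference means the detector is falling behind real time.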

What’s Next

  • Agents: the vad parameter on phone.agent(...).
  • Krisp Filter: proprietary noise / echo suppression.
  • DeepFilterNet: OSS noise suppression.
  • Pipeline mode: STT + LLM + TTS composition.