Documentation Index
Fetch the complete documentation index at: https://docs.getpatter.com/llms.txt
Use this file to discover all available pages before exploring further.
Silero VAD
SileroVAD is Patter’s bundled VADProvider — voice activity detection backed by the Silero ONNX model. It buffers incoming PCM frames, runs inference on fixed-size windows (256 samples at 8 kHz, 512 at 16 kHz), applies an exponential probability filter, and emits speech_start / speech_end transitions.
Patter uses it to detect when the caller has started speaking so the agent can stop talking immediately (clean barge-in) and to gate STT activity on real speech instead of background noise.
Install
Silero VAD ships as an optional extra (the ONNX runtime is ~210 MB):silero_vad.onnx model file is included with the package.
Auto-loading
When you build a pipeline-mode agent and leavevad=None (Python) / vad: undefined (TypeScript), Patter auto-loads SileroVAD.for_phone_call() / SileroVAD.forPhoneCall() for you on the first call.
If the optional extra is not installed, Patter logs a single warning and continues without VAD — barge-in latency is higher but the call still works.
To pick your own VAD or override the defaults, pass vad= explicitly. See the vad parameter on Agents (pipeline mode only).
Constructor
The recommended entrypoint is thefor_phone_call / forPhoneCall factory — it pins the sample rate to 16 kHz (what Patter’s pipeline-mode audio bus uses) and applies the upstream Silero defaults.
Phone-call preset (for_phone_call / forPhoneCall)
Identical to load() but pins sample_rate to 16000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). All other parameters mirror the upstream Silero defaults from snakers4/silero-vad:
activation_threshold = 0.5— upstreamthresholddeactivation_threshold = 0.35— upstreamneg_threshold = threshold - 0.15min_speech_duration = 0.25— upstreammin_speech_duration_ms = 250min_silence_duration = 0.1— upstreammin_silence_duration_ms = 100prefix_padding_duration = 0.03— upstreamspeech_pad_ms = 30
min_silence_duration (e.g. 0.5–1.0 s):
Usage in a pipeline agent
When to use Silero VAD vs alternatives
- Use Silero VAD for any pipeline-mode agent that needs sub-300 ms barge-in. It’s the production default.
- Skip VAD (pass
vad=Noneand don’t install the extra) only when you’re prototyping locally on a system where the ONNX runtime is awkward to install. Barge-in falls back to a sustained-voice heuristic that is slower and noisier. - Realtime engines (
OpenAIRealtime,GeminiLive,UltravoxRealtime) run server-side VAD inside the provider —agent.vadis ignored in engine mode.
Notes
- The model only supports 8000 or 16000 Hz inference. Other sample rates raise on
process_frame/processFrame. - Inference runs in a thread executor (Python) or asynchronously (TS) so the event loop stays responsive. Patter logs a warning if a single window takes more than 200 ms.
- The TS resolver probes multiple paths to find
silero_vad.onnx, including under bundlers (Vite SSR, Next webpack, Bun). If you seemodel file not found, install withnpm install onnxruntime-node@~1.18.0and ensuregetpatteris fully installed in yournode_modules.
What’s Next
Agents
The
vad parameter on phone.agent(...).Krisp Filter
Proprietary noise / echo suppression.
DeepFilterNet
OSS noise suppression.
Pipeline mode
STT + LLM + TTS composition.

