Silero VAD
SileroVAD is Patter’s bundled VADProvider — voice activity detection backed by the Silero ONNX model. It buffers incoming PCM frames, runs inference on fixed-size windows (256 samples at 8 kHz, 512 at 16 kHz), applies an exponential probability filter, and emits speech_start / speech_end transitions.
Patter uses it to detect when the caller has started speaking so the agent can stop talking immediately (clean barge-in) and to gate STT activity on real speech instead of background noise.
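The exponential probability filter mentioned above can be pictured as a one-pole smoother over the per-window speech probabilities. The following is a minimal illustration, not Patter's implementation: the class name ExpFilter, its apply method, and the smoothing factor of 0.5 are assumptions chosen for the example, since this page does not state the value SileroVAD uses.

```typescript
// Hypothetical helper: exponential smoothing of per-window speech probabilities.
// The alpha value is an assumption; the real filter constant is not documented here.
class ExpFilter {
  private value: number | null = null;
  private alpha: number;

  constructor(alpha: number) {
    if (!(alpha > 0 && alpha <= 1)) {
      throw new RangeError("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  // Blend the new raw probability with the previous smoothed value.
  apply(raw: number): number {
    this.value =
      this.value === null
        ? raw
        : this.alpha * raw + (1 - this.alpha) * this.value;
    return this.value;
  }
}

// Raw model outputs for four consecutive windows.
const filter = new ExpFilter(0.5);
const smoothed = [0.0, 1.0, 1.0, 0.0].map((p) => filter.apply(p));
// smoothed is [0, 0.5, 0.75, 0.375]
```

A single high-probability window moves the smoothed value only part of the way, so brief noise bursts are less likely to cross the activation threshold than sustained speech.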
Install
Silero VAD depends on onnxruntime-node, an optional peer dependency (~210 MB) that you install yourself.
The silero_vad.onnx model file is included with the package.
Patter is currently tested against onnxruntime-node@~1.18.0. Versions 1.24+ removed listSupportedBackends and break the SDK; pin the version above until the SDK migration lands.
Auto-loading
When you build a pipeline-mode agent and leave vad: undefined, Patter auto-loads SileroVAD.forPhoneCall() for you on the first call.
If onnxruntime-node is not installed, Patter logs a single warning and continues without VAD — barge-in latency is higher but the call still works.
To pick your own VAD or override the defaults, pass vad: explicitly. See the vad parameter on Agents (pipeline mode only).
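The resolution order described in this section (an explicit vad wins, otherwise the bundled default is tried, and a missing runtime produces one warning and no VAD) can be sketched as below. This is an illustration only, not Patter's loader: resolveVad, the Vad interface, and the injected loadDefault callback are hypothetical names, and the loader is synchronous here purely to keep the example short.

```typescript
// Hypothetical types and helper; not Patter's actual loader.
interface Vad {
  name: string;
}

let warned = false; // ensures the missing-runtime warning is logged only once

function resolveVad(
  explicit: Vad | undefined,
  loadDefault: () => Vad, // stands in for SileroVAD.forPhoneCall()
  warn: (msg: string) => void = console.warn,
): Vad | undefined {
  if (explicit !== undefined) return explicit; // the caller's choice wins
  try {
    return loadDefault();
  } catch {
    if (!warned) {
      warn("onnxruntime-node is not installed; continuing without VAD");
      warned = true;
    }
    return undefined; // the call still proceeds, with slower barge-in
  }
}

// Simulate a machine where the ONNX runtime is missing.
const warnings: string[] = [];
const failing = (): Vad => {
  throw new Error("Cannot find module 'onnxruntime-node'");
};
const first = resolveVad(undefined, failing, (m) => warnings.push(m));
const second = resolveVad(undefined, failing, (m) => warnings.push(m));
const custom = resolveVad({ name: "custom" }, failing, (m) => warnings.push(m));
```

Both default lookups return undefined but only one warning is recorded, while the explicitly supplied VAD is returned without touching the loader.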
Constructor
The recommended entry point is the forPhoneCall factory: it pins the sample rate to 16 kHz (what Patter’s pipeline-mode audio bus uses) and applies the upstream Silero defaults.
Phone-call preset (forPhoneCall)
Identical to load() but pins sampleRate to 16000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). All other parameters mirror the upstream Silero defaults from snakers4/silero-vad:
- activationThreshold = 0.5 (upstream threshold)
- deactivationThreshold = 0.35 (upstream neg_threshold = threshold - 0.15)
- minSpeechDuration = 0.25 (upstream min_speech_duration_ms = 250)
- minSilenceDuration = 0.1 (upstream min_silence_duration_ms = 100)
- prefixPaddingDuration = 0.03 (upstream speech_pad_ms = 30)
These defaults can be overridden, for example with a longer minSilenceDuration (e.g. 0.5–1.0 s).
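How the thresholds and durations in this section interact, including the effect of a longer minSilenceDuration, can be illustrated with a small state machine. This is a sketch under stated assumptions rather than SileroVAD's code: SpeechGate, GateOptions, and run are hypothetical names, each step is assumed to cover one 512-sample window at 16 kHz (32 ms), and prefix padding is left out.

```typescript
type VadEvent = "speech_start" | "speech_end";

interface GateOptions {
  activationThreshold: number;   // probability at or above this counts as speech
  deactivationThreshold: number; // probability below this counts as silence
  minSpeechDuration: number;     // seconds of speech required before speech_start
  minSilenceDuration: number;    // seconds of silence required before speech_end
}

const WINDOW_SECONDS = 512 / 16000; // 0.032 s per window (assumed)

// Hypothetical hysteresis gate; not the SDK's implementation.
class SpeechGate {
  private speaking = false;
  private speechWindows = 0;  // consecutive windows at or above activationThreshold
  private silenceWindows = 0; // consecutive windows below deactivationThreshold
  private opts: GateOptions;

  constructor(opts: GateOptions) {
    this.opts = opts;
  }

  step(prob: number): VadEvent | null {
    if (!this.speaking) {
      this.speechWindows =
        prob >= this.opts.activationThreshold ? this.speechWindows + 1 : 0;
      if (this.speechWindows * WINDOW_SECONDS >= this.opts.minSpeechDuration) {
        this.speaking = true;
        this.silenceWindows = 0;
        return "speech_start";
      }
    } else {
      this.silenceWindows =
        prob < this.opts.deactivationThreshold ? this.silenceWindows + 1 : 0;
      if (this.silenceWindows * WINDOW_SECONDS >= this.opts.minSilenceDuration) {
        this.speaking = false;
        this.speechWindows = 0;
        return "speech_end";
      }
    }
    return null;
  }
}

function run(probs: number[], opts: GateOptions): VadEvent[] {
  const gate = new SpeechGate(opts);
  const events: VadEvent[] = [];
  for (const p of probs) {
    const e = gate.step(p);
    if (e !== null) events.push(e);
  }
  return events;
}

const defaults: GateOptions = {
  activationThreshold: 0.5,
  deactivationThreshold: 0.35,
  minSpeechDuration: 0.25,
  minSilenceDuration: 0.1,
};

// 0.32 s of speech, a 0.19 s pause, then 0.32 s of speech again.
const probs = [
  ...new Array<number>(10).fill(0.9),
  ...new Array<number>(6).fill(0.1),
  ...new Array<number>(10).fill(0.9),
];
const withDefaults = run(probs, defaults);
const withLongerSilence = run(probs, { ...defaults, minSilenceDuration: 0.5 });
```

With the default 0.1 s silence window the short pause splits the utterance into two speech segments; with 0.5 s the same pause is absorbed and only one speech_start is emitted.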
Usage in a pipeline agent
When to use Silero VAD vs alternatives
- Use Silero VAD for any pipeline-mode agent that needs sub-300 ms barge-in. It’s the production default.
- Skip VAD (omit vad and don’t install the peer dep) only when you’re prototyping locally on a system where the ONNX runtime is awkward to install. Barge-in falls back to a sustained-voice heuristic that is slower and noisier.
- Realtime engines (OpenAIRealtimeAdapter, GeminiLiveAdapter, UltravoxRealtimeAdapter) run server-side VAD inside the provider; agent.vad is ignored in engine mode.
Notes
- The model only supports 8000 or 16000 Hz inference. Other sample rates throw from processFrame.
- The resolver probes multiple paths to find silero_vad.onnx, including under bundlers (Vite SSR, Next webpack, Bun). If you see model file not found, ensure getpatter is fully installed in your node_modules.
- numFramesRequired() returns the int16 sample count needed per inference window (256 @ 8 kHz, 512 @ 16 kHz). Smaller chunks are buffered safely.
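The sample-rate check and chunk buffering in these notes can be sketched as a small accumulator that accepts int16 chunks of any size and releases fixed-size windows. WindowBuffer and windowSizeFor are hypothetical names used for illustration; only the window sizes (256 at 8 kHz, 512 at 16 kHz) and the rejection of other sample rates come from this page.

```typescript
// Hypothetical helper mirroring the documented window sizes.
function windowSizeFor(sampleRate: number): number {
  if (sampleRate === 8000) return 256;
  if (sampleRate === 16000) return 512;
  throw new RangeError(`unsupported sample rate: ${sampleRate}`);
}

// Hypothetical accumulator; not the SDK's internal buffer.
class WindowBuffer {
  private pending: Int16Array = new Int16Array(0);
  private size: number;

  constructor(sampleRate: number) {
    this.size = windowSizeFor(sampleRate);
  }

  // Append a chunk and return every complete window now available.
  push(chunk: Int16Array): Int16Array[] {
    const merged = new Int16Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const windows: Int16Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.size) {
      windows.push(merged.slice(offset, offset + this.size));
      offset += this.size;
    }
    this.pending = merged.slice(offset); // keep the remainder for the next push
    return windows;
  }
}

// 20 ms chunks at 16 kHz are 320 samples, smaller than one 512-sample window.
const buffer = new WindowBuffer(16000);
const afterFirst = buffer.push(new Int16Array(320));  // 320 buffered, no window yet
const afterSecond = buffer.push(new Int16Array(320)); // 640 buffered, one window
const afterThird = buffer.push(new Int16Array(1000)); // 128 + 1000, two windows
```

Chunks smaller than a window simply accumulate until enough samples are available, and a large chunk can release several windows at once.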
What’s Next
- Agents: the vad parameter on phone.agent({...}).
- Krisp Filter: proprietary noise / echo suppression.
- DeepFilterNet: OSS noise suppression.
- Pipeline mode: STT + LLM + TTS composition.

