feat(vad): bundle optimized silero vad and deprecate the plugin #5800
chenghao-mou wants to merge 5 commits into
- Add the compiled silero vad from livekit-local-inference and deprecate the plugin
- Update examples
    *,
    model: VADModels = "silero",
    min_speech_duration: float = 0.05,
    min_silence_duration: float = 0.55,
Should we follow the same defaults as Silero? IMO we should use 0.1 now. It shouldn't have any side effect and it is closer to the truth
I'd love to, but I wasn't sure if you are going to merge #5788 first.
Tested locally and it works well. Could you add a comparison in the PR description?
Deferred to a follow-up branch (chenghao/feat/default-vad). The auto-default flips several call-site behaviours under default args (VAD-based turn detection, adaptive interruption detection, hook `speaking=` payload type, `_last_speaking_time` source, AudioTurnDetector safety branches), and we want to land + audit those changes separately rather than couple them to the silero-plugin inlining. The `vad: NotGivenOr[vad.VAD | None]` type stays widened so users can opt out explicitly today and the default-vad branch can build on top. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
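The widened `vad: NotGivenOr[vad.VAD | None]` type separates three cases: argument omitted, explicit `None`, and a concrete instance. A minimal self-contained sketch of that pattern follows. `_NotGiven`, `NOT_GIVEN`, `FakeVAD`, and `resolve_vad` are illustrative stand-ins, not the livekit-agents definitions, and what an omitted argument eventually resolves to is left to the follow-up branch.

```python
from __future__ import annotations

from typing import Union


class _NotGiven:
    """Sentinel type: distinguishes 'argument omitted' from an explicit None."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


class FakeVAD:
    """Hypothetical stand-in for a VAD instance."""


def resolve_vad(vad: Union[FakeVAD, None, _NotGiven] = NOT_GIVEN) -> str:
    """Classify the caller's intent without conflating None with 'omitted'."""
    if isinstance(vad, _NotGiven):
        return "omitted"  # caller expressed no preference
    if vad is None:
        return "disabled"  # explicit opt-out: no VAD
    return "provided"  # caller supplied an instance


print(resolve_vad())           # omitted
print(resolve_vad(None))       # disabled
print(resolve_vad(FakeVAD()))  # provided
```

Checking the sentinel with `isinstance` before testing for `None` is what keeps "no preference" and "explicitly disabled" from collapsing into the same branch.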
    from .vad import VAD

    vad_instance = VAD(model="silero")
🟡 Auto-created VAD for Speechmatics STT uses drastically lower min_silence_duration default (0.1s vs 0.55s)
In _resolve_vad_for_model, the auto-created VAD for Speechmatics STT models changed from SileroVAD.load() (which had min_silence_duration=0.55) to inference.VAD(model="silero") (which defaults to min_silence_duration=0.1). This 5.5× reduction means END_OF_SPEECH events fire much sooner for Speechmatics STT users who don't provide their own VAD. While the audio EOT detector (_maybe_apply_vad_silence_override in livekit-agents/livekit/agents/voice/audio_recognition.py:669) bumps this to at least 0.25s when the audio turn detector is active, users with text-based turn detectors (e.g., MultilingualModel) will experience the full 0.1s default — significantly different from the previous 0.55s behavior.
Suggested change:

    - from .vad import VAD
    - vad_instance = VAD(model="silero")
    + from .vad import VAD
    + vad_instance = VAD(model="silero", min_silence_duration=0.55)
…pectations

- The bundled silero VAD's native singleton is preloaded in the forkserver via `livekit.agents.inference._warmup`, so the per-process prewarm pattern that stashed `inference.VAD(model="silero")` into `proc.userdata["vad"]` is now redundant. Inline the construction directly in `AgentSession(...)` and drop the empty/single-purpose `prewarm` functions.
- Update `TestVadMinSilenceOverride` expectations: the required silence is `(MIN_SILENCE_DURATION_MS + 50) / 1000 = 0.25s` (was stale at 0.2s) and the log message check is "bumping vad min_silence_duration" (was stale).
- Wire the missing `_stt_pipeline`, `_stt_consumer_atask`, `_interruption_atask`, `_turn_detection_atask`, and `_backchannel_boundary_timer` attributes into the synthetic `AudioRecognition` test fixture so `aclose()` doesn't trip over missing internal state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
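The silence thresholds discussed in this thread can be checked with a few lines of arithmetic. The sketch below assumes `MIN_SILENCE_DURATION_MS = 200`, a value inferred from the `(MIN_SILENCE_DURATION_MS + 50) / 1000 = 0.25s` expectation, and uses a hypothetical `effective_min_silence` helper rather than the real `_maybe_apply_vad_silence_override`.

```python
# Assumption: inferred from (MIN_SILENCE_DURATION_MS + 50) / 1000 = 0.25
MIN_SILENCE_DURATION_MS = 200
REQUIRED_SILENCE_S = (MIN_SILENCE_DURATION_MS + 50) / 1000


def effective_min_silence(vad_min_silence: float, audio_turn_detector: bool) -> float:
    """Return the silence (in seconds) the VAD waits before END_OF_SPEECH.

    With the audio turn detector active, the value is raised to the required
    floor; otherwise the VAD's own setting applies unchanged.
    """
    if audio_turn_detector:
        return max(vad_min_silence, REQUIRED_SILENCE_S)
    return vad_min_silence


for label, setting in [("old default", 0.55), ("new default", 0.1)]:
    for audio_eot in (True, False):
        value = effective_min_silence(setting, audio_eot)
        print(f"{label}: audio_turn_detector={audio_eot} -> {value}s")
```

Under these assumptions the only combination that sees the full 0.1s is the new default without the audio turn detector, which is the case the review comment flags for text-based turn detectors.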
Why
Silero VAD is the default endpointing implementation for voice agents, but lived behind a separate `livekit-plugins-silero` install step. That extra hop made the standard quickstart longer than it needed to be, and the plugin's `onnxruntime`-based loader paid the full model load cost in every job process (no fork-time sharing).

This PR moves Silero VAD into `livekit-agents` core, backed by `livekit-local-inference`. The plugin stays installable as a deprecated shim until v2.0, and existing call sites continue to work: they transparently route to the new implementation when settings are compatible.

This PR also introduces changes to follow the official silero settings, similar to #5788:

- `min_silence_duration` from 0.55s to 0.1s.

Code example
Before
After
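A rough sketch of both shapes, reconstructed from the description in this PR rather than copied from it. The function names are hypothetical, `inference.VAD(model="silero")` is the API this PR proposes, and the imports are deferred into the function bodies so the snippet loads even where `livekit-agents` is not installed.

```python
def prewarm_before(proc):
    """Before: load the plugin VAD once per job process from a prewarm hook."""
    from livekit.plugins import silero  # plugin import deprecated by this PR

    proc.userdata["vad"] = silero.VAD.load()


def make_session_before(ctx):
    """Before: pull the prewarmed VAD back out of proc.userdata."""
    from livekit.agents import AgentSession

    return AgentSession(vad=ctx.proc.userdata["vad"])


def make_session_after():
    """After: construct the bundled VAD inline, with no prewarm hook."""
    from livekit.agents import AgentSession, inference

    return AgentSession(vad=inference.VAD(model="silero"))
```

Other `AgentSession` arguments (STT, LLM, TTS, turn detection) are omitted to keep the comparison on the VAD wiring alone.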
No `prewarm_fnc`, no `silero` plugin import, no `proc.userdata` shuttle. Weights are loaded once in the forkserver and inherited by every job process via COW.

API change
| Before | After |
| --- | --- |
| `pip install livekit-agents livekit-plugins-silero` | `pip install livekit-agents` (Silero is bundled) |
| `from livekit.plugins.silero import VAD`; `vad = VAD.load(min_silence_duration=0.4)` | `from livekit.agents import inference`; `vad = inference.VAD(model="silero", min_silence_duration=0.4)` |
| `silero.VAD.load()` did a heavy onnxruntime session construction → expected to live behind a `prewarm` hook | `inference.VAD(model="silero")` is a cheap wrapper; weights are loaded once at forkserver-preload time, inherited via COW |
| `VAD` / `VADStream` / `OnnxModel` (~650 LOC). `silero.VAD.load(force_cpu=False, sample_rate=16000)` ran onnxruntime; with custom `onnx_file_path`, used a user-supplied model | `silero.VAD.load(...)` transparently delegates to `inference.VAD(model="silero", ...)` when settings are compatible; 8 kHz + `onnx_file_path` still routes to the legacy onnxruntime path |
| `vad: NotGivenOr[vad.VAD] = NOT_GIVEN` in `AgentSession.__init__`; `vad=None` was illegal per type, even though the code accepted it | `vad: NotGivenOr[vad.VAD \| None] = NOT_GIVEN`; `vad=None` now type-legal as an explicit "no VAD" signal |
| | `from livekit.agents.inference import VAD` |

Behaviour change
- `livekit.agents.inference._warmup` once → `init_vad()` + `init_eot()` page native weights into the forkserver. Jobs fork with weights already resident.
- `import livekit.plugins.silero`: silent `DeprecationWarning` pointing to `inference.VAD(model="silero")`; v2.0 removal target.
- `silero.VAD.load(force_cpu=False)` honored the user's GPU request via onnxruntime. `force_cpu=False` is ignored (native lib is CPU-only) → now emits a `WARNING` explaining the kwarg is ignored and pointing at `onnx_file_path` as the legacy escape hatch.
- `session.vad is None` `None`.

Migration
- `silero.VAD.load()` with default settings → `from livekit.agents import inference; inference.VAD(model="silero")`
- `silero.VAD.load(sample_rate=8000)`
- `silero.VAD.load(onnx_file_path=...)`
- `silero.VAD.load(force_cpu=False)`: `force_cpu` is ignored on the delegated native path → `inference.VAD(model="silero")` (and accept CPU) or keep the plugin form with `onnx_file_path=...` to keep the legacy path
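
The delegation rules described under API change and Migration can be summarised as a small decision function. This is a sketch of the described behaviour, not the plugin's code: `pick_backend` and the `"native"` / `"legacy"` labels are hypothetical, the `force_cpu=True` default is an assumption, and it assumes that either an 8 kHz sample rate or a custom `onnx_file_path` is enough on its own to select the legacy path.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("vad-migration-sketch")


def pick_backend(
    *,
    sample_rate: int = 16000,
    onnx_file_path: str | None = None,
    force_cpu: bool = True,  # assumption: CPU is the default
) -> str:
    """Return which implementation a plugin-style load() call would use."""
    # Settings the bundled native implementation is described as not covering.
    if sample_rate == 8000 or onnx_file_path is not None:
        return "legacy"  # onnxruntime path kept by the deprecated plugin

    if not force_cpu:
        # The native library is CPU-only, so the GPU request cannot be honored.
        logger.warning(
            "force_cpu=False is ignored on the native path; "
            "pass onnx_file_path to keep the legacy onnxruntime path"
        )
    return "native"  # delegates to the bundled implementation


print(pick_backend())                          # native
print(pick_backend(sample_rate=8000))          # legacy
print(pick_backend(onnx_file_path="m.onnx"))   # legacy
print(pick_backend(force_cpu=False))           # native (after a warning)
```

Callers that depend on GPU inference would, under this reading, need to stay on the legacy path explicitly rather than rely on `force_cpu=False`.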