feat(vad): bundle optimized silero vad and deprecate the plugin #5800

Open
chenghao-mou wants to merge 5 commits into feat/AGT-2520-multimodal-EOU from chenghao/feat/inline-silero-vad

Member

@chenghao-mou chenghao-mou commented May 21, 2026

Why

Silero VAD is the default endpointing implementation for voice agents, but it lived behind a separate livekit-plugins-silero install step. That extra hop made the standard quickstart longer than it needed to be, and the plugin's onnxruntime-based loader paid the full model load cost in every job process (no fork-time sharing).

This PR moves Silero VAD into livekit-agents core, backed by livekit-local-inference. The plugin stays installable as a deprecated shim until v2.0, and existing call sites continue to work — they transparently route to the new implementation when settings are compatible.

This PR also introduces changes to follow the official silero settings, similar to #5788:

  • removed exp filter
  • changed the default min_silence_duration from 0.55s to 0.1s.

Code example

Before

from livekit.agents import Agent, AgentSession, JobContext, JobProcess, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero


def prewarm(proc: JobProcess) -> None:
    # Heavy ONNX session construction — must live behind prewarm so each
    # job process doesn't pay the load on every conversation start.
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    await session.start(agent=Agent(instructions="..."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))

After

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli, inference
from livekit.plugins import deepgram, openai


async def entrypoint(ctx: JobContext) -> None:
    session = AgentSession(
        vad=inference.VAD(model="silero"),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
    )
    await session.start(agent=Agent(instructions="..."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

No prewarm_fnc, no silero plugin import, and no proc.userdata shuttle. Weights are loaded once in the forkserver and inherited by every job process via copy-on-write (COW).
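
The "cheap wrapper over weights that are loaded once" idea can be sketched without LiveKit at all. The snippet below is a minimal illustration under stated assumptions, not the actual implementation: `_load_weights`, `CheapVAD`, and `load_calls` are hypothetical names, and a cached function stands in for the native singleton that the forkserver is described as preloading.

```python
from functools import lru_cache

# Hypothetical counter so the example can show how often the expensive load runs.
load_calls = 0


@lru_cache(maxsize=1)
def _load_weights() -> bytes:
    """Stand-in for the expensive model load (assumption: cached per process)."""
    global load_calls
    load_calls += 1
    return bytes(1024)  # placeholder for the real weights


class CheapVAD:
    """Hypothetical wrapper: construction only stores settings and a reference."""

    def __init__(self, min_silence_duration: float = 0.1) -> None:
        self.min_silence_duration = min_silence_duration
        self.weights = _load_weights()  # served from the cache after the first call


# Constructing several wrappers triggers a single load.
first = CheapVAD()
second = CheapVAD(min_silence_duration=0.4)
```

If the first call happens in a parent process before it forks, children start with the cache already populated, which is the effect the forkserver preload is described as achieving on Linux.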

API change

| Before | After |
| --- | --- |
| `pip install livekit-agents livekit-plugins-silero` | `pip install livekit-agents` (Silero is bundled) |
| `from livekit.plugins.silero import VAD; vad = VAD.load(min_silence_duration=0.4)` | `from livekit.agents import inference; vad = inference.VAD(model="silero", min_silence_duration=0.4)` |
| `silero.VAD.load()` did a heavy onnxruntime session construction → expected to live behind a prewarm hook | `inference.VAD(model="silero")` is a cheap wrapper; weights are loaded once at forkserver-preload time, inherited via COW |
| Per-job: ~6 MB Silero ONNX loaded into every job process | Per-fork: weights resident in the forkserver, COW-shared with each job (Linux); spawn platforms unchanged |
| Plugin owned its own `VAD`/`VADStream`/`OnnxModel` (~650 LOC) | Core owns the wrapper; plugin keeps a frozen copy as a deprecation shim |
| `silero.VAD.load(force_cpu=False, sample_rate=16000)` ran onnxruntime; with custom `onnx_file_path`, used a user-supplied model | `silero.VAD.load(...)` transparently delegates to `inference.VAD(model="silero", ...)` when settings are compatible; 8 kHz + `onnx_file_path` still routes to the legacy onnxruntime path |
| `vad: NotGivenOr[vad.VAD] = NOT_GIVEN` in `AgentSession.__init__`; `vad=None` was illegal per type, even though the code accepted it | `vad: NotGivenOr[vad.VAD \| None] = NOT_GIVEN`; `vad=None` now type-legal as an explicit "no VAD" signal |
| No way to invoke Silero VAD without importing the plugin | `from livekit.agents.inference import VAD` |
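
The widened `vad` type relies on telling "argument omitted" apart from "explicit `None`". A minimal sketch of that sentinel pattern follows; `NotGiven`, `describe_vad_arg`, and the placeholder `VAD` class are hypothetical stand-ins written for this example and are not the library's own definitions.

```python
from typing import Union


class NotGiven:
    """Hypothetical sentinel type meaning 'the caller did not pass this argument'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


class VAD:
    """Placeholder for a real VAD object."""


def describe_vad_arg(vad: Union[VAD, None, NotGiven] = NOT_GIVEN) -> str:
    """Report which of the three cases the caller chose."""
    if isinstance(vad, NotGiven):
        return "omitted"        # caller left the default in place
    if vad is None:
        return "explicit-none"  # caller opted out of VAD on purpose
    return "provided"           # caller supplied an instance
```

Per the behaviour table, both "omitted" and "explicit-none" end up with no VAD on this branch; the distinction only starts to matter once a default VAD is introduced.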

Behaviour change

| Before | After |
| --- | --- |
| Worker startup: each forked job pays the full Silero ONNX load (~tens of ms) | Forkserver preload runs `livekit.agents.inference._warmup` once → `init_vad()` + `init_eot()` page native weights into the forkserver. Jobs fork with weights already resident. |
| `import livekit.plugins.silero`: silent | Emits a single `DeprecationWarning` pointing to `inference.VAD(model="silero")`; v2.0 removal target |
| `silero.VAD.load(force_cpu=False)` honored the user's GPU request via onnxruntime | When delegating to native, `force_cpu=False` is ignored (native lib is CPU-only) → now emits a WARNING explaining the kwarg is ignored and pointing at `onnx_file_path` as the legacy escape hatch |
| `AgentSession` with default args → `session.vad is None` | No change on this branch; default stays `None`. |
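
The "single `DeprecationWarning`" behaviour can be sketched with the standard `warnings` module. This is an illustrative pattern only: the guard flag, the helper name, and the message text are assumptions, and the real shim may simply warn at module import time (which also fires once, because a module body runs once per process).

```python
import warnings

_warned = False  # hypothetical module-level guard


def _warn_deprecated_once() -> None:
    """Emit the deprecation notice the first time only."""
    global _warned
    if _warned:
        return
    _warned = True
    warnings.warn(
        'livekit.plugins.silero is deprecated; use inference.VAD(model="silero")',
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


# Record warnings to show that repeated calls produce a single notice.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _warn_deprecated_once()
    _warn_deprecated_once()

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```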

Migration

| If you currently use… | Do nothing? | Recommended update |
| --- | --- | --- |
| `silero.VAD.load()` with default settings | Works (delegates); deprecation warning prints once at plugin import | `from livekit.agents import inference; inference.VAD(model="silero")` |
| `silero.VAD.load(sample_rate=8000)` | Works; still routes to legacy onnxruntime | No native equivalent; stay on plugin until v2.0 then migrate if 16 kHz is acceptable |
| `silero.VAD.load(onnx_file_path=...)` | Works; still routes to legacy onnxruntime | Stay on plugin until v2.0; the bundled native model is fixed |
| `silero.VAD.load(force_cpu=False)` | Works but a warning now fires noting `force_cpu` is ignored on the delegated native path | Use `inference.VAD(model="silero")` (and accept CPU) or keep the plugin form with `onnx_file_path=...` to keep the legacy path |
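
The routing rules in the migration table can be summarised as a small decision function. This is a sketch of the described behaviour, not the shim's code: `Route` and `route_silero_load` are hypothetical names, the `force_cpu=True` default is an assumption, and only the cases listed in the table are modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    backend: str                          # "native" or "legacy"
    ignored_kwargs: Tuple[str, ...] = ()  # kwargs the chosen backend cannot honor


def route_silero_load(
    sample_rate: int = 16000,
    onnx_file_path: Optional[str] = None,
    force_cpu: bool = True,  # assumed default for this sketch
) -> Route:
    # A custom model file or 8 kHz audio has no native equivalent.
    if onnx_file_path is not None or sample_rate == 8000:
        return Route(backend="legacy")
    # The native library is CPU-only, so a GPU request is ignored (with a warning).
    if not force_cpu:
        return Route(backend="native", ignored_kwargs=("force_cpu",))
    return Route(backend="native")
```

For example, `route_silero_load(force_cpu=False)` lands on the native path with `force_cpu` reported as ignored, while adding `onnx_file_path="model.onnx"` keeps the call on the legacy path.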

- Add the compiled silero vad from livekit-local-inference and deprecate the plugin;
- Update examples
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread livekit-agents/livekit/agents/worker.py Outdated
*,
model: VADModels = "silero",
min_speech_duration: float = 0.05,
min_silence_duration: float = 0.55,
Member

Should we follow the same defaults as Silero? IMO we should use 0.1 now. It shouldn't have any side effect and it is closer to the truth

Member Author

I'd love to, but I wasn't sure if you are going to merge #5788 first.

Member

just closed mine

Contributor

longcw commented May 22, 2026

tested locally and it works well. could you add a comparison in the pr description?

chenghao-mou and others added 2 commits May 22, 2026 11:52
Deferred to a follow-up branch (chenghao/feat/default-vad). The auto-default
flips several call-site behaviours under default args (VAD-based turn
detection, adaptive interruption detection, hook `speaking=` payload type,
`_last_speaking_time` source, AudioTurnDetector safety branches), and we want
to land + audit those changes separately rather than couple them to the
silero-plugin inlining.

The `vad: NotGivenOr[vad.VAD | None]` type stays widened so users can opt out
explicitly today and the default-vad branch can build on top.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +221 to +223
from .vad import VAD

vad_instance = VAD(model="silero")
Contributor

🟡 Auto-created VAD for Speechmatics STT uses drastically lower min_silence_duration default (0.1s vs 0.55s)

In _resolve_vad_for_model, the auto-created VAD for Speechmatics STT models changed from SileroVAD.load() (which had min_silence_duration=0.55) to inference.VAD(model="silero") (which defaults to min_silence_duration=0.1). This 5.5× reduction means END_OF_SPEECH events fire much sooner for Speechmatics STT users who don't provide their own VAD. While the audio EOT detector (_maybe_apply_vad_silence_override in livekit-agents/livekit/agents/voice/audio_recognition.py:669) bumps this to at least 0.25s when the audio turn detector is active, users with text-based turn detectors (e.g., MultilingualModel) will experience the full 0.1s default — significantly different from the previous 0.55s behavior.

Suggested change
- from .vad import VAD
- vad_instance = VAD(model="silero")
+ from .vad import VAD
+ vad_instance = VAD(model="silero", min_silence_duration=0.55)
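
To make the practical effect of the two defaults concrete, here is a deliberately simplified endpointing loop. It is a sketch and not the Silero or LiveKit implementation: `end_of_speech_ms` is a hypothetical helper, the 32 ms frame length is an assumption (512 samples at 16 kHz), and `min_speech_duration` and probability thresholds are ignored.

```python
from typing import List, Optional

FRAME_MS = 32  # assumed frame length: 512 samples at 16 kHz


def end_of_speech_ms(frames: List[bool], min_silence_ms: int) -> Optional[int]:
    """Return the time (ms) at which END_OF_SPEECH would fire, or None.

    frames[i] is True when frame i is classified as speech.
    """
    in_speech = False
    silence_ms = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            in_speech = True
            silence_ms = 0  # any speech frame resets the silence counter
        elif in_speech:
            silence_ms += FRAME_MS
            if silence_ms >= min_silence_ms:
                return (i + 1) * FRAME_MS
    return None


# 10 speech frames, a 10-frame (320 ms) pause, then 10 more speech frames.
utterance = [True] * 10 + [False] * 10 + [True] * 10
```

With `min_silence_ms=100` the 320 ms pause is enough to end the turn part-way through the pause, whereas `min_silence_ms=550` rides through it and never fires for this input.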

…pectations

- The bundled silero VAD's native singleton is preloaded in the forkserver
  via `livekit.agents.inference._warmup`, so the per-process prewarm pattern
  that stashed `inference.VAD(model="silero")` into `proc.userdata["vad"]`
  is now redundant. Inline the construction directly in `AgentSession(...)`
  and drop the empty/single-purpose `prewarm` functions.
- Update `TestVadMinSilenceOverride` expectations: the required silence is
  `(MIN_SILENCE_DURATION_MS + 50) / 1000 = 0.25s` (was stale at 0.2s) and
  the log message check is "bumping vad min_silence_duration" (was stale).
- Wire the missing `_stt_pipeline`, `_stt_consumer_atask`,
  `_interruption_atask`, `_turn_detection_atask`, and
  `_backchannel_boundary_timer` attributes into the synthetic
  `AudioRecognition` test fixture so `aclose()` doesn't trip over
  missing internal state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
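
The arithmetic behind the updated test expectation can be written out as a short sketch. The value `MIN_SILENCE_DURATION_MS = 200` is inferred from the formula quoted in the commit message, `MARGIN_MS` and both function names are hypothetical, and the "bump only when the audio turn detector is active" rule follows the review comment rather than the actual source.

```python
MIN_SILENCE_DURATION_MS = 200  # inferred: (MIN_SILENCE_DURATION_MS + 50) / 1000 = 0.25
MARGIN_MS = 50                 # hypothetical name for the +50 term


def required_silence_s() -> float:
    """Minimum VAD silence the audio turn detector is described as needing."""
    return (MIN_SILENCE_DURATION_MS + MARGIN_MS) / 1000


def effective_min_silence(configured_s: float, audio_turn_detector_active: bool) -> float:
    """Bump the configured value up to the required minimum, never down."""
    if audio_turn_detector_active:
        return max(configured_s, required_silence_s())
    return configured_s
```

Under these assumptions the new 0.1 s default becomes 0.25 s when the audio turn detector is active, stays at 0.1 s otherwise, and a user-supplied 0.55 s is left untouched.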