
fix(bidi): add configurable event_queue_size to prevent choppy audio#2260

Draft
madhavi-joshi-nutrien wants to merge 1 commit into strands-agents:main from madhavi-joshi-nutrien:fix/bidi-event-queue-choppy-audio


Motivation
When using BidiAgent with BidiGeminiLiveModel, audio playback is choppy and bursty. The internal event queue between the model receive loop and the output handler is hardcoded to maxsize=1. This causes the model receive loop to block whenever the output handler performs any async I/O (e.g., websocket.send_json()), resulting in audio chunks piling up on the model SDK side and arriving in bursts.

This is particularly noticeable with Gemini Live, which sends many small audio chunks rapidly (~50 chunks/sec), unlike Nova Sonic which sends fewer, larger chunks with built-in TTS buffering.

Public API Changes
`BidiAgent.__init__()` accepts a new `event_queue_size` parameter:

Default behavior preserved (`maxsize=1`):

```python
agent = BidiAgent(model=model, tools=[...])
```

Opt in to a larger buffer for smooth audio with fast-delivering models:

```python
agent = BidiAgent(
    model=BidiGeminiLiveModel(...),
    tools=[...],
    event_queue_size=32,  # ~640ms buffer at typical Gemini chunk rates
)
```
The parameter controls the asyncio.Queue maxsize between the model receive loop and the output handler. Higher values absorb I/O latency spikes without stalling the model loop.
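A minimal, self-contained sketch of the effect (hypothetical `producer`/`consumer` names, not the actual strands-agents internals; the chunk and I/O timings are illustrative): with `maxsize=1` the producer's `put()` blocks behind the consumer's slow I/O, while a larger `maxsize` lets the producer run at its own rate.

```python
import asyncio
import time

CHUNKS = 20
CHUNK_INTERVAL = 0.01   # producer emits a chunk every 10ms (~Gemini's ~50/sec)
IO_LATENCY = 0.03       # consumer's async I/O takes 30ms per chunk

async def producer(queue: asyncio.Queue) -> list[float]:
    """Stand-in for the model receive loop: records when each chunk is enqueued."""
    times = []
    for i in range(CHUNKS):
        await asyncio.sleep(CHUNK_INTERVAL)  # chunk arrives from the model
        await queue.put(i)                   # blocks when the queue is full
        times.append(time.monotonic())
    await queue.put(None)                    # sentinel: no more chunks
    return times

async def consumer(queue: asyncio.Queue) -> None:
    """Stand-in for the output handler: simulates slow async I/O (e.g. a websocket send)."""
    while (chunk := await queue.get()) is not None:
        await asyncio.sleep(IO_LATENCY)

async def run(maxsize: int) -> float:
    queue = asyncio.Queue(maxsize=maxsize)
    times, _ = await asyncio.gather(producer(queue), consumer(queue))
    return times[-1] - times[0]  # how long the producer spent enqueuing

async def main() -> None:
    stalled = await run(maxsize=1)
    buffered = await run(maxsize=32)
    print(f"maxsize=1:  producer busy for {stalled:.2f}s")
    print(f"maxsize=32: producer busy for {buffered:.2f}s")

asyncio.run(main())
```

With `maxsize=1` the producer is gated to the consumer's ~30ms-per-chunk pace; with `maxsize=32` the queue never fills and the producer finishes at its native 10ms-per-chunk rate.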

Use Cases
- Gemini Live audio streaming: prevents choppy playback when the output handler has non-trivial async I/O latency (WebSocket, WebRTC)
- Custom output handlers: any output handler doing network I/O benefits from decoupling via a larger buffer
- Tuning backpressure: users can balance memory usage vs. smoothness for their specific model and handler combination
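For tuning, one hedged rule of thumb: size the queue to cover the worst expected output-handler stall at the model's chunk rate. The numbers below are illustrative assumptions, not measurements.

```python
# Rough sizing: buffer enough chunks to cover the worst expected
# output-handler I/O stall without blocking the model receive loop.
chunk_rate_hz = 50       # assumed Gemini Live audio chunk rate (~50/sec)
worst_io_stall_s = 0.5   # assumed worst-case latency spike in the handler

event_queue_size = max(1, round(chunk_rate_hz * worst_io_stall_s))
buffered_audio_ms = event_queue_size / chunk_rate_hz * 1000

print(event_queue_size)   # 25 chunks
print(buffered_audio_ms)  # 500.0 ms of audio absorbed
```

By the same arithmetic, the `event_queue_size=32` used in the example above corresponds to roughly 640ms of buffered audio at ~50 chunks/sec.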

The internal event queue between the model receive loop and the output
handler was hardcoded to maxsize=1. This caused the model receive loop
to block whenever the output handler performed any async I/O (e.g.,
websocket.send_json()), resulting in audio chunks piling up and
arriving in bursts, perceived as choppy audio.

This is particularly noticeable with Gemini Live which sends many small
audio chunks rapidly (~50/sec), unlike Nova Sonic which sends fewer,
larger chunks.

Add a configurable event_queue_size parameter to BidiAgent (default: 1
to preserve existing behavior). Users experiencing choppy audio with
fast-delivering models can increase this value to provide buffering
between the model receive loop and the output handler.
