feat(voice): add AgentSession.claim_user_turn #5806

Open: longcw wants to merge 2 commits into main from longc/claim-user-turn

longcw commented May 22, 2026

Summary

Adds AgentSession.claim_user_turn(), a public async context manager that declares a programmatic user-driven turn. While active:

  • wait_for_inactive is held open via a session-level counter (_user_turn_claims) and event (_user_turn_released).
  • user_state is pinned to "speaking" — non-"speaking" updates from the audio path are dropped in _update_user_state.

On release, user_state is re-derived from the audio path's _user_silence_event: "speaking" if voice still considers the user to be speaking, else "listening". The claim is reentrant and is held at the session level, so it survives agent handoff.
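
The bookkeeping above can be sketched with plain asyncio. This is a minimal, hypothetical model, not the livekit-agents code: `MiniSession` stands in for `AgentSession`, and `_on_audio_activity` is an invented hook playing the role of the audio path. The attribute names `_user_turn_claims`, `_user_turn_released` and `_user_silence_event` follow the PR text, but their exact types and semantics in the real implementation are assumptions.

```python
import asyncio
import contextlib
from typing import AsyncIterator, Literal

UserState = Literal["speaking", "listening", "away"]


class MiniSession:
    """Hypothetical stand-in for AgentSession: models only the claim bookkeeping."""

    def __init__(self) -> None:
        self.user_state: UserState = "listening"
        self._user_turn_claims = 0                  # reentrancy counter
        self._user_turn_released = asyncio.Event()  # set while no claim is held
        self._user_turn_released.set()
        self._user_silence_event = asyncio.Event()  # set while the audio path hears silence
        self._user_silence_event.set()

    def _update_user_state(self, state: UserState) -> None:
        # While a claim is held, non-"speaking" updates are dropped.
        if self._user_turn_claims > 0 and state != "speaking":
            return
        self.user_state = state

    def _on_audio_activity(self, speaking: bool) -> None:
        # Hypothetical audio-path hook: always record silence, then propose a state.
        if speaking:
            self._user_silence_event.clear()
        else:
            self._user_silence_event.set()
        self._update_user_state("speaking" if speaking else "listening")

    @contextlib.asynccontextmanager
    async def claim_user_turn(self) -> AsyncIterator[None]:
        self._user_turn_claims += 1
        self._user_turn_released.clear()  # holds wait_for_inactive open
        self.user_state = "speaking"      # pin the state for the claim's duration
        try:
            yield
        finally:
            self._user_turn_claims -= 1
            if self._user_turn_claims == 0:  # only the outermost release counts
                self._user_turn_released.set()
                # Re-derive from the audio path rather than restoring the old state.
                silent = self._user_silence_event.is_set()
                self.user_state = "listening" if silent else "speaking"


async def demo() -> None:
    sess = MiniSession()
    sess.user_state = "away"                          # pre-claim "away" ...
    async with sess.claim_user_turn():
        async with sess.claim_user_turn():            # reentrant
            sess._on_audio_activity(speaking=False)   # dropped while claimed
            assert sess.user_state == "speaking"
        assert not sess._user_turn_released.is_set()  # outer claim still held
    assert sess._user_turn_released.is_set()
    assert sess.user_state == "listening"             # ... does not carry through


if __name__ == "__main__":
    asyncio.run(demo())
```

Using a counter rather than a boolean is what makes nesting safe in this sketch: only the outermost release sets the event and re-derives `user_state`, while the silence event keeps recording voice transitions the whole time.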

Why

The default text_input_cb (and any custom callback that calls interrupt followed by generate_reply) had a race: a deferred async-tool reply waiting on wait_for_inactive could see the wait resolve in the gap between await sess.interrupt() and sess.generate_reply(...), then fire concurrently with the new turn and produce a duplicate response. The default callback now wraps the pair in claim_user_turn to close that window.
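
A toy reproduction makes the race concrete. Everything here is a hypothetical simplification: `ToySession`, `finish_speech` and `simulate` are invented names, "agent idle" is reduced to a single event, and `interrupt()` is modeled as a coroutine that stops speech and then yields once, which is where the gap opens.

```python
import asyncio
import contextlib
from typing import AsyncIterator, List


class ToySession:
    """Hypothetical model of the race; not the livekit-agents implementation."""

    def __init__(self) -> None:
        self.replies: List[str] = []
        self._idle = asyncio.Event()      # starts cleared: the agent is mid-speech
        self._claims = 0
        self._released = asyncio.Event()  # set while no user-turn claim is held
        self._released.set()

    async def wait_for_inactive(self) -> None:
        # Return only when the agent is idle AND no user turn is claimed.
        while True:
            if not self._idle.is_set():
                await self._idle.wait()
            elif self._claims > 0:
                await self._released.wait()
            else:
                return

    async def interrupt(self) -> None:
        self._idle.set()        # current speech stops...
        await asyncio.sleep(0)  # ...and the coroutine yields: this is the gap

    def generate_reply(self, source: str) -> None:
        self.replies.append(source)
        self._idle.clear()      # a new reply starts speaking

    def finish_speech(self) -> None:
        self._idle.set()

    @contextlib.asynccontextmanager
    async def claim_user_turn(self) -> AsyncIterator[None]:
        self._claims += 1
        self._released.clear()
        try:
            yield
        finally:
            self._claims -= 1
            if self._claims == 0:
                self._released.set()


async def simulate(use_claim: bool) -> List[str]:
    sess = ToySession()

    async def deferred_tool_reply() -> None:
        await sess.wait_for_inactive()
        sess.generate_reply("tool")

    task = asyncio.create_task(deferred_tool_reply())
    await asyncio.sleep(0)  # let the deferred reply start waiting

    # The text-input callback: interrupt, then reply to the typed message.
    if use_claim:
        async with sess.claim_user_turn():
            await sess.interrupt()
            sess.generate_reply("text")
    else:
        await sess.interrupt()
        sess.generate_reply("text")

    sess.finish_speech()  # whichever reply is playing eventually ends
    await task
    return sess.replies
```

In this model `simulate(use_claim=False)` returns `["tool", "text"]`: the deferred reply wakes during the yield inside `interrupt()` and fires before the text reply. With the claim it returns `["text", "tool"]`, because the deferred reply parks on the released event and only proceeds after the text reply's speech has ended.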

Behavior notes

  • The hold check sits at the end of AgentActivity._wait_for_inactive's loop body, so a claim acquired during the body's awaits (e.g. while waiting for ongoing speech) is still honored.
  • Voice transitions during the claim are not lost — they're recoverable from _user_silence_event when the claim ends.
  • Pre-claim "away" status doesn't carry through: post-claim state goes to "listening", and the away timer restarts naturally.
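
Why the placement of the hold check matters can be shown by contrasting two loop orderings. Both functions are hypothetical simplifications of `AgentActivity._wait_for_inactive` (`Holds`, `wait_hold_first`, `wait_hold_last` and `returned_during_claim` are invented names); only the idea of checking the hold at the end of the loop body comes from the PR.

```python
import asyncio
from typing import Awaitable, Callable


class Holds:
    """Hypothetical state: one ongoing speech plus a user-turn claim counter."""

    def __init__(self) -> None:
        self.claims = 0
        self.released = asyncio.Event()     # set while no claim is held
        self.released.set()
        self.speech_done = asyncio.Event()  # cleared: speech is still playing


async def wait_hold_first(s: Holds) -> None:
    # Hypothetical naive ordering: hold check first, then the body's await.
    while True:
        if s.claims > 0:
            await s.released.wait()
            continue
        await s.speech_done.wait()
        return  # a claim taken during the await above is never seen


async def wait_hold_last(s: Holds) -> None:
    # Ordering described in the PR: body's awaits first, hold check at the end.
    while True:
        await s.speech_done.wait()
        if s.claims > 0:
            await s.released.wait()
            continue  # re-run the body after the claim is released
        return


async def returned_during_claim(wait_fn: Callable[[Holds], Awaitable[None]]) -> bool:
    s = Holds()

    async def waiter() -> bool:
        await wait_fn(s)
        return s.claims > 0  # True means the wait resolved inside a claim

    task = asyncio.create_task(waiter())
    await asyncio.sleep(0)  # waiter is now parked on speech_done
    s.claims += 1           # a claim is acquired while the waiter is parked
    s.released.clear()
    s.speech_done.set()     # the ongoing speech ends (e.g. interrupted)
    await asyncio.sleep(0)  # let the waiter react
    s.claims -= 1           # release the claim
    s.released.set()
    return await task


if __name__ == "__main__":
    print(asyncio.run(returned_during_claim(wait_hold_first)))  # True
    print(asyncio.run(returned_during_claim(wait_hold_last)))   # False
```

The naive ordering only consults the counter before it starts waiting on speech, so a claim taken while it is parked slips through; checking at the end of the body re-evaluates the hold after every await.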

chenghao-mou requested a review from a team May 22, 2026 05:28
A public async context manager for declaring a programmatic user-driven
turn. While active, `wait_for_inactive` is held open and `user_state` is
pinned to "speaking". On release, `user_state` is re-derived from the
audio path's `_user_silence_event` ("speaking" if voice still considers
the user to be speaking, else "listening").

Fixes a race in the default `text_input_cb` (and any custom callback
doing `interrupt + generate_reply`): a deferred async-tool reply could
resolve `wait_for_inactive` in the gap between `interrupt()` and
`generate_reply()` and fire concurrently with the new turn, producing a
duplicate response. The default callback now uses `claim_user_turn`.

Reentrant via counter; held at the session level so it survives agent
handoff. Voice transitions during the claim are suppressed and
recoverable from `_user_silence_event` on release.
longcw force-pushed the longc/claim-user-turn branch from 13420b4 to 2dd04ad May 22, 2026 05:31