feat(provider): warn when single request input context exceeds threshold #8558
Reisenbug wants to merge 3 commits into
Add a configurable warning when a single LLM request carries an unusually large input context (default >48000 tokens), helping users notice runaway context growth before it silently burns tokens.
- `context_bloat_warn_enable` (bool, default true)
- `context_bloat_warn_threshold` (int, default 48000)
- per-model throttling (once per 5 min) to avoid log spam
- pure addition, no change to existing behavior
Hey - I've left some high level feedback:
- The `_context_bloat_last_warn` dict is a mutable class attribute shared across all instances; if this is intended for per-instance tracking it should be an instance attribute, and if it is intentionally shared you may want to consider thread-safety (e.g., locking) and explicitly document the cross-instance throttling behavior.
- For the throttle interval check, consider using `time.monotonic()` instead of `time.time()` to avoid issues if the system clock changes.
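Both points in this feedback can be sketched together. The following is a minimal illustration, not the PR's actual code: `ContextBloatWarner`, `should_warn`, and the injectable `clock` parameter are hypothetical names introduced here, and the 300-second interval is taken from the "once per 5 min" throttling described above. The throttle state lives on the instance, a lock guards it, and the interval is measured with `time.monotonic()`.

```python
import threading
import time


class ContextBloatWarner:
    """Hypothetical helper for illustration; not the class used in the PR."""

    def __init__(self, threshold: int = 48000, interval_s: float = 300.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.interval_s = interval_s
        # Monotonic clock: unaffected by system clock adjustments.
        self._clock = clock
        # Instance attribute: each warner keeps its own per-model throttle state.
        self._last_warn: dict[str, float] = {}
        self._lock = threading.Lock()

    def should_warn(self, model: str, prompt_tokens: int) -> bool:
        """Return True if a warning should be logged for this request."""
        if prompt_tokens <= self.threshold:
            return False
        now = self._clock()
        with self._lock:
            last = self._last_warn.get(model)
            if last is not None and now - last < self.interval_s:
                return False  # still inside the throttle window for this model
            self._last_warn[model] = now
            return True
```

Injecting the clock is a design choice made here only so the throttle window can be exercised in tests without sleeping; if cross-instance throttling were intended instead, the dict and lock could be shared deliberately and documented as such.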
Code Review
This pull request introduces a context bloat warning feature to alert users when a single request carries an unusually large input context, helping to prevent runaway token usage. It adds default configuration settings and schema metadata, and implements warning logic in the OpenAI provider source. Feedback on the changes suggests initializing the warning state dictionary at the instance level to avoid test isolation issues, moving the warning trigger out of _extract_usage to prevent duplicate warnings and adhere to the single responsibility principle, and updating the localization files for the new configuration settings.
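The duplicate-check concern can be illustrated with a small sketch. It assumes, as the review states, that a streaming request reaches `_extract_usage` twice (once for the usage chunk, once for the final reconstructed completion). `FakeProvider`, `Usage`, `_parse_final`, `stream`, and the `warn_in_extract` flag are hypothetical stand-ins that only mirror the call structure; they are not the provider's real classes or methods.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int


class FakeProvider:
    """Hypothetical stand-in for the provider; only the call structure matters."""

    def __init__(self, warn_in_extract: bool):
        self.warn_in_extract = warn_in_extract
        self.checks = 0  # how many times the bloat check ran

    def _maybe_warn_context_bloat(self, prompt_tokens: int) -> None:
        self.checks += 1

    def _extract_usage(self, usage: Usage) -> Usage:
        # Variant under review: the check is a side effect of parsing.
        if self.warn_in_extract:
            self._maybe_warn_context_bloat(usage.prompt_tokens)
        return usage

    def _parse_final(self, usage: Usage) -> Usage:
        parsed = self._extract_usage(usage)
        # Suggested variant: check once, where the final response is built.
        if not self.warn_in_extract:
            self._maybe_warn_context_bloat(parsed.prompt_tokens)
        return parsed

    def stream(self, usage: Usage) -> Usage:
        self._extract_usage(usage)       # usage chunk arrives mid-stream
        return self._parse_final(usage)  # final reconstructed completion
```

With `warn_in_extract=True` a single streamed request runs the check twice; with `warn_in_extract=False` it runs once, and `_extract_usage` stays free of side effects.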
    cached = cached or 0
    prompt_tokens = prompt_tokens or 0
    completion_tokens = completion_tokens or 0
    self._maybe_warn_context_bloat(prompt_tokens)
    return TokenUsage(
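Calling `_maybe_warn_context_bloat` inside `_extract_usage` violates the single responsibility principle (SRP). `_extract_usage` is a utility function used purely to parse and extract token usage data. Introducing a side effect there (such as logging a throttled warning) can lead to unexpected behavior, for example warnings firing in tests, in dry runs, or while processing intermediate chunks of a stream.

In addition, for streaming requests `_extract_usage` is called twice: once when the usage chunk is received, and again when the final reconstructed completion is parsed at the end of the stream. This results in a duplicate check.

Instead, the warning should be triggered only once, when the final completion is parsed. The call can be made in `_parse_openai_completion`, where the final `llm_response` is constructed:

    if completion.usage:
        llm_response.usage = self._extract_usage(completion.usage)
        self._maybe_warn_context_bloat(llm_response.usage.input_other + llm_response.usage.input_cached)

Since `_parse_openai_completion` is not part of the modified diff hunks, a code suggestion cannot be attached to it directly, but the call should be removed from `_extract_usage` here:

    cached = cached or 0
    prompt_tokens = prompt_tokens or 0
    completion_tokens = completion_tokens or 0
    return TokenUsage(

astrbot/core/config/default.py

    "provider_settings.context_bloat_warn_enable": {
        # English: "Enable context bloat warning"
        "description": "启用上下文膨胀告警",
        "type": "bool",
        # English: "When the input context tokens of a single request exceed the threshold,
        # log a warning that the context may be growing abnormally (e.g. history
        # accumulating without bound and driving up token costs)."
        "hint": "当单次请求的输入上下文 token 超过阈值时,输出 warning 日志,提醒可能存在上下文异常膨胀(如历史无限累积导致 token 开销激增)。",
    },
    "provider_settings.context_bloat_warn_threshold": {
        # English: "Context bloat warning threshold"
        "description": "上下文膨胀告警阈值",
        "type": "int",
        # English: "Warn when a single request's input context exceeds this many tokens. Default 48000."
        "hint": "单次请求输入上下文超过此 token 数时触发告警。默认 48000。",
        "condition": {
            "provider_settings.context_bloat_warn_enable": True,
        },
    },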
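As the comment in the file notes, configuration metadata fields (name, description, hint, etc.) have been internationalized since v4.7.0.
When adding new configuration items (such as `context_bloat_warn_enable` and `context_bloat_warn_threshold`), please make sure the corresponding translation files are updated as well:
- `dashboard/src/i18n/locales/en-US/features/config-metadata.json`
- `dashboard/src/i18n/locales/zh-CN/features/config-metadata.json`

This ensures the new settings are displayed with correct localization in the WebUI.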
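Adds a configurable bloat warning for the input context of a single LLM request.
Abnormal context growth is a silent failure. This PR provides an early visibility signal.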
Closes #8556
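Modifications
- `astrbot/core/provider/sources/openai_source.py`: adds `_maybe_warn_context_bloat`, called from `_extract_usage`. When the `prompt_tokens` of a single request exceed the threshold, it logs a warning, throttled per model (at most once every 5 minutes) to avoid flooding the log.
- `astrbot/core/config/default.py`: adds two configuration items and their schema:
  - `context_bloat_warn_enable` (bool, default true): switch for the warning
  - `context_bloat_warn_threshold` (int, default 48000): trigger threshold. I'm honestly not sure how much counts as large...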
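How the two settings interact can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `exceeds_context_threshold`, the `DEFAULTS` dict, and the idea that the settings arrive as a plain dict are all hypothetical; only the two key names and their default values come from the PR description.

```python
# Defaults as stated in the PR description.
DEFAULTS = {
    "context_bloat_warn_enable": True,
    "context_bloat_warn_threshold": 48000,
}


def exceeds_context_threshold(provider_settings: dict, prompt_tokens: int) -> bool:
    """Hypothetical helper: True if this request qualifies for a warning."""
    # User-provided settings override the defaults.
    settings = {**DEFAULTS, **provider_settings}
    if not settings["context_bloat_warn_enable"]:
        return False  # the switch turns the feature off entirely
    # Strictly greater than: a request exactly at the threshold does not warn.
    return prompt_tokens > settings["context_bloat_warn_threshold"]
```

For example, `exceeds_context_threshold({}, 111351)` is true under the defaults, while raising `context_bloat_warn_threshold` to 200000 or disabling `context_bloat_warn_enable` suppresses it. Throttling is deliberately left out of this sketch.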
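This is NOT a breaking change.
Screenshots or Test Results
Example log output when the warning fires:
单次请求输入上下文达 111351 tokens(模型 deepseek-chat,阈值 48000)。如非预期,请检查会话历史长度或max_context_length / max_context_tokens 配置,以免持续产生过高的token 开销。可在配置中关闭此提醒。
In English: "Single request input context reached 111351 tokens (model deepseek-chat, threshold 48000). If this is unexpected, check the session history length or the max_context_length / max_context_tokens configuration to avoid continued excessive token costs. This reminder can be disabled in the configuration."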
Checklist
- 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
- 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
- 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
- 😮 My changes do not introduce malicious code.
Summary by Sourcery
Add configurable warnings for unusually large input contexts in single LLM requests to improve early visibility into runaway token usage.
New Features:
Enhancements: