
feat(provider): warn when single request input context exceeds threshold #8558

Open
Reisenbug wants to merge 3 commits into AstrBotDevs:master from Reisenbug:feat/context-bloat-warning


@Reisenbug Reisenbug commented Jun 3, 2026

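Adds a configurable bloat warning for the input context of a single LLM request.
Abnormal context growth is a silent failure: nothing breaks, but token costs climb. This PR provides an early visibility signal.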
Closes #8556

Modifications

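  • astrbot/core/provider/sources/openai_source.py: adds _maybe_warn_context_bloat, called from _extract_usage. When the prompt_tokens of a single request exceed the threshold, it logs a warning, throttled per model (at most once every 5 minutes) to avoid flooding the log.

  • astrbot/core/config/default.py: adds two configuration items and their schema:

  • context_bloat_warn_enable (bool, default true): switch for the warning

  • context_bloat_warn_threshold (int, default 48000): trigger threshold. I'm not really sure how much counts as large...

  • This is NOT a breaking change.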
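The behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function signature, the `settings` dict, the `last_warn` state dict, the explicit `now` argument, and the English log text are all assumptions made for the example. Only the two option names, their defaults, and the 5-minute per-model throttle come from the description.

```python
import logging
from typing import Dict

logger = logging.getLogger("context_bloat_sketch")

THROTTLE_SECONDS = 5 * 60  # "at most once every 5 minutes", per the PR description


def maybe_warn_context_bloat(
    prompt_tokens: int,
    model: str,
    settings: dict,
    last_warn: Dict[str, float],
    now: float,
) -> bool:
    """Log a throttled warning; return True if a warning was emitted.

    Hypothetical signature: the PR implements this as a provider method.
    """
    enabled = settings.get("context_bloat_warn_enable", True)
    threshold = settings.get("context_bloat_warn_threshold", 48000)
    if not enabled or prompt_tokens <= threshold:
        return False

    # Per-model throttle: skip if this model warned less than 5 minutes ago.
    previous = last_warn.get(model)
    if previous is not None and now - previous < THROTTLE_SECONDS:
        return False

    last_warn[model] = now
    # Illustrative wording only; the PR's actual message is in Chinese.
    logger.warning(
        "Single request input context reached %d tokens (model %s, threshold %d).",
        prompt_tokens,
        model,
        threshold,
    )
    return True
```

Passing `now` explicitly keeps the sketch easy to test; a real caller would supply a clock reading.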

Screenshots or Test Results

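Example log output when the warning fires:
单次请求输入上下文达 111351 tokens(模型 deepseek-chat,阈值 48000)。如非预期,请检查会话历史长度或max_context_length / max_context_tokens 配置,以免持续产生过高的token 开销。可在配置中关闭此提醒。
In English: "Single request input context reached 111351 tokens (model deepseek-chat, threshold 48000). If this is unexpected, check the conversation history length or the max_context_length / max_context_tokens configuration to avoid sustained excessive token costs. This reminder can be turned off in the configuration."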


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Add configurable warnings for unusually large input contexts in single LLM requests to improve early visibility into runaway token usage.

New Features:

  • Introduce provider-level configuration to enable or disable context bloat warnings and set a token threshold for triggering them.

Enhancements:

  • Log rate-limited warnings when a request's prompt tokens exceed the configured threshold, keyed by model to avoid log spam.
  • Extend the default provider configuration schema with new options for context bloat warnings and their threshold.

Add a configurable warning when a single LLM request carries an unusually
large input context (default >48000 tokens), helping users notice runaway
context growth before it silently burns tokens.

- context_bloat_warn_enable (bool, default true)
- context_bloat_warn_threshold (int, default 48000)
- per-model throttling (once per 5min) to avoid log spam
- pure addition, no change to existing behavior
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Jun 3, 2026

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • The _context_bloat_last_warn dict is a mutable class attribute shared across all instances; if this is intended for per-instance tracking it should be an instance attribute, and if it is intentionally shared you may want to consider thread-safety (e.g., locking) and explicitly document the cross-instance throttling behavior.
  • For the throttle interval check, consider using time.monotonic() instead of time.time() to avoid issues if the system clock changes.
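Both suggestions can be combined in a small throttle helper. The sketch below is one possible shape, not code from the PR: the class name `PerInstanceThrottle`, its `allow` method, and the injectable `clock` parameter are hypothetical. It keeps the last-warn map as an instance attribute, guards it with a lock, and reads `time.monotonic()` so that wall-clock adjustments do not affect the interval.

```python
import threading
import time
from typing import Callable, Dict


class PerInstanceThrottle:
    """Hypothetical helper: allows one event per key per interval."""

    def __init__(
        self,
        interval: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval = interval
        self._clock = clock
        # Instance attribute, so each provider instance has its own state.
        self._last: Dict[str, float] = {}
        # Guards _last when several threads report usage concurrently.
        self._lock = threading.Lock()

    def allow(self, key: str) -> bool:
        """Return True if an event for `key` may fire now, and record it."""
        now = self._clock()
        with self._lock:
            last = self._last.get(key)
            if last is not None and now - last < self._interval:
                return False
            self._last[key] = now
            return True
```

If cross-instance throttling is actually the intent, the same class could be held in one shared object instead; the point is that the sharing becomes explicit rather than a side effect of a mutable class attribute.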


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a context bloat warning feature to alert users when a single request carries an unusually large input context, helping to prevent runaway token usage. It adds default configuration settings and schema metadata, and implements warning logic in the OpenAI provider source. Feedback on the changes suggests initializing the warning state dictionary at the instance level to avoid test isolation issues, moving the warning trigger out of _extract_usage to prevent duplicate warnings and adhere to the single responsibility principle, and updating the localization files for the new configuration settings.


Comment thread astrbot/core/provider/sources/openai_source.py Outdated
Comment on lines 824 to 828
cached = cached or 0
prompt_tokens = prompt_tokens or 0
completion_tokens = completion_tokens or 0
self._maybe_warn_context_bloat(prompt_tokens)
return TokenUsage(

medium

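Calling _maybe_warn_context_bloat inside _extract_usage violates the single responsibility principle (SRP). _extract_usage is a pure utility function for parsing and extracting token usage data. Introducing a side effect there (such as logging a throttled warning) can cause unexpected behavior, for example firing the warning in tests, dry runs, or while processing intermediate chunks of a stream.

In addition, for streaming requests _extract_usage is called twice: once when the usage chunk is received, and again when the final reconstructed completion is parsed at the end of the stream. This results in a duplicate check.

Instead, the warning should be triggered only once, when the final completion is parsed. The call can be made in _parse_openai_completion, where the final llm_response is constructed:

        if completion.usage:
            llm_response.usage = self._extract_usage(completion.usage)
            self._maybe_warn_context_bloat(llm_response.usage.input_other + llm_response.usage.input_cached)

Because _parse_openai_completion is not part of the modified diff hunk, we cannot offer a direct code suggestion for it, but the call should be removed from _extract_usage here.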

        cached = cached or 0
        prompt_tokens = prompt_tokens or 0
        completion_tokens = completion_tokens or 0
        return TokenUsage(
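The separation this comment asks for can be illustrated with a self-contained sketch. Everything here is a stand-in rather than AstrBot code: `Usage` approximates the project's TokenUsage, `extract_usage` and `parse_final` are hypothetical names, and the assumption that input_other equals prompt_tokens minus cached is inferred only from the reviewer's `input_other + input_cached` expression.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the project's TokenUsage."""

    input_other: int
    input_cached: int
    output: int


def extract_usage(
    prompt_tokens: Optional[int],
    cached: Optional[int],
    completion_tokens: Optional[int],
) -> Usage:
    """Pure extraction: normalizes None to 0 and has no side effects."""
    cached = cached or 0
    prompt_tokens = prompt_tokens or 0
    completion_tokens = completion_tokens or 0
    # Assumption: non-cached input is the prompt total minus the cached part.
    return Usage(
        input_other=prompt_tokens - cached,
        input_cached=cached,
        output=completion_tokens,
    )


def parse_final(
    prompt_tokens: Optional[int],
    cached: Optional[int],
    completion_tokens: Optional[int],
    warn: Callable[[int], None],
) -> Usage:
    """Final-completion path: the only place the warning hook is invoked."""
    usage = extract_usage(prompt_tokens, cached, completion_tokens)
    warn(usage.input_other + usage.input_cached)
    return usage
```

With this split, a streaming path can call `extract_usage` on the usage chunk as often as needed, while the warning hook runs once per request in `parse_final`.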

Comment on lines +3614 to +3626
"provider_settings.context_bloat_warn_enable": {
"description": "启用上下文膨胀告警",
"type": "bool",
"hint": "当单次请求的输入上下文 token 超过阈值时,输出 warning 日志,提醒可能存在上下文异常膨胀(如历史无限累积导致 token 开销激增)。",
},
"provider_settings.context_bloat_warn_threshold": {
"description": "上下文膨胀告警阈值",
"type": "int",
"hint": "单次请求输入上下文超过此 token 数时触发告警。默认 48000。",
"condition": {
"provider_settings.context_bloat_warn_enable": True,
},
},

medium

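As the comment in the file notes, configuration metadata fields (name, description, hint, etc.) have been internationalized since v4.7.0.

When adding new configuration items (such as context_bloat_warn_enable and context_bloat_warn_threshold), please make sure to update the corresponding translation files as well:

  • dashboard/src/i18n/locales/en-US/features/config-metadata.json
  • dashboard/src/i18n/locales/zh-CN/features/config-metadata.json

This ensures the new settings are displayed with correct localization in the WebUI.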


Development

Successfully merging this pull request may close these issues.

[Feature] max_context_tokens=0 fills the model's context window by default, which is unfriendly to pay-per-use APIs; suggest adding a defensive warning