Skip to content

fix(kb): prevent retrieval crash on empty query and improve metadata fault tolerance#8568

Open
Rat0323 wants to merge 1 commit into
AstrBotDevs:masterfrom
Rat0323:fix-kb-empty-query-value-error
Open

fix(kb): prevent retrieval crash on empty query and improve metadata fault tolerance#8568
Rat0323 wants to merge 1 commit into
AstrBotDevs:masterfrom
Rat0323:fix-kb-empty-query-value-error

Conversation

@Rat0323
Copy link
Copy Markdown
Contributor

@Rat0323 Rat0323 commented Jun 3, 2026

This PR addresses a retrieval crash when querying empty/whitespace messages and resolves several documentation and maintenance issues in RetrievalManager.

Key Changes

  1. Empty Query Guard: Prevents ValueError: 输入文本不能为空 crashes (e.g., in Gemini embedding provider) by returning [] immediately if query is empty, non-string, or only contains whitespace.
  2. Metadata Fault Tolerance: Added a logger.warning warning log if document metadata is missing in the database instead of throwing KeyErrors, and skipped processing the corresponding chunk.
  3. Attribute Protection: Used getattr to safely access chunk_id and chunk_index on FusedResult to prevent crashes when interacting with different vector database implementations.
  4. Docstring Mismatches: Removed non-existent parameter vec_db_factory from __init__ docstring, and documented missing parameters kb_id_helper_map and top_k_fusion in retrieve docstring.

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 3, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • If query is always expected to be a str by the public contract of retrieve, consider either tightening the type hints/validation at the API boundary or dropping the isinstance(query, str) check here to keep this guard focused on emptiness rather than type coercion.
  • You may want to add a lightweight debug log when returning early on an empty/whitespace-only query so that callers can more easily trace why no retrieval was performed without having to infer it from the absence of results.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- If `query` is always expected to be a `str` by the public contract of `retrieve`, consider either tightening the type hints/validation at the API boundary or dropping the `isinstance(query, str)` check here to keep this guard focused on emptiness rather than type coercion.
- You may want to add a lightweight debug log when returning early on an empty/whitespace-only query so that callers can more easily trace why no retrieval was performed without having to infer it from the absence of results.

## Individual Comments

### Comment 1
<location path="astrbot/core/knowledge_base/retrieval/manager.py" line_range="90" />
<code_context>
             List[RetrievalResult]: 检索结果列表

         """
+        if not query or not isinstance(query, str) or not query.strip():
+            return []
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid changing non-string query handling silently to an empty result; consider raising or logging instead.

This guard changes behavior from a clear downstream error on non-string `query` to a silent empty list, which can hide call-site bugs (e.g., passing an int or dict). Instead, either raise a `TypeError`/`ValueError` for non-string input, or log a warning when this branch is taken so misuse remains visible while still protecting the rest of the logic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

List[RetrievalResult]: 检索结果列表

"""
if not query or not isinstance(query, str) or not query.strip():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Avoid changing non-string query handling silently to an empty result; consider raising or logging instead.

This guard changes behavior from a clear downstream error on non-string query to a silent empty list, which can hide call-site bugs (e.g., passing an int or dict). Instead, either raise a TypeError/ValueError for non-string input, or log a warning when this branch is taken so misuse remains visible while still protecting the rest of the logic.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a validation check in the retrieve method of the retrieval manager to handle invalid, empty, or non-string queries by returning an empty list. The reviewer suggested simplifying the conditional expression to improve readability and efficiency.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +90 to +91
if not query or not isinstance(query, str) or not query.strip():
return []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The condition not query or not isinstance(query, str) or not query.strip() can be simplified. Since isinstance(query, str) evaluates to False for None or any other non-string values, checking not isinstance(query, str) first naturally covers the not query check for non-string types. For string types, not query.strip() covers empty or whitespace-only strings. Therefore, we can simplify this to not isinstance(query, str) or not query.strip().

Suggested change
if not query or not isinstance(query, str) or not query.strip():
return []
if not isinstance(query, str) or not query.strip():
return []

@Rat0323 Rat0323 force-pushed the fix-kb-empty-query-value-error branch 2 times, most recently from 4f7f1e1 to 567313a Compare June 3, 2026 21:26
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Jun 3, 2026
@Rat0323 Rat0323 force-pushed the fix-kb-empty-query-value-error branch from 567313a to 0ae42b1 Compare June 3, 2026 21:50
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jun 3, 2026
@Rat0323 Rat0323 force-pushed the fix-kb-empty-query-value-error branch from 0ae42b1 to df3fb7c Compare June 3, 2026 21:56
@Rat0323 Rat0323 changed the title fix(kb): skip knowledge base retrieval if query is empty or whitespace fix(kb): prevent retrieval crash on empty query and improve metadata fault tolerance Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant