fix(kb): prevent retrieval crash on empty query and improve metadata fault tolerance#8568
fix(kb): prevent retrieval crash on empty query and improve metadata fault tolerance#8568Rat0323 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- If
queryis always expected to be astrby the public contract ofretrieve, consider either tightening the type hints/validation at the API boundary or dropping theisinstance(query, str)check here to keep this guard focused on emptiness rather than type coercion. - You may want to add a lightweight debug log when returning early on an empty/whitespace-only query so that callers can more easily trace why no retrieval was performed without having to infer it from the absence of results.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- If `query` is always expected to be a `str` by the public contract of `retrieve`, consider either tightening the type hints/validation at the API boundary or dropping the `isinstance(query, str)` check here to keep this guard focused on emptiness rather than type coercion.
- You may want to add a lightweight debug log when returning early on an empty/whitespace-only query so that callers can more easily trace why no retrieval was performed without having to infer it from the absence of results.
## Individual Comments
### Comment 1
<location path="astrbot/core/knowledge_base/retrieval/manager.py" line_range="90" />
<code_context>
List[RetrievalResult]: 检索结果列表
"""
+ if not query or not isinstance(query, str) or not query.strip():
+ return []
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid changing non-string query handling silently to an empty result; consider raising or logging instead.
This guard changes behavior from a clear downstream error on non-string `query` to a silent empty list, which can hide call-site bugs (e.g., passing an int or dict). Instead, either raise a `TypeError`/`ValueError` for non-string input, or log a warning when this branch is taken so misuse remains visible while still protecting the rest of the logic.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| List[RetrievalResult]: 检索结果列表 | ||
|
|
||
| """ | ||
| if not query or not isinstance(query, str) or not query.strip(): |
There was a problem hiding this comment.
issue (bug_risk): Avoid changing non-string query handling silently to an empty result; consider raising or logging instead.
This guard changes behavior from a clear downstream error on non-string query to a silent empty list, which can hide call-site bugs (e.g., passing an int or dict). Instead, either raise a TypeError/ValueError for non-string input, or log a warning when this branch is taken so misuse remains visible while still protecting the rest of the logic.
There was a problem hiding this comment.
Code Review
This pull request introduces a validation check in the retrieve method of the retrieval manager to handle invalid, empty, or non-string queries by returning an empty list. The reviewer suggested simplifying the conditional expression to improve readability and efficiency.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if not query or not isinstance(query, str) or not query.strip(): | ||
| return [] |
There was a problem hiding this comment.
The condition not query or not isinstance(query, str) or not query.strip() can be simplified. Since isinstance(query, str) evaluates to False for None or any other non-string values, checking not isinstance(query, str) first naturally covers the not query check for non-string types. For string types, not query.strip() covers empty or whitespace-only strings. Therefore, we can simplify this to not isinstance(query, str) or not query.strip().
| if not query or not isinstance(query, str) or not query.strip(): | |
| return [] | |
| if not isinstance(query, str) or not query.strip(): | |
| return [] |
4f7f1e1 to
567313a
Compare
567313a to
0ae42b1
Compare
0ae42b1 to
df3fb7c
Compare
This PR addresses a retrieval crash when querying empty/whitespace messages and resolves several documentation and maintenance issues in
RetrievalManager.Key Changes
ValueError: 输入文本不能为空crashes (e.g., in Gemini embedding provider) by returning[]immediately ifqueryis empty, non-string, or only contains whitespace.logger.warningwarning log if document metadata is missing in the database instead of throwing KeyErrors, and skipped processing the corresponding chunk.getattrto safely accesschunk_idandchunk_indexonFusedResultto prevent crashes when interacting with different vector database implementations.vec_db_factoryfrom__init__docstring, and documented missing parameterskb_id_helper_mapandtop_k_fusioninretrievedocstring.