refactor(gemini_embedding): make adapter more robust and replace asserts with runtime checks#8544
refactor(gemini_embedding): make adapter more robust and replace asserts with runtime checks#8544Rat0323 wants to merge 4 commits into
Conversation
…rts with runtime checks
There was a problem hiding this comment.
Code Review
This pull request improves the robustness of the Gemini embedding source by using .get() for configuration retrieval, adding input validation, replacing assert statements with explicit ValueError checks, and safely closing the client during termination. The reviewer recommends removing the broad except Exception blocks in both get_embedding and get_embeddings to avoid swallowing tracebacks and masking specific exceptions like ValueError, and suggests using raise ... from e for APIError to preserve tracebacks.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| except APIError as e: | ||
| raise Exception(f"Gemini Embedding API请求失败: {e.message}") | ||
| except Exception as e: | ||
| raise Exception(f"Gemini Embedding 发生异常: {str(e)}") |
There was a problem hiding this comment.
Catching a broad Exception and wrapping it in a generic Exception has two major drawbacks: 1) It swallows the original traceback, making debugging in production extremely difficult. 2) It catches the ValueError that was just explicitly raised, converting it into a generic Exception and losing the specific exception type. It is highly recommended to remove the broad except Exception block, allowing non-API exceptions to propagate naturally, and use 'raise ... from e' for APIError to preserve the traceback.
| except APIError as e: | |
| raise Exception(f"Gemini Embedding API请求失败: {e.message}") | |
| except Exception as e: | |
| raise Exception(f"Gemini Embedding 发生异常: {str(e)}") | |
| except APIError as e: | |
| raise Exception(f"Gemini Embedding API请求失败: {e.message}") from e |
| except APIError as e: | ||
| raise Exception(f"Gemini Embedding API批量请求失败: {e.message}") | ||
| except Exception as e: | ||
| raise Exception(f"Gemini Embedding 批量请求发生异常: {str(e)}") |
There was a problem hiding this comment.
Similarly, catching a broad Exception swallows the traceback and converts the explicitly raised ValueError into a generic Exception. It is recommended to remove the broad except Exception block and use 'raise ... from e' for APIError to preserve the traceback.
| except APIError as e: | |
| raise Exception(f"Gemini Embedding API批量请求失败: {e.message}") | |
| except Exception as e: | |
| raise Exception(f"Gemini Embedding 批量请求发生异常: {str(e)}") | |
| except APIError as e: | |
| raise Exception(f"Gemini Embedding API批量请求失败: {e.message}") from e |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- Using
provider_config.get("embedding_api_key", "")and"embedding_api_base", ""will silently accept missing critical config and fail later; consider validating these keys explicitly and raising a clear error when they are absent. - The broad
except Exception as eblocks that wrap and re-raise a newExceptionlose the original traceback context; either narrow the exception types further or useraise ... from eto preserve debugging information. - In
get_embeddings, empty input returns[]whileget_embeddingraises aValueErrorfor empty/blank text; consider aligning the behavior and optionally validating against blank strings in the list to fail fast on invalid items.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Using `provider_config.get("embedding_api_key", "")` and `"embedding_api_base", ""` will silently accept missing critical config and fail later; consider validating these keys explicitly and raising a clear error when they are absent.
- The broad `except Exception as e` blocks that wrap and re-raise a new `Exception` lose the original traceback context; either narrow the exception types further or use `raise ... from e` to preserve debugging information.
- In `get_embeddings`, empty input returns `[]` while `get_embedding` raises a `ValueError` for empty/blank text; consider aligning the behavior and optionally validating against blank strings in the list to fail fast on invalid items.
## Individual Comments
### Comment 1
<location path="astrbot/core/provider/sources/gemini_embedding_source.py" line_range="24-25" />
<code_context>
- api_key: str = provider_config["embedding_api_key"]
- api_base: str = provider_config["embedding_api_base"]
+ # 使用 .get() 避免缺少配置时引发 KeyError 崩溃
+ api_key: str = provider_config.get("embedding_api_key", "")
+ api_base: str = provider_config.get("embedding_api_base", "")
timeout: int = int(provider_config.get("timeout", 20))
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silently defaulting missing API config to empty strings can hide misconfiguration and lead to harder-to-debug failures.
Using `get(..., "")` avoids `KeyError`, but it also hides missing `embedding_api_key` / `embedding_api_base` and defers the failure to a less obvious point. Instead, either validate these fields and raise a clear configuration error when missing, or log a warning/error if they are empty so misconfigurations are immediately visible.
Suggested implementation:
```python
# 显式校验必需的配置,避免静默地使用空字符串导致后续难以排查的问题
api_key = provider_config.get("embedding_api_key")
api_base = provider_config.get("embedding_api_base")
if not api_key or not api_base:
raise ValueError(
"Gemini embedding provider misconfigured: "
"'embedding_api_key' and 'embedding_api_base' are required and must be non-empty."
)
timeout: int = int(provider_config.get("timeout", 20))
```
1. If the project defines a custom configuration/validation exception (for example `ConfigurationError` or similar), replace the `ValueError` with that project-specific error type to be consistent with the rest of the codebase.
2. If there is a standardized logging mechanism for configuration issues, you may also want to log this misconfiguration before raising, e.g. using a module-level `logger`.
</issue_to_address>
### Comment 2
<location path="astrbot/core/provider/sources/gemini_embedding_source.py" line_range="67-69" />
<code_context>
+ raise ValueError("API 响应异常:未返回有效的 embedding 数据")
+
return result.embeddings[0].values
except APIError as e:
raise Exception(f"Gemini Embedding API请求失败: {e.message}")
+ except Exception as e:
+ raise Exception(f"Gemini Embedding 发生异常: {str(e)}")
</code_context>
<issue_to_address>
**suggestion:** Wrapping all exceptions in generic `Exception` without `from e` loses useful traceback and error type information.
In both this method and the batch variant, the generic `except Exception as e:` block raises a new `Exception` without chaining, which drops the original traceback and type. Either use `raise Exception("...") from e` or allow non-`APIError` exceptions to propagate. Also, `str(e)` is unnecessary in an f-string; just use `{e}`.
Suggested implementation:
```python
except Exception as e:
raise Exception(f"Gemini Embedding 发生异常: {e}") from e
```
You should make the same style of change in the batch embedding method:
1. Locate the `except Exception as e:` block in the batch variant.
2. Replace any `raise Exception("...: {str(e)}")` (or similar) with `raise Exception("...: {e}") from e` to maintain exception chaining and avoid redundant `str(e)`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| api_key: str = provider_config.get("embedding_api_key", "") | ||
| api_base: str = provider_config.get("embedding_api_base", "") |
There was a problem hiding this comment.
suggestion (bug_risk): Silently defaulting missing API config to empty strings can hide misconfiguration and lead to harder-to-debug failures.
Using get(..., "") avoids KeyError, but it also hides missing embedding_api_key / embedding_api_base and defers the failure to a less obvious point. Instead, either validate these fields and raise a clear configuration error when missing, or log a warning/error if they are empty so misconfigurations are immediately visible.
Suggested implementation:
# 显式校验必需的配置,避免静默地使用空字符串导致后续难以排查的问题
api_key = provider_config.get("embedding_api_key")
api_base = provider_config.get("embedding_api_base")
if not api_key or not api_base:
raise ValueError(
"Gemini embedding provider misconfigured: "
"'embedding_api_key' and 'embedding_api_base' are required and must be non-empty."
)
timeout: int = int(provider_config.get("timeout", 20))- If the project defines a custom configuration/validation exception (for example
ConfigurationErroror similar), replace theValueErrorwith that project-specific error type to be consistent with the rest of the codebase. - If there is a standardized logging mechanism for configuration issues, you may also want to log this misconfiguration before raising, e.g. using a module-level
logger.
…rification and exception chaining
…exception catching
This PR refactors the Gemini embedding adapter to improve its robustness:
assertstatements (which are stripped in production runtimes under Python-Ooptimization) with explicit condition checks andValueErrorexceptions.get_embeddingsandget_embedding..get()dictionary lookups in initialization.Summary by Sourcery
Refine the Gemini embedding adapter to handle invalid input and API responses more defensively and prevent crashes in production.
Bug Fixes:
Enhancements: