Feat: image caption cache#8580
Conversation
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The global
ImageCaptionCachekeeps a_locksdict that is never cleaned up even when entries expire; consider pruning locks alongside_entriesor using a different structure to avoid unbounded growth over a long-running process. - In
ImageCaptionCache.get_or_create, adding a precise type annotation forcaption_factory(e.g.Callable[[], Awaitable[str]]) would make the async contract clearer and help static analysis tools catch misuse.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The global `ImageCaptionCache` keeps a `_locks` dict that is never cleaned up even when entries expire; consider pruning locks alongside `_entries` or using a different structure to avoid unbounded growth over a long-running process.
- In `ImageCaptionCache.get_or_create`, adding a precise type annotation for `caption_factory` (e.g. `Callable[[], Awaitable[str]]`) would make the async contract clearer and help static analysis tools catch misuse.
## Individual Comments
### Comment 1
<location path="astrbot/core/utils/image_caption_cache.py" line_range="72-81" />
<code_context>
+ )
+ return cached_caption
+
+ lock = self._locks.setdefault(cache_key, asyncio.Lock())
+ async with lock:
+ cached_caption = self._get(cache_key)
+ if cached_caption is not None:
+ logger.debug(
+ "Using cached image caption after lock wait. provider=%s",
+ provider_id or "<default>",
+ )
+ return cached_caption
+
+ caption = await caption_factory()
+ self._entries[cache_key] = _ImageCaptionCacheEntry(
+ caption=caption,
+ expires_at=time.monotonic() + ttl_seconds,
+ )
+ self._cleanup_expired_entries()
+ return caption
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Locks dictionary can grow without bound, leading to potential memory leaks over long runtimes.
`_cleanup_expired_entries` only prunes `_entries`; `_locks` still retains a lock per cache key indefinitely. In long-lived processes with many unique keys this can cause unbounded growth. Consider removing `self._locks[cache_key]` when its entry expires, or periodically pruning locks for keys no longer present in `_entries`.
</issue_to_address>
### Comment 2
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="91-93" />
<code_context>
image_caption_prompt: str,
+ cache_ttl: int = 0,
) -> str:
if not image_caption_provider_id:
provider = self.context.get_using_provider()
+ provider_id = (
+ provider.provider_config.get("id", "")
+ if isinstance(provider, Provider)
</code_context>
<issue_to_address>
**issue (bug_risk):** Caching with an empty or non-unique provider ID can mix captions from different default providers.
When `image_caption_provider_id` is falsy, `provider_id` comes from `provider.provider_config.get("id", "")`. If this ever resolves to `""` or a non-unique value while different providers are used over time, cached captions can be incorrectly shared across providers. Consider enforcing a stable, non-empty, unique identifier (e.g., required `id` in config or a fallback like the provider class name) to ensure cache keys stay provider-specific.
</issue_to_address>
### Comment 3
<location path="astrbot/core/utils/image_caption_cache.py" line_range="16" />
<code_context>
+DEFAULT_IMAGE_CAPTION_CACHE_TTL = 600
+
+
+def resolve_image_caption_cache_ttl(config: dict | None) -> int:
+ if not isinstance(config, dict):
+ return DEFAULT_IMAGE_CAPTION_CACHE_TTL
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying TTL handling and lock lifecycle management to reduce branching and prevent unbounded growth of per-key locks while preserving existing behavior.
You can keep all current behavior but trim some complexity and lifecycle issues with small, focused changes.
### 1. Clean up per-key locks when entries expire
Right now `_locks` can grow without bound because locks are never removed. You can piggyback on `_cleanup_expired_entries` to prune locks for expired keys:
```python
def _cleanup_expired_entries(self) -> None:
now = time.monotonic()
expired_keys = [
key for key, entry in self._entries.items() if entry.expires_at <= now
]
for key in expired_keys:
self._entries.pop(key, None)
self._locks.pop(key, None) # keep locks in sync with entries
```
You can also simplify lock lookup using `defaultdict` to avoid `setdefault` noise:
```python
from collections import defaultdict
class ImageCaptionCache:
def __init__(self) -> None:
self._entries: dict[str, _ImageCaptionCacheEntry] = {}
self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
async def get_or_create(...):
...
lock = self._locks[cache_key]
async with lock:
...
```
### 2. Simplify TTL resolution while preserving behavior
You can keep the current type semantics (including the `bool` special-case) but make the logic tighter and easier to read:
```python
def resolve_image_caption_cache_ttl(config: dict | None) -> int:
raw = (config or {}).get(
"image_caption_cache_ttl",
DEFAULT_IMAGE_CAPTION_CACHE_TTL,
)
# Preserve bool special-case
if isinstance(raw, bool):
return DEFAULT_IMAGE_CAPTION_CACHE_TTL
try:
return max(int(raw), 0)
except (TypeError, ValueError):
return DEFAULT_IMAGE_CAPTION_CACHE_TTL
```
This reduces branching and makes the accepted types clearer while keeping the existing behavior intact.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request introduces an image caption caching mechanism (ImageCaptionCache) to store and reuse vision recognition results, preventing redundant API calls when the same image is processed within a configurable TTL (image_caption_cache_ttl). This caching logic is integrated into both group chat contexts and the main agent, with configuration options and translations updated accordingly. The review feedback highlights a potential memory leak in the cache implementation where lock objects in self._locks are never cleaned up, and suggests a finally block to safely remove locks after caption generation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
tests/test_group_chat_context.pyyou importProviderfromastrbot.core.provider, but the production code usesastrbot.api.provider.Provider; aligning the import to the actual implementation will avoid spec mismatches and future refactors breaking these tests. - In
ImageCaptionCache.get_or_create,caption_factoryis untyped and unconstrained; consider typing it as aCallable[[], Awaitable[str]](and enforcing that signature at call sites) to make misuse easier to catch and improve editor/static-analysis support.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `tests/test_group_chat_context.py` you import `Provider` from `astrbot.core.provider`, but the production code uses `astrbot.api.provider.Provider`; aligning the import to the actual implementation will avoid spec mismatches and future refactors breaking these tests.
- In `ImageCaptionCache.get_or_create`, `caption_factory` is untyped and unconstrained; consider typing it as a `Callable[[], Awaitable[str]]` (and enforcing that signature at call sites) to make misuse easier to catch and improve editor/static-analysis support.
## Individual Comments
### Comment 1
<location path="astrbot/core/utils/image_caption_cache.py" line_range="50-57" />
<code_context>
+ self._entries.clear()
+ self._locks.clear()
+
+ async def get_or_create(
+ self,
+ *,
+ provider_id: str,
+ prompt: str,
+ image_urls: list[str],
+ ttl_seconds: int,
+ caption_factory,
+ ) -> str:
+ if ttl_seconds <= 0:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The `caption_factory` parameter should be typed/validated as an async callable for clarity and safety.
Because `caption_factory` is untyped and assumed to be async returning `str`, passing a sync function will cause `await caption_factory()` to fail at runtime. Consider typing it as `Callable[[], Awaitable[str]]` (or otherwise documenting/enforcing that it must be async), and optionally adding a small runtime check (e.g. `inspect.iscoroutinefunction`) to fail fast with a clear error message.
Suggested implementation:
```python
import inspect
from collections.abc import Awaitable, Callable
class ImageCaptionCache:
```
```python
ttl_seconds: int,
caption_factory: Callable[[], Awaitable[str]],
) -> str:
if not inspect.iscoroutinefunction(caption_factory):
raise TypeError(
"caption_factory must be an async callable (defined with 'async def') "
"that returns a str"
)
if ttl_seconds <= 0:
return await caption_factory()
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="41" />
<code_context>
+ users: int = 0
+
+
+class ImageCaptionCache:
+ def __init__(self) -> None:
+ self._entries: dict[str, _ImageCaptionCacheEntry] = {}
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the cache to keep most logic synchronous, simplify per-key locking, and split image fingerprinting into small helpers to reduce cognitive and control-flow complexity.
You can significantly reduce cognitive and control‑flow complexity without changing behavior by tightening the async surface and simplifying lock management / helpers.
### 1. Make cache key building sync and isolate the only blocking I/O
Only the file read needs a thread; everything else is pure CPU/string work. You can keep `get_or_create` async (because `caption_factory` is async) and make `_build_cache_key` sync, calling a small async wrapper only when you actually hit the filesystem.
```python
async def get_or_create(...):
...
cache_key = await asyncio.to_thread(
self._build_cache_key,
provider_id,
prompt,
image_urls,
)
...
```
```python
def _build_cache_key(
self,
provider_id: str,
prompt: str,
image_urls: list[str],
) -> str:
image_fingerprints = [
self._fingerprint_image_sync(image_url) for image_url in image_urls
]
joined = "\n".join([provider_id, prompt, *image_fingerprints])
return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```
```python
def _fingerprint_image_sync(self, image_url: str) -> str:
if image_url.startswith("base64://"):
...
if image_url.startswith("data:image"):
...
if image_url.startswith(("http://", "https://")):
return f"url:{image_url}"
local_path = self._to_local_path(image_url)
if local_path and local_path.is_file():
# only this uses blocking I/O
image_bytes = local_path.read_bytes()
return self._hash_bytes(image_bytes)
return f"ref:{image_url}"
```
If you want to keep file I/O off the event loop, push only that part to a thread:
```python
def _fingerprint_image_sync(self, image_url: str) -> str:
...
local_path = self._to_local_path(image_url)
if local_path and local_path.is_file():
return f"file:{local_path}"
return f"ref:{image_url}"
```
```python
async def _fingerprint_file(self, path: Path) -> str:
image_bytes = await asyncio.to_thread(path.read_bytes)
return self._hash_bytes(image_bytes)
```
Then use `_fingerprint_file` only when you detect a `file:` fingerprint marker. This keeps most logic sync, with a very small async boundary around reading bytes.
### 2. Simplify lock management by dropping `users` and the custom entry type
You don’t need reference counting; a simple per‑key lock is enough and easier to reason about. This keeps behavior (per‑key concurrency) intact.
```python
class ImageCaptionCache:
def __init__(self) -> None:
self._entries: dict[str, _ImageCaptionCacheEntry] = {}
self._locks: dict[str, asyncio.Lock] = {}
def _get_lock(self, cache_key: str) -> asyncio.Lock:
lock = self._locks.get(cache_key)
if lock is None:
lock = asyncio.Lock()
self._locks[cache_key] = lock
return lock
```
```python
async def get_or_create(...):
...
lock = self._get_lock(cache_key)
async with lock:
cached_caption = self._get(cache_key)
if cached_caption is not None:
...
caption = await caption_factory()
self._entries[cache_key] = _ImageCaptionCacheEntry(
caption=caption,
expires_at=time.monotonic() + ttl_seconds,
)
self._cleanup_expired_entries()
return caption
```
You can optionally prune `_locks` when cleaning `_entries`:
```python
def _cleanup_expired_entries(self) -> None:
now = time.monotonic()
expired_keys = [k for k, e in self._entries.items() if e.expires_at <= now]
for key in expired_keys:
self._entries.pop(key, None)
self._locks.pop(key, None) # prune matching locks
```
No `users` counter, no `_ImageCaptionCacheLockEntry`, and lock lifecycle is tied directly to cache keys.
### 3. Split `_fingerprint_image` into small helpers for readability
Functionally identical but much easier to scan and test in isolation:
```python
def _fingerprint_image_sync(self, image_url: str) -> str:
if image_url.startswith("base64://"):
return self._fingerprint_base64(image_url)
if image_url.startswith("data:image"):
return self._fingerprint_data_uri(image_url)
if image_url.startswith(("http://", "https://")):
return f"url:{image_url}"
local_path = self._to_local_path(image_url)
if local_path and local_path.is_file():
return self._fingerprint_file_marker(local_path)
return f"ref:{image_url}"
```
```python
def _fingerprint_base64(self, image_url: str) -> str:
raw_base64 = image_url.removeprefix("base64://")
try:
image_bytes = base64.b64decode(raw_base64)
except Exception:
return f"ref:{image_url}"
return self._hash_bytes(image_bytes)
def _fingerprint_data_uri(self, image_url: str) -> str:
try:
_, encoded = image_url.split(",", 1)
image_bytes = base64.b64decode(encoded)
except Exception:
return f"ref:{image_url}"
return self._hash_bytes(image_bytes)
```
This keeps all current variants and hashing behavior but flattens the branching into self‑documenting helpers, reducing the perceived complexity of the main cache logic.
</issue_to_address>
### Comment 3
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="274" />
<code_context>
return GROUP_HISTORY_HEADER + "\n".join(records) + GROUP_HISTORY_FOOTER
+
+
+def _resolve_provider_cache_identity(
+ provider: Provider,
+ configured_provider_id: str,
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the provider identity resolution and moving the caption factory into a separate helper method to make the caching logic easier to read and maintain.
You can keep the new behavior but trim complexity in two focused places:
---
### 1. Simplify `_resolve_provider_cache_identity`
You don’t really need JSON serialization + SHA256 for a reasonably unique cache key. A deterministic string based on class, type and model is much easier to follow and still avoids collisions in realistic setups.
```python
def _resolve_provider_cache_identity(
provider: Provider,
configured_provider_id: str,
) -> str:
# Respect explicit configuration
if configured_provider_id:
return configured_provider_id
# Reuse provider-configured ID if present
provider_id = provider.provider_config.get("id")
if isinstance(provider_id, str) and provider_id:
return provider_id
# Fallback: stable, human-readable identity
return ":".join(
[
provider.__class__.__module__,
provider.__class__.__qualname__,
provider.provider_config.get("type", ""),
provider.get_model() or "",
]
)
```
This keeps the “different providers/configs get different keys” property in practice (since class + type + model usually distinguish them), but removes JSON dumping, hashing and the extra payload structure.
If you really need a config-sensitive key, you can still keep it simple by adding a *small* subset of config fields instead of the full dict.
---
### 2. Remove the nested `_caption_factory` closure
The nested async function adds indirection and captures multiple variables. You can move it into a small private method and use a lightweight lambda/partial when calling the cache.
```python
async def _fetch_image_caption(
self,
provider: Provider,
prompt: str,
image_url: str,
) -> str:
response = await provider.text_chat(
prompt=prompt,
session_id=uuid.uuid4().hex,
image_urls=[image_url],
persist=False,
)
return response.completion_text
```
Then `get_image_caption` becomes:
```python
async def get_image_caption(
self,
image_url: str,
image_caption_provider_id: str,
image_caption_prompt: str,
cache_ttl: int = 0,
) -> str:
if not image_caption_provider_id:
provider = self.context.get_using_provider()
else:
provider = self.context.get_provider_by_id(image_caption_provider_id)
if not provider:
raise Exception(f"Provider `{image_caption_provider_id}` was not found.")
if not isinstance(provider, Provider):
raise Exception(
f"Provider type is invalid for image captioning: {type(provider)}."
)
provider_id = _resolve_provider_cache_identity(
provider,
configured_provider_id=image_caption_provider_id,
)
return await image_caption_cache.get_or_create(
provider_id=provider_id,
prompt=image_caption_prompt,
image_urls=[image_url],
ttl_seconds=cache_ttl,
caption_factory=lambda: self._fetch_image_caption(
provider,
image_caption_prompt,
image_url,
),
)
```
Behavior is unchanged (same cache call, same parameters, same provider usage), but the control flow is flatter and easier to scan.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="863-868" />
<code_context>
req.extra_user_content_parts.append(TextPart(text=quoted_text))
+class _QuotedImageCaptionContext:
+ def __init__(self, provider: Provider) -> None:
+ self._provider = provider
+
+ def get_provider_by_id(self, provider_id: str) -> Provider:
+ return self._provider
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `_QuotedImageCaptionContext.get_provider_by_id` ignores `provider_id`, which may be confusing for future usage.
Here `get_provider_by_id` ignores its `provider_id` argument and always returns the same provider, which breaks the usual expectation for this kind of method. Consider either checking that `provider_id` matches the wrapped provider and raising otherwise, or renaming/adjusting the method to clearly reflect that only a single provider is supported.
```suggestion
class _QuotedImageCaptionContext:
def __init__(self, provider: Provider) -> None:
self._provider = provider
def get_provider_by_id(self, provider_id: str) -> Provider:
"""
Return the wrapped provider if the requested provider_id matches it.
This mirrors the usual expectation of a `get_provider_by_id` method by
enforcing that only the wrapped provider is addressable via its id.
Raises:
ValueError: if `provider_id` does not match the wrapped provider.
"""
wrapped_id = getattr(self._provider, "id", None)
if wrapped_id is not None and provider_id != wrapped_id:
raise ValueError(
f"Requested provider_id '{provider_id}' does not match "
f"wrapped provider id '{wrapped_id}'."
)
return self._provider
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="43" />
<code_context>
+ users: int = 0
+
+
+class ImageCaptionCache:
+ def __init__(self) -> None:
+ self._entries: dict[str, _ImageCaptionCacheEntry] = {}
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the cache’s concurrency and factory-calling logic by using plain per-key locks and directly awaiting the caption factory to reduce state and indirection.
You can simplify this without losing any behavior by collapsing the locking model and dropping the reflection around `caption_factory`.
### 1. Simplify per-key locking (drop `users` bookkeeping)
The `users` counter and lifecycle management of `_ImageCaptionCacheLockEntry` add state and branches that don’t materially change behavior for an in-memory cache. You can keep per-key locks and just let them live for the process lifetime.
Replace the lock entry dataclass and methods with a simple `dict[str, asyncio.Lock]`:
```python
@dataclass(slots=True)
class _ImageCaptionCacheEntry:
caption: str
expires_at: float
class ImageCaptionCache:
def __init__(self) -> None:
self._entries: dict[str, _ImageCaptionCacheEntry] = {}
self._locks: dict[str, asyncio.Lock] = {}
def clear(self) -> None:
self._entries.clear()
self._locks.clear()
def _get_lock(self, cache_key: str) -> asyncio.Lock:
lock = self._locks.get(cache_key)
if lock is None:
lock = asyncio.Lock()
self._locks[cache_key] = lock
return lock
```
Then simplify `get_or_create` to:
```python
async def get_or_create(
self,
*,
provider_id: str,
prompt: str,
image_urls: list[str],
ttl_seconds: int,
caption_factory: Callable[[], Awaitable[str]],
) -> str:
if ttl_seconds <= 0:
return await caption_factory()
cache_key = await self._build_cache_key(
provider_id=provider_id,
prompt=prompt,
image_urls=image_urls,
)
cached_caption = self._get(cache_key)
if cached_caption is not None:
logger.debug("Using cached image caption. provider=%s", provider_id or "<default>")
return cached_caption
lock = self._get_lock(cache_key)
async with lock:
cached_caption = self._get(cache_key)
if cached_caption is not None:
logger.debug(
"Using cached image caption after lock wait. provider=%s",
provider_id or "<default>",
)
return cached_caption
caption = await caption_factory()
self._entries[cache_key] = _ImageCaptionCacheEntry(
caption=caption,
expires_at=time.monotonic() + ttl_seconds,
)
self._cleanup_expired_entries()
return caption
```
And `_cleanup_expired_entries` no longer needs to touch `_locks`:
```python
def _cleanup_expired_entries(self) -> None:
now = time.monotonic()
expired_keys = [
key for key, entry in self._entries.items() if entry.expires_at <= now
]
for key in expired_keys:
self._entries.pop(key, None)
# Locks remain; they are small and safe to keep for process lifetime.
```
This removes `_ImageCaptionCacheLockEntry`, `_acquire_lock_entry`, `_release_lock_entry`, and the subtle coupling between `_locks` and TTL, while keeping the “avoid thundering herd per key” property.
### 2. Remove `_invoke_caption_factory` and runtime reflection
If you already type `caption_factory: Callable[[], Awaitable[str]]`, you can rely on that contract and tests instead of checking `inspect.isawaitable` at runtime. This makes the call path shorter and easier to read.
You can delete `_invoke_caption_factory` and directly `await` the factory:
```python
# Before
if ttl_seconds <= 0:
return await self._invoke_caption_factory(caption_factory)
...
caption = await self._invoke_caption_factory(caption_factory)
# After
if ttl_seconds <= 0:
return await caption_factory()
...
caption = await caption_factory()
```
And drop this method entirely:
```python
# Remove this
async def _invoke_caption_factory(
self,
caption_factory: Callable[[], Awaitable[str]],
) -> str:
result = caption_factory()
if not inspect.isawaitable(result):
raise TypeError(
"caption_factory must be callable and return an awaitable resolving to str"
)
return await result
```
This keeps the external API as-is but reduces indirection and reflection, making the control flow and error surface more straightforward.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The cache key currently includes the prompt string, which prevents reuse of captions for the same image across different prompts; if the intended behavior is to deduplicate purely on image + provider as described in the PR, consider removing
promptfrom_build_cache_key(or updating the description if prompt-sensitive caching is desired). ImageCaptionCachestores oneasyncio.Lockper cache key in_locksbut never removes them, so under many unique images this map can grow unbounded; consider cleaning up locks when entries expire or after use to avoid long-lived memory growth.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The cache key currently includes the prompt string, which prevents reuse of captions for the same image across different prompts; if the intended behavior is to deduplicate purely on image + provider as described in the PR, consider removing `prompt` from `_build_cache_key` (or updating the description if prompt-sensitive caching is desired).
- `ImageCaptionCache` stores one `asyncio.Lock` per cache key in `_locks` but never removes them, so under many unique images this map can grow unbounded; consider cleaning up locks when entries expire or after use to avoid long-lived memory growth.
## Individual Comments
### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="866-873" />
<code_context>
+class _QuotedImageCaptionContext:
+ def __init__(self, provider: Provider) -> None:
+ self._provider = provider
+ def get_provider_by_id(self, provider_id: str) -> Provider:
+ wrapped_id = getattr(self._provider, "id", None)
+ if wrapped_id is not None and provider_id != wrapped_id:
+ raise ValueError(
+ f"Requested provider_id '{provider_id}' does not match "
+ f"wrapped provider id '{wrapped_id}'."
+ )
+ return self._provider
+
def _append_system_reminders(
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid hard failure when provider_id does not match wrapped provider id in quoted image captions.
The strict `provider_id` vs `wrapped_id` check in `_QuotedImageCaptionContext.get_provider_by_id` raises `ValueError` even when we already have the correct provider instance. In `_process_quote_message`, `provider_id` comes from `prov.provider_config.get("id", img_cap_prov_id or "")`, which may differ from `getattr(prov, "id", None)` (e.g., one unset or different naming). Because `_process_quote_message` doesn’t catch `ValueError`, this can abort quoted message handling. Consider relaxing this to only fail when both IDs are set and differ, or just log a warning and return the provider, so harmless mismatches don’t break captioning.
</issue_to_address>
### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="97-103" />
<code_context>
+ for key in expired_keys:
+ self._entries.pop(key, None)
+
+ def _get_lock(self, cache_key: str) -> asyncio.Lock:
+ lock = self._locks.get(cache_key)
+ if lock is None:
+ lock = asyncio.Lock()
+ self._locks[cache_key] = lock
+ return lock
+
+ async def _build_cache_key(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Cache locks dict can grow without bound and should be pruned similarly to entries.
`_locks` adds a lock per cache key but never removes them, even after `_entries` are pruned. With many distinct keys, this can cause unbounded growth and a slow memory leak. Please also prune `_locks` (e.g., alongside `_cleanup_expired_entries`, or by periodically dropping locks whose keys are no longer in `_entries`, or by using a structure like `WeakValueDictionary` that ties lock lifetime to the entry).
```suggestion
def _cleanup_expired_entries(self) -> None:
now = time.monotonic()
expired_keys = [
key for key, entry in self._entries.items() if entry.expires_at <= now
]
for key in expired_keys:
# Drop both the cached entry and its associated lock to avoid unbounded growth
self._entries.pop(key, None)
self._locks.pop(key, None)
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
ImageCaptionCache._to_local_path, the Windows drive-letter detection logic (parsed_path.startswith('/') and parsed_path[2] == ':') is a bit brittle; consider usingpathlib.PureWindowsPathor a more explicit platform-aware check to avoid mis-parsing edge-casefile://URLs. - In
ImageCaptionCache.get_or_create, you only log when a cached caption is used; adding a debug log on cache misses (and possibly when entries expire) would make it easier to understand cache behavior and diagnose unexpected caption-provider traffic in production.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `ImageCaptionCache._to_local_path`, the Windows drive-letter detection logic (`parsed_path.startswith('/') and parsed_path[2] == ':'`) is a bit brittle; consider using `pathlib.PureWindowsPath` or a more explicit platform-aware check to avoid mis-parsing edge-case `file://` URLs.
- In `ImageCaptionCache.get_or_create`, you only log when a cached caption is used; adding a debug log on cache misses (and possibly when entries expire) would make it easier to understand cache behavior and diagnose unexpected caption-provider traffic in production.
## Individual Comments
### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="827" />
<code_context>
if path and _is_generated_compressed_image_path(path, compress_path):
event.track_temporary_local_file(compress_path)
- llm_resp = await prov.text_chat(
+ caption = await _request_img_caption(
+ prov.provider_config.get("id", img_cap_prov_id or ""),
+ {
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting a provider-centric caption helper and having `_request_img_caption` and the quote path delegate to it instead of building synthetic configs and contexts.
The complexity here mainly comes from shaping data to fit `_request_img_caption` instead of letting `_request_img_caption` accept what the caller already has.
You can keep all behavior (including caching) while simplifying by:
1. Introducing a small helper that works directly with a `Provider` and explicit `prompt`/`cache_ttl`.
2. Letting the existing `_request_img_caption` delegate to that helper.
3. Removing `_QuotedImageCaptionContext` and the synthetic `cfg` dict at the quote call site.
### 1. Extract a provider‑centric helper
```python
async def _request_img_caption_with_provider(
prov: Provider,
provider_id: str,
image_urls: list[str],
prompt: str,
cache_ttl: int | None = None,
) -> str:
if cache_ttl is None:
# Fall back to provider_config or some global default if needed
cache_ttl = resolve_image_caption_cache_ttl(prov.provider_config)
async def _caption_factory() -> str:
llm_resp = await prov.text_chat(
prompt=prompt,
image_urls=image_urls,
)
return llm_resp.completion_text
return await image_caption_cache.get_or_create(
provider_id=provider_id,
prompt=prompt,
image_urls=image_urls,
ttl_seconds=cache_ttl,
caption_factory=_caption_factory,
)
```
Then keep your existing entry point but make it a thin wrapper:
```python
async def _request_img_caption(
provider_id: str,
cfg: dict,
image_urls: list[str],
plugin_context: Context,
prompt: str | None = None,
) -> str:
prov = plugin_context.get_provider_by_id(provider_id)
if prov is None:
raise ValueError(
f"Cannot get image caption because provider `{provider_id}` is not exist.",
)
if not isinstance(prov, Provider):
raise ValueError(
f"Cannot get image caption because provider `{provider_id}` is not a valid Provider, it is {type(prov)}.",
)
img_cap_prompt = prompt or cfg.get(
"image_caption_prompt",
"Please describe the image.",
)
cache_ttl = resolve_image_caption_cache_ttl(cfg)
return await _request_img_caption_with_provider(
prov=prov,
provider_id=provider_id,
image_urls=image_urls,
prompt=img_cap_prompt,
cache_ttl=cache_ttl,
)
```
This keeps the caching logic in one place and reduces `_request_img_caption` to mostly plumbing.
### 2. Simplify the quoted‑message call site
Once you have `_request_img_caption_with_provider`, the quoted‑message path no longer needs `_QuotedImageCaptionContext` or the synthetic `cfg` dict:
```python
if prov and isinstance(prov, Provider):
path = await image_seg.convert_to_file_path()
compress_path = await _compress_image_for_provider(
path,
config.provider_settings if config else None,
)
if path and _is_generated_compressed_image_path(path, compress_path):
event.track_temporary_local_file(compress_path)
caption_prompt = "Please describe the image content."
cache_ttl = resolve_image_caption_cache_ttl(
config.provider_settings if config else None
)
caption = await _request_img_caption_with_provider(
prov=prov,
provider_id=prov.provider_config.get("id", img_cap_prov_id or ""),
image_urls=[compress_path],
prompt=caption_prompt,
cache_ttl=cache_ttl,
)
if caption:
content_parts.append(
f"[Image Caption in quoted message]: {caption}"
)
```
With this:
- `_QuotedImageCaptionContext` can be deleted entirely.
- The quote path no longer has to know about the internal shape of `cfg["image_caption_prompt"]` / `cfg["image_caption_cache_ttl"]` — it just passes prompt + TTL explicitly.
- Caching stays centralized and consistent.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This PR adds image caption result caching for repeated images and exposes the cache TTL in configuration.
Modifications / 改动点
add image caption cache utility
reuse cached vision results when the same image is received again within the configured TTL
add provider setting for
image_caption_cache_ttladd WebUI i18n labels for the new config field in
zh-CN,en-US, andru-RUadd tests covering cache behavior and main agent integration
This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Introduce a reusable image caption caching mechanism and integrate it into image caption flows across the main and group chat agents, with configurable cache TTL exposed in settings and the dashboard.
New Features:
Enhancements:
image_caption_cache_ttlin provider settings with a sensible default, and wire it through main agent and group chat configuration.Tests: