Skip to content

Feat: image caption cache#8580

Open
FloranceYeh wants to merge 7 commits into
AstrBotDevs:masterfrom
FloranceYeh:feat-image-caption-cache
Open

Feat: image caption cache#8580
FloranceYeh wants to merge 7 commits into
AstrBotDevs:masterfrom
FloranceYeh:feat-image-caption-cache

Conversation

@FloranceYeh
Copy link
Copy Markdown

@FloranceYeh FloranceYeh commented Jun 4, 2026

This PR adds image caption result caching for repeated images and exposes the cache TTL in configuration.

Modifications / 改动点

  • add image caption cache utility

  • reuse cached vision results when the same image is received again within the configured TTL

  • add provider setting for image_caption_cache_ttl

  • add WebUI i18n labels for the new config field in zh-CN, en-US, and ru-RU

  • add tests covering cache behavior and main agent integration

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

PixPin_2026-06-04_23-06-03 PixPin_2026-06-04_23-05-19

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Introduce a reusable image caption caching mechanism and integrate it into image caption flows across the main and group chat agents, with configurable cache TTL exposed in settings and the dashboard.

New Features:

  • Add an in-memory, provider-aware image caption cache with configurable TTL and image fingerprinting for local, remote, and base64 image references.
  • Cache and reuse image captions for repeated images in both group chat image handling and quoted-message image captioning flows, including a stable provider identity resolver.

Enhancements:

  • Expose image_caption_cache_ttl in provider settings with a sensible default, and wire it through main agent and group chat configuration.
  • Improve provider lookup and error messaging for image captioning, and ensure quoted image captioning tolerates provider ID mismatches via a wrapper context.
  • Add a new MDI icon to the dashboard icon subset and corresponding i18n config metadata entries for the new image caption cache TTL setting.

Tests:

  • Add unit tests for the image caption cache behavior, including per-key locking, image fingerprinting, and lambda-based caption factories.
  • Add tests covering group chat and main agent integration with the image caption cache, including provider identity resolution and quoted image caption flows.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. area:webui The bug / feature is about webui(dashboard) of astrbot. labels Jun 4, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • The global ImageCaptionCache keeps a _locks dict that is never cleaned up even when entries expire; consider pruning locks alongside _entries or using a different structure to avoid unbounded growth over a long-running process.
  • In ImageCaptionCache.get_or_create, adding a precise type annotation for caption_factory (e.g. Callable[[], Awaitable[str]]) would make the async contract clearer and help static analysis tools catch misuse.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The global `ImageCaptionCache` keeps a `_locks` dict that is never cleaned up even when entries expire; consider pruning locks alongside `_entries` or using a different structure to avoid unbounded growth over a long-running process.
- In `ImageCaptionCache.get_or_create`, adding a precise type annotation for `caption_factory` (e.g. `Callable[[], Awaitable[str]]`) would make the async contract clearer and help static analysis tools catch misuse.

## Individual Comments

### Comment 1
<location path="astrbot/core/utils/image_caption_cache.py" line_range="72-81" />
<code_context>
+            )
+            return cached_caption
+
+        lock = self._locks.setdefault(cache_key, asyncio.Lock())
+        async with lock:
+            cached_caption = self._get(cache_key)
+            if cached_caption is not None:
+                logger.debug(
+                    "Using cached image caption after lock wait. provider=%s",
+                    provider_id or "<default>",
+                )
+                return cached_caption
+
+            caption = await caption_factory()
+            self._entries[cache_key] = _ImageCaptionCacheEntry(
+                caption=caption,
+                expires_at=time.monotonic() + ttl_seconds,
+            )
+            self._cleanup_expired_entries()
+            return caption
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Locks dictionary can grow without bound, leading to potential memory leaks over long runtimes.

`_cleanup_expired_entries` only prunes `_entries`; `_locks` still retains a lock per cache key indefinitely. In long-lived processes with many unique keys this can cause unbounded growth. Consider removing `self._locks[cache_key]` when its entry expires, or periodically pruning locks for keys no longer present in `_entries`.
</issue_to_address>

### Comment 2
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="91-93" />
<code_context>
         image_caption_prompt: str,
+        cache_ttl: int = 0,
     ) -> str:
         if not image_caption_provider_id:
             provider = self.context.get_using_provider()
+            provider_id = (
+                provider.provider_config.get("id", "")
+                if isinstance(provider, Provider)
</code_context>
<issue_to_address>
**issue (bug_risk):** Caching with an empty or non-unique provider ID can mix captions from different default providers.

When `image_caption_provider_id` is falsy, `provider_id` comes from `provider.provider_config.get("id", "")`. If this ever resolves to `""` or a non-unique value while different providers are used over time, cached captions can be incorrectly shared across providers. Consider enforcing a stable, non-empty, unique identifier (e.g., required `id` in config or a fallback like the provider class name) to ensure cache keys stay provider-specific.
</issue_to_address>

### Comment 3
<location path="astrbot/core/utils/image_caption_cache.py" line_range="16" />
<code_context>
+DEFAULT_IMAGE_CAPTION_CACHE_TTL = 600
+
+
+def resolve_image_caption_cache_ttl(config: dict | None) -> int:
+    if not isinstance(config, dict):
+        return DEFAULT_IMAGE_CAPTION_CACHE_TTL
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying TTL handling and lock lifecycle management to reduce branching and prevent unbounded growth of per-key locks while preserving existing behavior.

You can keep all current behavior but trim some complexity and lifecycle issues with small, focused changes.

### 1. Clean up per-key locks when entries expire

Right now `_locks` can grow without bound because locks are never removed. You can piggyback on `_cleanup_expired_entries` to prune locks for expired keys:

```python
def _cleanup_expired_entries(self) -> None:
    now = time.monotonic()
    expired_keys = [
        key for key, entry in self._entries.items() if entry.expires_at <= now
    ]
    for key in expired_keys:
        self._entries.pop(key, None)
        self._locks.pop(key, None)  # keep locks in sync with entries
```

You can also simplify lock lookup using `defaultdict` to avoid `setdefault` noise:

```python
from collections import defaultdict

class ImageCaptionCache:
    def __init__(self) -> None:
        self._entries: dict[str, _ImageCaptionCacheEntry] = {}
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def get_or_create(...):
        ...
        lock = self._locks[cache_key]
        async with lock:
            ...
```

### 2. Simplify TTL resolution while preserving behavior

You can keep the current type semantics (including the `bool` special-case) but make the logic tighter and easier to read:

```python
def resolve_image_caption_cache_ttl(config: dict | None) -> int:
    raw = (config or {}).get(
        "image_caption_cache_ttl",
        DEFAULT_IMAGE_CAPTION_CACHE_TTL,
    )

    # Preserve bool special-case
    if isinstance(raw, bool):
        return DEFAULT_IMAGE_CAPTION_CACHE_TTL

    try:
        return max(int(raw), 0)
    except (TypeError, ValueError):
        return DEFAULT_IMAGE_CAPTION_CACHE_TTL
```

This reduces branching and makes the accepted types clearer while keeping the existing behavior intact.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/utils/image_caption_cache.py Outdated
Comment thread astrbot/builtin_stars/astrbot/group_chat_context.py Outdated
Comment thread astrbot/core/utils/image_caption_cache.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an image caption caching mechanism (ImageCaptionCache) to store and reuse vision recognition results, preventing redundant API calls when the same image is processed within a configurable TTL (image_caption_cache_ttl). This caching logic is integrated into both group chat contexts and the main agent, with configuration options and translations updated accordingly. The review feedback highlights a potential memory leak in the cache implementation where lock objects in self._locks are never cleaned up, and suggests a finally block to safely remove locks after caption generation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread astrbot/core/utils/image_caption_cache.py
@FloranceYeh
Copy link
Copy Markdown
Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In tests/test_group_chat_context.py you import Provider from astrbot.core.provider, but the production code uses astrbot.api.provider.Provider; aligning the import to the actual implementation will avoid spec mismatches and future refactors breaking these tests.
  • In ImageCaptionCache.get_or_create, caption_factory is untyped and unconstrained; consider typing it as a Callable[[], Awaitable[str]] (and enforcing that signature at call sites) to make misuse easier to catch and improve editor/static-analysis support.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `tests/test_group_chat_context.py` you import `Provider` from `astrbot.core.provider`, but the production code uses `astrbot.api.provider.Provider`; aligning the import to the actual implementation will avoid spec mismatches and future refactors breaking these tests.
- In `ImageCaptionCache.get_or_create`, `caption_factory` is untyped and unconstrained; consider typing it as a `Callable[[], Awaitable[str]]` (and enforcing that signature at call sites) to make misuse easier to catch and improve editor/static-analysis support.

## Individual Comments

### Comment 1
<location path="astrbot/core/utils/image_caption_cache.py" line_range="50-57" />
<code_context>
+        self._entries.clear()
+        self._locks.clear()
+
+    async def get_or_create(
+        self,
+        *,
+        provider_id: str,
+        prompt: str,
+        image_urls: list[str],
+        ttl_seconds: int,
+        caption_factory,
+    ) -> str:
+        if ttl_seconds <= 0:
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The `caption_factory` parameter should be typed/validated as an async callable for clarity and safety.

Because `caption_factory` is untyped and assumed to be async returning `str`, passing a sync function will cause `await caption_factory()` to fail at runtime. Consider typing it as `Callable[[], Awaitable[str]]` (or otherwise documenting/enforcing that it must be async), and optionally adding a small runtime check (e.g. `inspect.iscoroutinefunction`) to fail fast with a clear error message.

Suggested implementation:

```python
import inspect
from collections.abc import Awaitable, Callable


class ImageCaptionCache:

```

```python
        ttl_seconds: int,
        caption_factory: Callable[[], Awaitable[str]],
    ) -> str:
        if not inspect.iscoroutinefunction(caption_factory):
            raise TypeError(
                "caption_factory must be an async callable (defined with 'async def') "
                "that returns a str"
            )

        if ttl_seconds <= 0:
            return await caption_factory()

```
</issue_to_address>

### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="41" />
<code_context>
+    users: int = 0
+
+
+class ImageCaptionCache:
+    def __init__(self) -> None:
+        self._entries: dict[str, _ImageCaptionCacheEntry] = {}
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the cache to keep most logic synchronous, simplify per-key locking, and split image fingerprinting into small helpers to reduce cognitive and control-flow complexity.

You can significantly reduce cognitive and control‑flow complexity without changing behavior by tightening the async surface and simplifying lock management / helpers.

### 1. Make cache key building sync and isolate the only blocking I/O

Only the file read needs a thread; everything else is pure CPU/string work. You can keep `get_or_create` async (because `caption_factory` is async) and make `_build_cache_key` sync, calling a small async wrapper only when you actually hit the filesystem.

```python
async def get_or_create(...):
    ...
    cache_key = await asyncio.to_thread(
        self._build_cache_key,
        provider_id,
        prompt,
        image_urls,
    )
    ...
```

```python
def _build_cache_key(
    self,
    provider_id: str,
    prompt: str,
    image_urls: list[str],
) -> str:
    image_fingerprints = [
        self._fingerprint_image_sync(image_url) for image_url in image_urls
    ]
    joined = "\n".join([provider_id, prompt, *image_fingerprints])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

```python
def _fingerprint_image_sync(self, image_url: str) -> str:
    if image_url.startswith("base64://"):
        ...
    if image_url.startswith("data:image"):
        ...
    if image_url.startswith(("http://", "https://")):
        return f"url:{image_url}"

    local_path = self._to_local_path(image_url)
    if local_path and local_path.is_file():
        # only this uses blocking I/O
        image_bytes = local_path.read_bytes()
        return self._hash_bytes(image_bytes)

    return f"ref:{image_url}"
```

If you want to keep file I/O off the event loop, push only that part to a thread:

```python
def _fingerprint_image_sync(self, image_url: str) -> str:
    ...
    local_path = self._to_local_path(image_url)
    if local_path and local_path.is_file():
        return f"file:{local_path}"
    return f"ref:{image_url}"
```

```python
async def _fingerprint_file(self, path: Path) -> str:
    image_bytes = await asyncio.to_thread(path.read_bytes)
    return self._hash_bytes(image_bytes)
```

Then use `_fingerprint_file` only when you detect a `file:` fingerprint marker. This keeps most logic sync, with a very small async boundary around reading bytes.

### 2. Simplify lock management by dropping `users` and the custom entry type

You don’t need reference counting; a simple per‑key lock is enough and easier to reason about. This keeps behavior (per‑key concurrency) intact.

```python
class ImageCaptionCache:
    def __init__(self) -> None:
        self._entries: dict[str, _ImageCaptionCacheEntry] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    def _get_lock(self, cache_key: str) -> asyncio.Lock:
        lock = self._locks.get(cache_key)
        if lock is None:
            lock = asyncio.Lock()
            self._locks[cache_key] = lock
        return lock
```

```python
async def get_or_create(...):
    ...
    lock = self._get_lock(cache_key)
    async with lock:
        cached_caption = self._get(cache_key)
        if cached_caption is not None:
            ...
        caption = await caption_factory()
        self._entries[cache_key] = _ImageCaptionCacheEntry(
            caption=caption,
            expires_at=time.monotonic() + ttl_seconds,
        )
        self._cleanup_expired_entries()
        return caption
```

You can optionally prune `_locks` when cleaning `_entries`:

```python
def _cleanup_expired_entries(self) -> None:
    now = time.monotonic()
    expired_keys = [k for k, e in self._entries.items() if e.expires_at <= now]
    for key in expired_keys:
        self._entries.pop(key, None)
        self._locks.pop(key, None)  # prune matching locks
```

No `users` counter, no `_ImageCaptionCacheLockEntry`, and lock lifecycle is tied directly to cache keys.

### 3. Split `_fingerprint_image` into small helpers for readability

Functionally identical but much easier to scan and test in isolation:

```python
def _fingerprint_image_sync(self, image_url: str) -> str:
    if image_url.startswith("base64://"):
        return self._fingerprint_base64(image_url)
    if image_url.startswith("data:image"):
        return self._fingerprint_data_uri(image_url)
    if image_url.startswith(("http://", "https://")):
        return f"url:{image_url}"

    local_path = self._to_local_path(image_url)
    if local_path and local_path.is_file():
        return self._fingerprint_file_marker(local_path)

    return f"ref:{image_url}"
```

```python
def _fingerprint_base64(self, image_url: str) -> str:
    raw_base64 = image_url.removeprefix("base64://")
    try:
        image_bytes = base64.b64decode(raw_base64)
    except Exception:
        return f"ref:{image_url}"
    return self._hash_bytes(image_bytes)

def _fingerprint_data_uri(self, image_url: str) -> str:
    try:
        _, encoded = image_url.split(",", 1)
        image_bytes = base64.b64decode(encoded)
    except Exception:
        return f"ref:{image_url}"
    return self._hash_bytes(image_bytes)
```

This keeps all current variants and hashing behavior but flattens the branching into self‑documenting helpers, reducing the perceived complexity of the main cache logic.
</issue_to_address>

### Comment 3
<location path="astrbot/builtin_stars/astrbot/group_chat_context.py" line_range="274" />
<code_context>
     return GROUP_HISTORY_HEADER + "\n".join(records) + GROUP_HISTORY_FOOTER
+
+
+def _resolve_provider_cache_identity(
+    provider: Provider,
+    configured_provider_id: str,
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the provider identity resolution and moving the caption factory into a separate helper method to make the caching logic easier to read and maintain.

You can keep the new behavior but trim complexity in two focused places:

---

### 1. Simplify `_resolve_provider_cache_identity`

You don’t really need JSON serialization + SHA256 for a reasonably unique cache key. A deterministic string based on class, type and model is much easier to follow and still avoids collisions in realistic setups.

```python
def _resolve_provider_cache_identity(
    provider: Provider,
    configured_provider_id: str,
) -> str:
    # Respect explicit configuration
    if configured_provider_id:
        return configured_provider_id

    # Reuse provider-configured ID if present
    provider_id = provider.provider_config.get("id")
    if isinstance(provider_id, str) and provider_id:
        return provider_id

    # Fallback: stable, human-readable identity
    return ":".join(
        [
            provider.__class__.__module__,
            provider.__class__.__qualname__,
            provider.provider_config.get("type", ""),
            provider.get_model() or "",
        ]
    )
```

This keeps the “different providers/configs get different keys” property in practice (since class + type + model usually distinguish them), but removes JSON dumping, hashing and the extra payload structure.

If you really need a config-sensitive key, you can still keep it simple by adding a *small* subset of config fields instead of the full dict.

---

### 2. Remove the nested `_caption_factory` closure

The nested async function adds indirection and captures multiple variables. You can move it into a small private method and use a lightweight lambda/partial when calling the cache.

```python
async def _fetch_image_caption(
    self,
    provider: Provider,
    prompt: str,
    image_url: str,
) -> str:
    response = await provider.text_chat(
        prompt=prompt,
        session_id=uuid.uuid4().hex,
        image_urls=[image_url],
        persist=False,
    )
    return response.completion_text
```

Then `get_image_caption` becomes:

```python
async def get_image_caption(
    self,
    image_url: str,
    image_caption_provider_id: str,
    image_caption_prompt: str,
    cache_ttl: int = 0,
) -> str:
    if not image_caption_provider_id:
        provider = self.context.get_using_provider()
    else:
        provider = self.context.get_provider_by_id(image_caption_provider_id)
        if not provider:
            raise Exception(f"Provider `{image_caption_provider_id}` was not found.")

    if not isinstance(provider, Provider):
        raise Exception(
            f"Provider type is invalid for image captioning: {type(provider)}."
        )

    provider_id = _resolve_provider_cache_identity(
        provider,
        configured_provider_id=image_caption_provider_id,
    )

    return await image_caption_cache.get_or_create(
        provider_id=provider_id,
        prompt=image_caption_prompt,
        image_urls=[image_url],
        ttl_seconds=cache_ttl,
        caption_factory=lambda: self._fetch_image_caption(
            provider,
            image_caption_prompt,
            image_url,
        ),
    )
```

Behavior is unchanged (same cache call, same parameters, same provider usage), but the control flow is flatter and easier to scan.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/utils/image_caption_cache.py Outdated
Comment thread astrbot/core/utils/image_caption_cache.py
Comment thread astrbot/builtin_stars/astrbot/group_chat_context.py
@FloranceYeh
Copy link
Copy Markdown
Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="863-868" />
<code_context>
     req.extra_user_content_parts.append(TextPart(text=quoted_text))


+class _QuotedImageCaptionContext:
+    def __init__(self, provider: Provider) -> None:
+        self._provider = provider
+
+    def get_provider_by_id(self, provider_id: str) -> Provider:
+        return self._provider
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `_QuotedImageCaptionContext.get_provider_by_id` ignores `provider_id`, which may be confusing for future usage.

Here `get_provider_by_id` ignores its `provider_id` argument and always returns the same provider, which breaks the usual expectation for this kind of method. Consider either checking that `provider_id` matches the wrapped provider and raising otherwise, or renaming/adjusting the method to clearly reflect that only a single provider is supported.

```suggestion
class _QuotedImageCaptionContext:
    def __init__(self, provider: Provider) -> None:
        self._provider = provider

    def get_provider_by_id(self, provider_id: str) -> Provider:
        """
        Return the wrapped provider if the requested provider_id matches it.

        This mirrors the usual expectation of a `get_provider_by_id` method by
        enforcing that only the wrapped provider is addressable via its id.

        Raises:
            ValueError: if `provider_id` does not match the wrapped provider.
        """
        wrapped_id = getattr(self._provider, "id", None)
        if wrapped_id is not None and provider_id != wrapped_id:
            raise ValueError(
                f"Requested provider_id '{provider_id}' does not match "
                f"wrapped provider id '{wrapped_id}'."
            )
        return self._provider
```
</issue_to_address>

### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="43" />
<code_context>
+    users: int = 0
+
+
+class ImageCaptionCache:
+    def __init__(self) -> None:
+        self._entries: dict[str, _ImageCaptionCacheEntry] = {}
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the cache’s concurrency and factory-calling logic by using plain per-key locks and directly awaiting the caption factory to reduce state and indirection.

You can simplify this without losing any behavior by collapsing the locking model and dropping the reflection around `caption_factory`.

### 1. Simplify per-key locking (drop `users` bookkeeping)

The `users` counter and lifecycle management of `_ImageCaptionCacheLockEntry` add state and branches that don’t materially change behavior for an in-memory cache. You can keep per-key locks and just let them live for the process lifetime.

Replace the lock entry dataclass and methods with a simple `dict[str, asyncio.Lock]`:

```python
@dataclass(slots=True)
class _ImageCaptionCacheEntry:
    caption: str
    expires_at: float

class ImageCaptionCache:
    def __init__(self) -> None:
        self._entries: dict[str, _ImageCaptionCacheEntry] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    def clear(self) -> None:
        self._entries.clear()
        self._locks.clear()

    def _get_lock(self, cache_key: str) -> asyncio.Lock:
        lock = self._locks.get(cache_key)
        if lock is None:
            lock = asyncio.Lock()
            self._locks[cache_key] = lock
        return lock
```

Then simplify `get_or_create` to:

```python
async def get_or_create(
    self,
    *,
    provider_id: str,
    prompt: str,
    image_urls: list[str],
    ttl_seconds: int,
    caption_factory: Callable[[], Awaitable[str]],
) -> str:
    if ttl_seconds <= 0:
        return await caption_factory()

    cache_key = await self._build_cache_key(
        provider_id=provider_id,
        prompt=prompt,
        image_urls=image_urls,
    )
    cached_caption = self._get(cache_key)
    if cached_caption is not None:
        logger.debug("Using cached image caption. provider=%s", provider_id or "<default>")
        return cached_caption

    lock = self._get_lock(cache_key)
    async with lock:
        cached_caption = self._get(cache_key)
        if cached_caption is not None:
            logger.debug(
                "Using cached image caption after lock wait. provider=%s",
                provider_id or "<default>",
            )
            return cached_caption

        caption = await caption_factory()
        self._entries[cache_key] = _ImageCaptionCacheEntry(
            caption=caption,
            expires_at=time.monotonic() + ttl_seconds,
        )
        self._cleanup_expired_entries()
        return caption
```

And `_cleanup_expired_entries` no longer needs to touch `_locks`:

```python
def _cleanup_expired_entries(self) -> None:
    now = time.monotonic()
    expired_keys = [
        key for key, entry in self._entries.items() if entry.expires_at <= now
    ]
    for key in expired_keys:
        self._entries.pop(key, None)
    # Locks remain; they are small and safe to keep for process lifetime.
```

This removes `_ImageCaptionCacheLockEntry`, `_acquire_lock_entry`, `_release_lock_entry`, and the subtle coupling between `_locks` and TTL, while keeping the “avoid thundering herd per key” property.

### 2. Remove `_invoke_caption_factory` and runtime reflection

If you already type `caption_factory: Callable[[], Awaitable[str]]`, you can rely on that contract and tests instead of checking `inspect.isawaitable` at runtime. This makes the call path shorter and easier to read.

You can delete `_invoke_caption_factory` and directly `await` the factory:

```python
# Before
if ttl_seconds <= 0:
    return await self._invoke_caption_factory(caption_factory)
...
caption = await self._invoke_caption_factory(caption_factory)

# After
if ttl_seconds <= 0:
    return await caption_factory()
...
caption = await caption_factory()
```

And drop this method entirely:

```python
# Remove this
async def _invoke_caption_factory(
    self,
    caption_factory: Callable[[], Awaitable[str]],
) -> str:
    result = caption_factory()
    if not inspect.isawaitable(result):
        raise TypeError(
            "caption_factory must be callable and return an awaitable resolving to str"
        )
    return await result
```

This keeps the external API as-is but reduces indirection and reflection, making the control flow and error surface more straightforward.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/astr_main_agent.py Outdated
Comment thread astrbot/core/utils/image_caption_cache.py
@FloranceYeh
Copy link
Copy Markdown
Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The cache key currently includes the prompt string, which prevents reuse of captions for the same image across different prompts; if the intended behavior is to deduplicate purely on image + provider as described in the PR, consider removing prompt from _build_cache_key (or updating the description if prompt-sensitive caching is desired).
  • ImageCaptionCache stores one asyncio.Lock per cache key in _locks but never removes them, so under many unique images this map can grow unbounded; consider cleaning up locks when entries expire or after use to avoid long-lived memory growth.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The cache key currently includes the prompt string, which prevents reuse of captions for the same image across different prompts; if the intended behavior is to deduplicate purely on image + provider as described in the PR, consider removing `prompt` from `_build_cache_key` (or updating the description if prompt-sensitive caching is desired).
- `ImageCaptionCache` stores one `asyncio.Lock` per cache key in `_locks` but never removes them, so under many unique images this map can grow unbounded; consider cleaning up locks when entries expire or after use to avoid long-lived memory growth.

## Individual Comments

### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="866-873" />
<code_context>
+class _QuotedImageCaptionContext:
+    def __init__(self, provider: Provider) -> None:
+        self._provider = provider
+    def get_provider_by_id(self, provider_id: str) -> Provider:
+        wrapped_id = getattr(self._provider, "id", None)
+        if wrapped_id is not None and provider_id != wrapped_id:
+            raise ValueError(
+                f"Requested provider_id '{provider_id}' does not match "
+                f"wrapped provider id '{wrapped_id}'."
+            )
+        return self._provider
+
 def _append_system_reminders(
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid hard failure when provider_id does not match wrapped provider id in quoted image captions.

The strict `provider_id` vs `wrapped_id` check in `_QuotedImageCaptionContext.get_provider_by_id` raises `ValueError` even when we already have the correct provider instance. In `_process_quote_message`, `provider_id` comes from `prov.provider_config.get("id", img_cap_prov_id or "")`, which may differ from `getattr(prov, "id", None)` (e.g., one unset or different naming). Because `_process_quote_message` doesn’t catch `ValueError`, this can abort quoted message handling. Consider relaxing this to only fail when both IDs are set and differ, or just log a warning and return the provider, so harmless mismatches don’t break captioning.
</issue_to_address>

### Comment 2
<location path="astrbot/core/utils/image_caption_cache.py" line_range="97-103" />
<code_context>
+        for key in expired_keys:
+            self._entries.pop(key, None)
+
+    def _get_lock(self, cache_key: str) -> asyncio.Lock:
+        lock = self._locks.get(cache_key)
+        if lock is None:
+            lock = asyncio.Lock()
+            self._locks[cache_key] = lock
+        return lock
+
+    async def _build_cache_key(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Cache locks dict can grow without bound and should be pruned similarly to entries.

`_locks` adds a lock per cache key but never removes them, even after `_entries` are pruned. With many distinct keys, this can cause unbounded growth and a slow memory leak. Please also prune `_locks` (e.g., alongside `_cleanup_expired_entries`, or by periodically dropping locks whose keys are no longer in `_entries`, or by using a structure like `WeakValueDictionary` that ties lock lifetime to the entry).

```suggestion
    def _cleanup_expired_entries(self) -> None:
        now = time.monotonic()
        expired_keys = [
            key for key, entry in self._entries.items() if entry.expires_at <= now
        ]
        for key in expired_keys:
            # Drop both the cached entry and its associated lock to avoid unbounded growth
            self._entries.pop(key, None)
            self._locks.pop(key, None)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/astr_main_agent.py Outdated
Comment thread astrbot/core/utils/image_caption_cache.py
@FloranceYeh
Copy link
Copy Markdown
Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In ImageCaptionCache._to_local_path, the Windows drive-letter detection logic (parsed_path.startswith('/') and parsed_path[2] == ':') is a bit brittle; consider using pathlib.PureWindowsPath or a more explicit platform-aware check to avoid mis-parsing edge-case file:// URLs.
  • In ImageCaptionCache.get_or_create, you only log when a cached caption is used; adding a debug log on cache misses (and possibly when entries expire) would make it easier to understand cache behavior and diagnose unexpected caption-provider traffic in production.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `ImageCaptionCache._to_local_path`, the Windows drive-letter detection logic (`parsed_path.startswith('/') and parsed_path[2] == ':'`) is a bit brittle; consider using `pathlib.PureWindowsPath` or a more explicit platform-aware check to avoid mis-parsing edge-case `file://` URLs.
- In `ImageCaptionCache.get_or_create`, you only log when a cached caption is used; adding a debug log on cache misses (and possibly when entries expire) would make it easier to understand cache behavior and diagnose unexpected caption-provider traffic in production.

## Individual Comments

### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="827" />
<code_context>
                 if path and _is_generated_compressed_image_path(path, compress_path):
                     event.track_temporary_local_file(compress_path)
-                llm_resp = await prov.text_chat(
+                caption = await _request_img_caption(
+                    prov.provider_config.get("id", img_cap_prov_id or ""),
+                    {
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting a provider-centric caption helper and having `_request_img_caption` and the quote path delegate to it instead of building synthetic configs and contexts.

The complexity here mainly comes from shaping data to fit `_request_img_caption` instead of letting `_request_img_caption` accept what the caller already has.

You can keep all behavior (including caching) while simplifying by:

1. Introducing a small helper that works directly with a `Provider` and explicit `prompt`/`cache_ttl`.
2. Letting the existing `_request_img_caption` delegate to that helper.
3. Removing `_QuotedImageCaptionContext` and the synthetic `cfg` dict at the quote call site.

### 1. Extract a provider‑centric helper

```python
async def _request_img_caption_with_provider(
    prov: Provider,
    provider_id: str,
    image_urls: list[str],
    prompt: str,
    cache_ttl: int | None = None,
) -> str:
    if cache_ttl is None:
        # Fall back to provider_config or some global default if needed
        cache_ttl = resolve_image_caption_cache_ttl(prov.provider_config)

    async def _caption_factory() -> str:
        llm_resp = await prov.text_chat(
            prompt=prompt,
            image_urls=image_urls,
        )
        return llm_resp.completion_text

    return await image_caption_cache.get_or_create(
        provider_id=provider_id,
        prompt=prompt,
        image_urls=image_urls,
        ttl_seconds=cache_ttl,
        caption_factory=_caption_factory,
    )
```

Then keep your existing entry point but make it a thin wrapper:

```python
async def _request_img_caption(
    provider_id: str,
    cfg: dict,
    image_urls: list[str],
    plugin_context: Context,
    prompt: str | None = None,
) -> str:
    prov = plugin_context.get_provider_by_id(provider_id)
    if prov is None:
        raise ValueError(
            f"Cannot get image caption because provider `{provider_id}` is not exist.",
        )
    if not isinstance(prov, Provider):
        raise ValueError(
            f"Cannot get image caption because provider `{provider_id}` is not a valid Provider, it is {type(prov)}.",
        )

    img_cap_prompt = prompt or cfg.get(
        "image_caption_prompt",
        "Please describe the image.",
    )
    cache_ttl = resolve_image_caption_cache_ttl(cfg)

    return await _request_img_caption_with_provider(
        prov=prov,
        provider_id=provider_id,
        image_urls=image_urls,
        prompt=img_cap_prompt,
        cache_ttl=cache_ttl,
    )
```

This keeps the caching logic in one place and reduces `_request_img_caption` to mostly plumbing.

### 2. Simplify the quoted‑message call site

Once you have `_request_img_caption_with_provider`, the quoted‑message path no longer needs `_QuotedImageCaptionContext` or the synthetic `cfg` dict:

```python
if prov and isinstance(prov, Provider):
    path = await image_seg.convert_to_file_path()
    compress_path = await _compress_image_for_provider(
        path,
        config.provider_settings if config else None,
    )
    if path and _is_generated_compressed_image_path(path, compress_path):
        event.track_temporary_local_file(compress_path)

    caption_prompt = "Please describe the image content."
    cache_ttl = resolve_image_caption_cache_ttl(
        config.provider_settings if config else None
    )

    caption = await _request_img_caption_with_provider(
        prov=prov,
        provider_id=prov.provider_config.get("id", img_cap_prov_id or ""),
        image_urls=[compress_path],
        prompt=caption_prompt,
        cache_ttl=cache_ttl,
    )
    if caption:
        content_parts.append(
            f"[Image Caption in quoted message]: {caption}"
        )
```

With this:

- `_QuotedImageCaptionContext` can be deleted entirely.
- The quote path no longer has to know about the internal shape of `cfg["image_caption_prompt"]` / `cfg["image_caption_cache_ttl"]` — it just passes prompt + TTL explicitly.
- Caching stays centralized and consistent.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/astr_main_agent.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. area:webui The bug / feature is about webui(dashboard) of astrbot. size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant