Skip to content

Cache invalid-workspace alias inference per storage fingerprint (#116)#125

Open
clean6378-max-it wants to merge 4 commits into
masterfrom
perf/cache-invalid-workspace-aliases
Open

Cache invalid-workspace alias inference per storage fingerprint (#116)#125
clean6378-max-it wants to merge 4 commits into
masterfrom
perf/cache-invalid-workspace-aliases

Conversation

@clean6378-max-it

@clean6378-max-it clean6378-max-it commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Add mtime-keyed disk cache for invalid_workspace_aliases in summary_cache.py, using the same storage fingerprint invalidation flow as composer-map and tab-summary caches.
  • Expose cached alias resolution on WorkspaceContext via resolve_invalid_workspace_aliases_cached() and with_invalid_workspace_aliases().
  • Update assemble_single_tab, tab summaries, and workspace listing to read precomputed aliases instead of re-scanning all composerData:* rows on every request.

Problem

When invalid_workspace_ids is non-empty, hot paths such as assemble_single_tab ran a full global composerData:* scan on every deep-link request solely to build invalid_workspace_aliases. On large state.vscdb files this produced O(tabs × keys) I/O.

Solution

Compute invalid_workspace_aliases once per storage fingerprint and serve subsequent requests from the disk cache until the fingerprint changes (DB replaced or storage mtimes change).

Test plan

  • pytest tests/test_workspace_tabs_summary.py tests/test_workspace_list_count_alignment.py -q
  • Full pytest -q (532 passed, 21 skipped)
  • mypy .
  • Unit tests: cache hit, cache miss/invalidation, empty-alias case (all workspaces valid)
  • assemble_single_tab no longer runs global composerData:% scan on cache hit when invalid workspaces exist

Out of scope

  • Search index / live-scan paths (search.py) — deferred per operator
  • Changing alias vote/tie-break semantics in infer_invalid_workspace_aliases

Closes #116

Summary by CodeRabbit

  • New Features

    • Added disk-backed caching for invalid workspace alias resolution to speed up workspace listings, tab summaries, and export flows.
    • Introduced a nocache option that bypasses the alias disk cache for workspace tab/listing and per-tab routes, and for export orchestration.
  • Bug Fixes

    • Cached alias results are reused only when the workspace/storage fingerprint matches and are refreshed when global storage changes.
    • Malformed or corrupt cached alias data is safely ignored to prevent incorrect alias assignments.

Compute invalid_workspace_aliases once per mtime-keyed fingerprint and
reuse it in assemble_single_tab, tab summaries, and workspace listing
instead of re-scanning all composerData rows on every request.
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e49fb3a-93fa-4eb4-8fe0-6de728517053

📥 Commits

Reviewing files that changed from the base of the PR and between f2c7663 and 3789375.

📒 Files selected for processing (3)
  • api/workspaces.py
  • services/workspace_tabs.py
  • tests/test_workspace_listing_performance.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_workspace_listing_performance.py

📝 Walkthrough

Walkthrough

Adds disk-backed caching for invalid workspace alias mappings. WorkspaceContext now carries cached alias state, workspace listing/tab/export paths use the cached resolver, and summary cache helpers persist and validate alias mappings by fingerprint.

Changes

Invalid workspace alias caching

Layer / File(s) Summary
Cache storage helpers
services/summary_cache.py, tests/test_summary_cache.py
Adds the invalid-workspace-alias cache file path and get/set helpers with fingerprint and string-value validation, plus cache hit/miss and invalid-data tests.
WorkspaceContext resolver
services/workspace_context.py, tests/test_workspace_context.py
Adds the nullable alias field, cached resolution helper, context enrichment helper, and tests for empty, cached, invalidated, and fast-path resolution.
Listing and tab assembly
api/workspaces.py, services/workspace_listing.py, services/workspace_tabs.py, tests/test_workspace_tabs_summary.py, tests/test_workspace_listing_performance.py
Replaces inline alias inference with the cached resolver in listing and tab assembly, threads nocache through API and listing paths, and updates tab/performance tests for cache-hit behavior.
Export alias resolution
services/export_engine.py
Routes export loading through the cached invalid-workspace alias resolver and passes rules and nocache into export orchestration.
Search assigner note
services/search.py
Adds comments describing the continued composerData cold-scan behavior and deferred cache sharing.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • bradjin8
  • wpak-ai

Poem

🐇 I cached the aliases, soft and neat,
No giant scan on every deep-link beat.
A fingerprint winked, the disk said “hooray,”
And tab loads hopped much quicker today.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and accurately summarizes the main change: caching invalid-workspace alias inference by storage fingerprint.
Linked Issues check ✅ Passed The PR adds fingerprinted alias caching, exposes it on WorkspaceContext, updates hot paths, and adds tests covering invalidation and cache hits.
Out of Scope Changes check ✅ Passed The changes stay focused on alias-cache plumbing, nocache pass-through, and tests; no unrelated feature work stands out.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/cache-invalid-workspace-aliases

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/summary_cache.py`:
- Around line 242-261: The cached alias loader in
get_cached_invalid_workspace_aliases is accepting malformed entries and
converting them with str(), which can turn bad cache values into bogus aliases.
Update the alias validation in this function to strictly require
string-to-string mappings, rejecting any entry whose key or value is not already
a string so the function returns None and recomputes instead of using corrupt
cache data.

In `@services/workspace_context.py`:
- Around line 157-160: The fast path in WorkspaceContext is not honoring
precomputed invalid_workspace_aliases and only short-circuits on
invalid_workspace_ids, which can still trigger fingerprint recomputation and DB
rescans. Update infer_invalid_workspace_aliases and any related accessors to
first return the already-populated invalid_workspace_aliases from
WorkspaceContext, keeping with_invalid_workspace_aliases as the source of that
enriched state. Ensure the cache/lookup path consistently prefers the
precomputed aliases before falling back to storage fingerprint checks or
scanning composer rows.

In `@services/workspace_listing.py`:
- Around line 127-133: Preserve the effective nocache behavior when resolving
invalid workspace aliases: `list_workspace_projects(nocache=True)` currently
still lets `resolve_invalid_workspace_aliases_cached` use the alias disk cache
because `_build_workspace_projects_uncached` does not pass the flag through.
Thread the effective nocache value into `_build_workspace_projects_uncached`,
then pass it into `resolve_invalid_workspace_aliases_cached` so the helper
respects the caller’s cache policy.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6130cf01-64e7-4fca-b1a3-261b4cddc45d

📥 Commits

Reviewing files that changed from the base of the PR and between fb97d03 and 0f896e4.

📒 Files selected for processing (7)
  • services/summary_cache.py
  • services/workspace_context.py
  • services/workspace_listing.py
  • services/workspace_tabs.py
  • tests/test_summary_cache.py
  • tests/test_workspace_context.py
  • tests/test_workspace_tabs_summary.py

Comment thread services/summary_cache.py Outdated
Comment thread services/workspace_context.py
Comment thread services/workspace_listing.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_workspace_listing_performance.py (1)

140-145: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert alias cache writes are bypassed too.

Line 140 only patches get_cached_invalid_workspace_aliases, so a regression that still writes alias results under nocache=True would pass. Patch set_cached_invalid_workspace_aliases as well.

Proposed test tightening
-        with patch(
-            "services.summary_cache.get_cached_invalid_workspace_aliases",
-        ) as mock_get:
+        with (
+            patch("services.summary_cache.get_cached_invalid_workspace_aliases") as mock_get,
+            patch("services.summary_cache.set_cached_invalid_workspace_aliases") as mock_set,
+        ):
             mock_get.return_value = {"invalid-ws": "global"}
             list_workspace_projects(ws_path, rules=[], nocache=True)
         mock_get.assert_not_called()
+        mock_set.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_workspace_listing_performance.py` around lines 140 - 145, The
nocache test around list_workspace_projects only verifies that
get_cached_invalid_workspace_aliases is not read, so it can miss regressions
where alias results are still written. Update the test to also patch and assert
set_cached_invalid_workspace_aliases is not called when nocache=True, alongside
the existing mock_get check, so the cache bypass is covered end-to-end.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_workspace_listing_performance.py`:
- Around line 140-145: The nocache test around list_workspace_projects only
verifies that get_cached_invalid_workspace_aliases is not read, so it can miss
regressions where alias results are still written. Update the test to also patch
and assert set_cached_invalid_workspace_aliases is not called when nocache=True,
alongside the existing mock_get check, so the cache bypass is covered
end-to-end.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9dc800f-dae2-491f-981d-21ff73895073

📥 Commits

Reviewing files that changed from the base of the PR and between 0f896e4 and ab8a3fa.

📒 Files selected for processing (6)
  • services/summary_cache.py
  • services/workspace_context.py
  • services/workspace_listing.py
  • tests/test_summary_cache.py
  • tests/test_workspace_context.py
  • tests/test_workspace_listing_performance.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/test_summary_cache.py
  • services/summary_cache.py
  • tests/test_workspace_context.py
  • services/workspace_listing.py

@bradjin8 bradjin8 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

services/export_engine.py line 200:

Still calls infer_invalid_workspace_aliases inline on every export. Issue #116 scope lists export_engine as a call site; PR defers search but not export. Large exports with invalid workspaces will keep paying the full composer scan — consider resolve_invalid_workspace_aliases_cached here in a follow-up.

services/search.py line 386:

Still uses inline infer_invalid_workspace_aliases in _load_search_workspace_assigner. PR marks search as out of scope; document in issue/PR that search deep-links may still cold-scan until a follow-up.

Comment thread services/workspace_tabs.py Outdated
Comment thread services/workspace_tabs.py Outdated
Comment thread services/workspace_context.py
Comment thread tests/test_workspace_listing_performance.py
@@ -453,18 +455,14 @@ def _build_workspace_tab_summaries_uncached(

composer_rows = safe_fetchall(global_db, COMPOSER_ROWS_WITH_HEADERS_SQL)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_workspace_tab_summaries still fetches all composerData:* rows for the summary loop (required). Alias inference is cached across requests, but the first summary load per fingerprint still pays one roster scan — expected; worth a one-line comment so future readers don't try to "optimize" it away incorrectly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment before composer_rows = safe_fetchall(...) in _build_workspace_tab_summaries_uncached

Comment thread services/summary_cache.py
aliases = data.get("invalid_workspace_aliases")
if not isinstance(aliases, dict):
return None
validated: dict[str, str] = {}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Malformed cache entries reject the entire file (returns None). Correct and safe; consider logging at debug when validation fails so operators can distinguish corrupt cache from cold miss.

@clean6378-max-it

Copy link
Copy Markdown
Collaborator Author

services/export_engine.py line 200:

Still calls infer_invalid_workspace_aliases inline on every export. Issue #116 scope lists export_engine as a call site; PR defers search but not export. Large exports with invalid workspaces will keep paying the full composer scan — consider resolve_invalid_workspace_aliases_cached here in a follow-up.

services/search.py line 386:

Still uses inline infer_invalid_workspace_aliases in _load_search_workspace_assigner. PR marks search as out of scope; document in issue/PR that search deep-links may still cold-scan until a follow-up.

wired resolve_invalid_workspace_aliases_cached
Documented: code comment + issue doc (no code change; operator out-of-scope)
_logger.debug on reject + test

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
services/export_engine.py (1)

203-210: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Reuse the already loaded composer rows on cache misses.

Line 201 has already loaded ide_composer_rows, but resolve_invalid_workspace_aliases_cached fetches COMPOSER_ROWS_WITH_HEADERS_SQL again on cold cache/no-cache paths. Consider extending the resolver to accept preloaded composer_rows and pass ide_composer_rows here to avoid a second full global DB scan during export.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/export_engine.py` around lines 203 - 210, The export path is doing a
second global DB scan for composer rows on cold-cache/no-cache calls even though
ide_composer_rows is already loaded. Update
resolve_invalid_workspace_aliases_cached to accept preloaded composer_rows and
thread that through its internal alias-resolution logic, then pass
ide_composer_rows from the export_engine flow so the cached/no-cache path reuses
the existing data instead of rerunning COMPOSER_ROWS_WITH_HEADERS_SQL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@services/export_engine.py`:
- Around line 203-210: The export path is doing a second global DB scan for
composer rows on cold-cache/no-cache calls even though ide_composer_rows is
already loaded. Update resolve_invalid_workspace_aliases_cached to accept
preloaded composer_rows and thread that through its internal alias-resolution
logic, then pass ide_composer_rows from the export_engine flow so the
cached/no-cache path reuses the existing data instead of rerunning
COMPOSER_ROWS_WITH_HEADERS_SQL.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8cb2639d-3e84-4ad1-b594-9303fddc8cba

📥 Commits

Reviewing files that changed from the base of the PR and between ab8a3fa and f2c7663.

📒 Files selected for processing (4)
  • services/export_engine.py
  • services/search.py
  • services/summary_cache.py
  • tests/test_summary_cache.py
✅ Files skipped from review due to trivial changes (1)
  • services/search.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/summary_cache.py
  • tests/test_summary_cache.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cppa-cursor-browser: Cache invalid-workspace alias inference on workspace context

2 participants