Atlas foundation: codebase-knowledge layer in Pathfinder (off-by-default)#94

Open
jpr5 wants to merge 13 commits into main from blitz/atlas-foundation/integration

jpr5 (Contributor) commented Jun 6, 2026

Summary

Adds the Atlas foundation to Pathfinder — an agent-maintained codebase-knowledge layer (the "codebase-memory" quadrant alongside auto-memory, handoffs, and episodic memory). This PR lands the foundation dormant / off-by-default: the schema, providers, gardener, ratification endpoints, webhook ingestion, analytics, and a thin atlas CLI are all present and tested, but no operational loop is scheduled and no behavior changes for existing Pathfinder users.

What's included

  • Additive DB migration: atlas_seed_entries (durable inputs — decisions/corrections/inbox/schema) and atlas_cache_pages (regenerable derived pages). Durability attaches to inputs, not the wiki.
  • AtlasDataProvider (src/db/atlas.ts) — seed + cache persistence.
  • Gardener (src/indexing/atlas-gardener.ts) — regenerates cache pages from seed; hardened error path (logs failures, guards bookkeeping).
  • Ratification endpoints (src/server.ts) — body-param routes for approving/rejecting seed entries (path-param variants intentionally dropped — see below).
  • GitHub webhook PR ingestion (src/webhooks/) — capture is webhook-driven server-side, not agent-driven.
  • Retrieval analytics (src/db/analytics.ts) — Atlas retrievals excluded from standard /analytics.
  • atlas-cli.ts — thin stateless MCP client so agents (esp. Codex, which struggles with MCP reconnect) get a first-class atlas search "<question>" access path without configuring an MCP server.
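To make the seed/cache split above concrete, here is a minimal sketch. The type shapes, field names, and the `regenerateCachePages` helper are hypothetical illustrations, not the columns from the migration or the gardener in src/indexing/atlas-gardener.ts. The only point it tries to show is that cache pages are a pure function of the seed, so they can be dropped and rebuilt without losing anything durable.

```typescript
// Hypothetical shapes for illustration; the real migration's columns may differ.
type SeedKind = "decision" | "correction" | "inbox" | "schema";

interface SeedEntry {
  canonicalKey: string;
  kind: SeedKind;
  body: string;
}

interface CachePage {
  pageKey: string;
  content: string;
}

// Derive pages purely from seed entries (one page per seed kind in this sketch).
function regenerateCachePages(seeds: SeedEntry[]): CachePage[] {
  const byKind = new Map<SeedKind, string[]>();
  for (const seed of seeds) {
    const bodies = byKind.get(seed.kind) ?? [];
    bodies.push(`- ${seed.body}`);
    byKind.set(seed.kind, bodies);
  }
  const pages: CachePage[] = [];
  byKind.forEach((bodies, kind) => {
    pages.push({ pageKey: `page:${kind}`, content: bodies.join("\n") });
  });
  return pages;
}

// Made-up seed rows, used only to exercise the sketch.
const exampleSeeds: SeedEntry[] = [
  { canonicalKey: "decision:1", kind: "decision", body: "Ratify via body-param routes" },
  { canonicalKey: "correction:1", kind: "correction", body: "Default tool is atlas-search" },
  { canonicalKey: "decision:2", kind: "decision", body: "Seed lives outside the repo" },
];

// Rebuilding from the same seed yields the same pages, so the cache is disposable.
const rebuiltPages = regenerateCachePages(exampleSeeds);
```

Because nothing in `CachePage` exists that cannot be recomputed from `SeedEntry` rows, only the seed table needs backup-grade durability in this model.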

Scope notes

  • Off-by-default. Wired to the library boundary, not the running service. Merging this does not turn anything on.
  • Path-param ratification routes were dropped in favor of the body-param routes — the path variants introduced a double-URL-decode bug on slash-bearing keys (Express 5 already decodes wildcard segments) and were redundant. Body routes are the single supported path.
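The double-decode problem can be reproduced without a server. The sketch below uses only the standard `encodeURIComponent` / `decodeURIComponent` functions; the first decode stands in for what the framework does to a path segment, and the second for the redundant handler-side call. The example keys and the `survivesDoubleDecode` helper are hypothetical.

```typescript
// Hypothetical key whose text legitimately contains the characters "%2F".
const originalKey = "github-pr:atlas:owner/repo:42%2Fnotes";

// The client percent-encodes the key once to place it in a URL path.
const onTheWire = encodeURIComponent(originalKey);

// First decode: stands in for the framework decoding the path segment.
const decodedOnce = decodeURIComponent(onTheWire);

// Second decode: the redundant call. It turns the literal "%2F" into "/".
const decodedTwice = decodeURIComponent(decodedOnce);

// Helper (hypothetical): does a key survive encode -> decode -> decode intact?
function survivesDoubleDecode(key: string): boolean {
  try {
    return decodeURIComponent(decodeURIComponent(encodeURIComponent(key))) === key;
  } catch {
    // A bare "%" that is not a valid escape makes the second decode throw URIError.
    return false;
  }
}

const plainKeyOk = survivesDoubleDecode("github-pr:atlas:owner/repo:42");
const escapedKeyOk = survivesDoubleDecode(originalKey);
const bareParcentKeyOk = survivesDoubleDecode("coverage:100%");
```

Keys without any `%` character are unaffected, which is likely why the bug only shows up on unusual keys. A JSON body carries the key verbatim, so no decode step is involved at all.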

Deferred wiring follow-up (pilot prerequisite — NOT in this PR)

The operational loop is a deliberate follow-up, required before the pilot:

  • Gardener scheduler (periodic cache regeneration)
  • Retrieval-metric endpoint surfacing
  • service: session tagging on retrievals
  • seed_path wiring (seed lives in a private backoffice sidecar for the pilot, not in-repo)

Pilot repos: copilotkit/copilotkit + ag-ui-protocol/ag-ui.

Test plan

  • tsc --noEmit — 0 errors
  • prettier — clean
  • full vitest suite — 3673/3673 passing
  • build — succeeds
  • independent 7-agent code review (CR) plus confirmation round converged to zero open findings
  • Sandbox E2E (PGlite, reused test fixtures, deterministic gardener stub) MUST run before merge; runbook prepared

jpr5 added 13 commits June 5, 2026 23:45
A transient failure in markAtlasCachePagesStaleForSources caused
executeJob to reject, so onReindexComplete never fired for a reindex
that actually succeeded — suppressing bash-instance refresh, llms.txt/
faq.txt cache clearing, and the reindex audit. Wrap the Atlas cache
invalidation call in its own try/catch that logs and continues, keeping
it before the callback but unable to suppress it.
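A minimal sketch of the control flow this commit message describes, assuming a simplified job shape. The function names `executeJob`, `markAtlasCachePagesStaleForSources`, and `onReindexComplete` come from the message; the `ReindexDeps` interface, the `runReindex` and `logError` members, and all signatures are assumptions made for illustration.

```typescript
// Assumed dependency shape; the real orchestrator wiring is not shown in the PR text.
interface ReindexDeps {
  runReindex(): Promise<string[]>; // hypothetical: resolves to the reindexed source names
  markAtlasCachePagesStaleForSources(sources: string[]): Promise<void>;
  onReindexComplete(sources: string[]): Promise<void>;
  logError(message: string, err: unknown): void; // hypothetical logger
}

async function executeJob(deps: ReindexDeps): Promise<void> {
  const sources = await deps.runReindex();

  // Atlas cache invalidation is best-effort: log and continue on failure
  // so that it can never suppress the completion callback below.
  try {
    await deps.markAtlasCachePagesStaleForSources(sources);
  } catch (err) {
    deps.logError("Atlas cache invalidation failed; continuing", err);
  }

  // Still runs after a failed invalidation, because the reindex itself succeeded.
  await deps.onReindexComplete(sources);
}
```

Keeping the invalidation before the callback preserves the original ordering; only its ability to reject the whole job is removed.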

The per-page catch in gardenAtlasCachePages persisted the generation
error to the DB but never logged it, leaving operators blind. Worse, the
recordAtlasCachePageGenerationError call was unguarded: if it threw (e.g.
"Atlas cache page not found" on a concurrently deleted/re-keyed row, or
any transient DB error), the rejection escaped the loop and aborted the
entire gardening pass, losing all prior progress and never returning a
summary.

Now the generation failure is logged via console.error, and the
bookkeeping call is wrapped in its own try/catch that logs and continues
so a single page's bookkeeping failure can't poison the batch. Adds a
red-green test covering the bookkeeping-throws case.
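The nested try/catch described in this commit message might look roughly like the following. `gardenAtlasCachePages` and `recordAtlasCachePageGenerationError` are named in the message; the `GardenDeps` and `GardenSummary` shapes, the `generatePage` and `savePage` members, and the injectable `log` parameter are assumptions for the sake of a self-contained example.

```typescript
// Assumed dependency and summary shapes; the real gardener's API may differ.
interface GardenDeps {
  generatePage(pageKey: string): Promise<string>; // hypothetical
  savePage(pageKey: string, content: string): Promise<void>; // hypothetical
  recordAtlasCachePageGenerationError(pageKey: string, message: string): Promise<void>;
}

interface GardenSummary {
  generated: string[];
  failed: string[];
}

async function gardenAtlasCachePages(
  pageKeys: string[],
  deps: GardenDeps,
  log: (message: string, err: unknown) => void = console.error,
): Promise<GardenSummary> {
  const summary: GardenSummary = { generated: [], failed: [] };

  for (const pageKey of pageKeys) {
    try {
      const content = await deps.generatePage(pageKey);
      await deps.savePage(pageKey, content);
      summary.generated.push(pageKey);
    } catch (err) {
      // Make the generation failure visible to operators, not just persisted.
      log(`Atlas page generation failed for ${pageKey}`, err);
      summary.failed.push(pageKey);

      // Bookkeeping is guarded separately so one bad row cannot abort the pass.
      try {
        await deps.recordAtlasCachePageGenerationError(pageKey, String(err));
      } catch (bookkeepingErr) {
        log(`Recording the generation error failed for ${pageKey}`, bookkeepingErr);
      }
    }
  }

  // Always reached, so earlier progress is reported even after failures.
  return summary;
}
```

In this sketch a page whose bookkeeping also fails is still counted in `failed`, and the loop moves on to the next page.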

- parseSseMessages now skips empty/whitespace `data:` frames (keepalives)
  and wraps per-event JSON.parse so unparseable frames are skipped instead
  of crashing the search command with an opaque "Unexpected end of JSON
  input" error.
- DEFAULT_TOOL is now "atlas-search" to match the Atlas tool name in
  pathfinder.example.yaml, so `atlas search "x"` targets Atlas by default
  instead of the docs search tool.

Adds red-green tests: an empty `data:` SSE frame that previously crashed,
and a default-tool assertion pinned to "atlas-search".
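For illustration, a tolerant SSE parser along the lines this commit message describes could be sketched as follows. Only the name `parseSseMessages` and the two behaviours (skip empty `data:` frames, skip unparseable frames) come from the message; the signature, return type, and splitting strategy are assumptions, and this is not the CLI's actual implementation.

```typescript
// Sketch: parse a buffered SSE response body into JSON messages,
// skipping keepalive frames and frames that are not valid JSON.
function parseSseMessages(raw: string): unknown[] {
  const messages: unknown[] = [];

  // Events are separated by a blank line.
  for (const event of raw.split(/\r?\n\r?\n/)) {
    // Collect the payload of every `data:` line in this event.
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice("data:".length).replace(/^ /, ""))
      .join("\n");

    // Empty or whitespace-only frames are keepalives; JSON.parse("") would throw.
    if (data.trim() === "") continue;

    try {
      messages.push(JSON.parse(data));
    } catch {
      // Unparseable frame: skip it rather than failing the whole command.
    }
  }

  return messages;
}

// Made-up stream: one good frame, one keepalive, one truncated frame, one good frame.
const sampleStream = 'data: {"a":1}\n\ndata:\n\ndata: {"trunc\n\ndata: {"b":2}\n\n';
const parsedMessages = parseSseMessages(sampleStream);
```

Silently skipping bad frames trades strictness for resilience; a stricter variant could count the skipped frames and report them.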

…path keys

Finding 1 (HIGH): approveAtlasCandidate silently returned 200 without
queuing a reindex when no orchestrator was wired (Atlas sources but no
search/knowledge tools). Now log a loud, actionable error and surface
reindexQueued:boolean in the JSON response. The orchestrator-present 200
path is unchanged.

Finding 2: the path-param approve/reject routes used :canonicalKey, which
a literal "/" in a real key (e.g. "github-pr:atlas:owner/repo:42") would
truncate, addressing the wrong key. Switch to an Express 5 wildcard param
(*canonicalKey) and reconstruct/decode the full key in atlasCanonicalKey,
so both %2F-escaped and literal-slash keys round-trip. Body-based routes
are untouched.

Tests: add red-green coverage for both findings in
atlas-ratification-endpoints.test.ts.
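Finding 1 can be pictured with a small framework-free sketch. `approveAtlasCandidate` and the `reindexQueued` field are named in the commit message; the `ReindexOrchestrator` interface, its `queueReindex` method, the other response fields, and the assumption that the orchestrator-absent case still answers 200 are all hypothetical.

```typescript
// Hypothetical orchestrator surface; only "queue a reindex" matters for the sketch.
interface ReindexOrchestrator {
  queueReindex(reason: string): void;
}

// Hypothetical response shape; only `reindexQueued` is taken from the commit message.
interface ApproveResponse {
  status: number;
  body: { approved: true; canonicalKey: string; reindexQueued: boolean };
}

function approveAtlasCandidate(
  canonicalKey: string,
  orchestrator: ReindexOrchestrator | undefined,
  logError: (message: string) => void = console.error,
): ApproveResponse {
  let reindexQueued = false;

  if (orchestrator) {
    orchestrator.queueReindex(`atlas candidate approved: ${canonicalKey}`);
    reindexQueued = true;
  } else {
    // Loud and actionable instead of silently succeeding.
    logError(
      `Approved ${canonicalKey} but no orchestrator is wired, so no reindex was queued. ` +
        "Configure a search/knowledge tool or trigger a reindex manually.",
    );
  }

  // The caller can now tell whether the approval will actually be indexed.
  return { status: 200, body: { approved: true, canonicalKey, reindexQueued } };
}
```

Surfacing the boolean in the response lets a client or test distinguish "approved and queued" from "approved only" without reading server logs.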

The path-param wildcard routes (POST /api/atlas/candidates/*canonicalKey/
approve and /reject) double-decoded the key (Express 5 decodes wildcard
segments, then decodeURIComponent ran again, corrupting %XX keys), were
body/path-inconsistent, and were fully redundant with the working
body-based routes. Drop both registrations and the now-unused
atlasCanonicalKey(req) helper. Keep the body routes, atlasCanonicalKeyFromBody,
and the approve-without-orchestrator fix. Convert the surviving tests to the
body route and remove the path-param-only slash-key test.
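A rough sketch of why the body route sidesteps the key-decoding problem: the key arrives as a JSON string and is used verbatim. The helper name `atlasCanonicalKeyFromBody` appears in the commit message, but its signature, the `canonicalKey` field name, and the validation rules below are assumptions.

```typescript
// Sketch: extract the canonical key from an already-parsed JSON body.
// Returns null when the body does not carry a usable key.
function atlasCanonicalKeyFromBody(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;

  // Field name `canonicalKey` is assumed for this example.
  const value = (body as Record<string, unknown>)["canonicalKey"];
  if (typeof value !== "string" || value.trim() === "") return null;

  // Used as-is: no decodeURIComponent, so "/" and "%XX" sequences are preserved.
  return value;
}

// Hypothetical request bodies as they would look after JSON parsing.
const slashKeyBody: unknown = JSON.parse('{"canonicalKey":"github-pr:atlas:owner/repo:42"}');
const percentKeyBody: unknown = JSON.parse('{"canonicalKey":"notes:42%2Fdraft"}');

const slashKey = atlasCanonicalKeyFromBody(slashKeyBody);
const percentKey = atlasCanonicalKeyFromBody(percentKeyBody);
```

With a single route shape there is also only one place where key validation has to be kept correct.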