implement ingest_traces — create CROSS_* edges from runtime traces (currently a no-op stub) #612

Description

@Relaxe111

What problem does this solve?

ingest_traces accepts a payload and reports success, but creates no nodes or edges:

// ingest_traces({ project, traces: [ ... ] })
{ "status": "accepted", "traces_received": 1, "note": "Runtime edge creation from traces not yet implemented" }

This is the missing half of cross-repo intelligence. In a polyrepo of independently indexed projects, services connect through channels that carry no statically matchable literal route at the call site, so the static graph can't see them:

| Channel | Why static analysis can't see it |
| --- | --- |
| GraphQL over a single dynamic endpoint | All operations POST to one URL resolved from config/env (import.meta.env.VITE_*). The URL is opaque to the parser, and the operation lives in the request body, not the path. |
| Micro-frontends | Components are mounted at runtime by string key, so there is no import/call edge. |
| Events / pub-sub / logical replication | No code edge crosses the boundary. |

Concrete result today: index_repository(mode="cross-repo-intelligence", target_projects=["*"]) across 67 projects returns cross_http_calls: 0, cross_graphql_calls: 0, … — even though the services clearly call each other. Backends do expose Route nodes (one service has ~131), but no caller references them with a matchable URL. Example: chat-app's getJournalUserId.query.ts calls the directory service's customer_user query over GraphQL, and that relation appears in no edge.

Runtime traces are the only ground truth for these relations, and ingest_traces is the right entry point — it just isn't implemented.

Environment: ~67 projects (polyrepo, one graph per repo); frontends call backends via typed GraphQL clients (gql.tada) over Hasura; services also communicate via events / logical replication.

Related: #78 touches the same tool but is about transport/schema acceptance, not edge creation.

Proposed solution

Implement the edge-creation pass so an ingested trace becomes a CROSS_* edge. For each trace:

  1. Resolve the caller to a node in the caller's project by call site (file_path + line, or a qualified_name like <project>.<path>.<symbol>), falling back to the caller service/project.
  2. Resolve the callee — match method + url/route to a Route/Resource node in a target project; for GraphQL, match the operation / root field to the owning schema; create a lightweight external Service/Endpoint node if no target is indexed.
  3. Create or merge the edge CROSS_HTTP_CALLS (or CROSS_GRAPHQL_CALLS / CROSS_ASYNC_CALLS / CROSS_CHANNEL, per protocol) with properties confidence, via: "trace", method, url_path, operation, count, first_seen, last_seen.
  4. Make it idempotent: re-ingesting merges into the existing edge, increments count, and updates last_seen instead of creating a duplicate.
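Steps 1 and 2 could be sketched roughly as follows, using the field names from the proposed trace schema. This is a minimal sketch, not the tool's implementation: it assumes the indexer can hand over three hypothetical lookup tables (`symbols` keyed by (project, file_path, symbol), `routes` as (method, path template, node id) tuples, and `graphql_fields` keyed by (service, root field)), it only handles the file_path + symbol form of a call site, and the `project:` / `external:` node-id prefixes are placeholders.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def resolve_caller(caller: dict, symbols: Dict[Tuple[str, str, str], str]) -> str:
    """Step 1: prefer the call-site symbol, else fall back to a project node."""
    key = (caller["project"], caller.get("file_path", ""), caller.get("symbol", ""))
    return symbols.get(key, f"project:{caller['project']}")


def route_matches(template: str, path: str) -> bool:
    """Compare segment by segment; "{id}" or ":id" segments match any value."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    return all(
        t == p or t.startswith(":") or (t.startswith("{") and t.endswith("}"))
        for t, p in zip(t_parts, p_parts)
    )


def resolve_callee(trace: dict, routes: List[Tuple[str, str, str]],
                   graphql_fields: Dict[Tuple[str, str], str]) -> List[str]:
    """Step 2: one target per GraphQL root field, else a Route or external node."""
    callee = trace["callee"]
    service = callee.get("service", "unknown")
    if trace.get("protocol") == "graphql" and callee.get("root_fields"):
        # Operation-level targets: each root field maps to its owning schema node.
        return [graphql_fields.get((service, f), f"external:{service}:{f}")
                for f in callee["root_fields"]]
    path = urlsplit(callee["url"]).path  # drops scheme, host and query string
    for method, template, node_id in routes:
        if method == trace.get("method") and route_matches(template, path):
            return [node_id]
    # No indexed target: fall back to a lightweight external endpoint node.
    return [f"external:{service}{path}"]
```

Returning a list from `resolve_callee` lets one GraphQL trace fan out into one edge per root field.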
Proposed trace schema (please document the accepted shape — this or native OTLP spans)
{
  "project": "<caller project name from list_projects>",
  "traces": [
    {
      "caller": {
        "project": "…-chat-asma-app-chat",
        "file_path": "src/api/graphql/directory/queries/getJournalUserId.query.ts",
        "symbol": "getJournalUserId"
      },
      "callee": {
        "service": "directory",
        "url": "/api/directory/v1/graphql",
        "operation": "getDirectorySelectedPatientData",
        "root_fields": ["customer_user"]
      },
      "protocol": "graphql",
      "method": "POST",
      "status": 200,
      "trace_id": "", "span_id": "", "parent_span_id": "",
      "timestamp": "2026-06-24T10:00:00Z"
    }
  ]
}

trace_id / span_id / parent_span_id let multi-hop chains be reconstructed; operation / root_fields enable operation-level GraphQL edges, not one coarse endpoint edge.
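Reconstructing multi-hop chains from these IDs is a parent-pointer walk. A minimal sketch, assuming spans arrive as dicts carrying trace_id, span_id and parent_span_id and contain no cycles; `build_chains` is a hypothetical helper, and spans whose parent is absent from the batch are treated as chain roots.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_chains(spans: List[dict]) -> List[List[str]]:
    """Return every root-to-leaf chain of span_ids, scoped by trace_id."""
    known = {(s["trace_id"], s["span_id"]) for s in spans}
    children: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    roots = []
    for s in spans:
        parent = (s["trace_id"], s.get("parent_span_id") or "")
        if parent in known:
            children[parent].append(s["span_id"])
        else:
            roots.append(s)  # no parent, or parent not in this batch

    chains: List[List[str]] = []

    def walk(trace_id: str, span_id: str, path: List[str]) -> None:
        path = path + [span_id]
        kids = children.get((trace_id, span_id), [])
        if not kids:
            chains.append(path)  # reached a leaf: record the full chain
        for kid in kids:
            walk(trace_id, kid, path)

    for root in roots:
        walk(root["trace_id"], root["span_id"], [])
    return chains
```

Each adjacent pair of spans in a chain that belongs to different services is one cross-service hop, i.e. one candidate CROSS_* edge.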

Expected result:

MATCH (a)-[r:CROSS_GRAPHQL_CALLS]->(b)
RETURN a.name, b.name, r.operation, r.confidence, r.via

→ a merged edge from the caller node to the directory schema / customer_user node, via: "trace", surfaced in query_graph, trace_path, and get_architecture.
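The merge behind that result (steps 3 and 4) could look roughly like this in-memory sketch. A plain dict stands in for the graph store; `CrossEdge`, `merge_trace_edge` and the protocol-to-label mapping are hypothetical (only the edge labels come from this issue), and keeping the highest confidence seen is an assumed policy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple

# Edge labels come from the issue; the protocol keys are assumptions.
EDGE_TYPE = {
    "http": "CROSS_HTTP_CALLS",
    "graphql": "CROSS_GRAPHQL_CALLS",
    "async": "CROSS_ASYNC_CALLS",
}


@dataclass
class CrossEdge:
    edge_type: str
    source: str
    target: str
    method: str
    url_path: str
    operation: Optional[str]
    confidence: float
    via: str = "trace"
    count: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None


EdgeKey = Tuple[str, str, str, str, str, Optional[str]]


def merge_trace_edge(store: Dict[EdgeKey, CrossEdge], protocol: str,
                     source: str, target: str, method: str, url_path: str,
                     operation: Optional[str], seen_at: datetime,
                     confidence: float = 1.0) -> str:
    """Create the edge once, then merge on every re-ingest."""
    edge_type = EDGE_TYPE.get(protocol, "CROSS_CHANNEL")
    # Keying on the operation keeps GraphQL calls to one URL apart per operation.
    key: EdgeKey = (edge_type, source, target, method, url_path, operation)
    edge = store.get(key)
    outcome = "merged"
    if edge is None:
        edge = CrossEdge(edge_type, source, target, method, url_path,
                         operation, confidence)
        store[key] = edge
        outcome = "created"
    edge.count += 1
    edge.first_seen = seen_at if edge.first_seen is None else min(edge.first_seen, seen_at)
    edge.last_seen = seen_at if edge.last_seen is None else max(edge.last_seen, seen_at)
    edge.confidence = max(edge.confidence, confidence)  # assumed policy
    return outcome
```

As the issue asks, every ingested trace increments count; if the same span can be delivered twice, deduplicating on trace_id/span_id before merging would be a separate design decision.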

Acceptance criteria:

  • Documented traces item schema (and/or accepted OTLP / OTel JSON).
  • A valid trace creates the corresponding CROSS_* edge with via: "trace" + confidence.
  • Re-ingest is idempotent (merge + count / last_seen), no duplicate edges.
  • GraphQL traces produce operation-level edges (operation / root field), not only one endpoint edge.
  • New edges appear in query_graph, trace_path, and get_architecture.
  • Response reports edges_created / edges_merged / unresolved instead of the "not yet implemented" note.
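The counters in the last criterion can be derived from the same kind of merge key. A sketch, assuming a hypothetical `resolve` callback that returns (caller id, callee id) or None when a trace cannot be resolved, and a `seen` set standing in for edges already in the graph. Running the same batch twice also exercises the idempotency criterion: the second run reports only merges.

```python
from typing import Callable, List, Optional, Set, Tuple


def ingest_summary(traces: List[dict],
                   resolve: Callable[[dict], Optional[Tuple[str, str]]],
                   seen: Set[tuple]) -> dict:
    """Build the response body described in the acceptance criteria."""
    created = merged = unresolved = 0
    for trace in traces:
        ends = resolve(trace)  # hypothetical: (caller_id, callee_id) or None
        if ends is None:
            unresolved += 1
            continue
        key = (trace.get("protocol"), ends[0], ends[1],
               trace.get("callee", {}).get("operation"))
        if key in seen:
            merged += 1
        else:
            seen.add(key)
            created += 1
    return {"status": "accepted", "traces_received": len(traces),
            "edges_created": created, "edges_merged": merged,
            "unresolved": unresolved}
```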

Alternatives considered

| Approach | Result |
| --- | --- |
| cross-repo-intelligence static matching | ~0 edges (no literal HTTP routes at call sites; GraphQL URLs are dynamic) |
| Co-indexing repos into one project | No reliable cross-layer edges, only low-confidence suffix_match noise; the relation is runtime/string-keyed |
| Monorepo-root single index | Infeasible: submodule traversal duplicates nodes |
| Hand-authored relation maps | Works, but lives outside the graph; the whole value of ingest_traces is putting the real edges in the graph |

Confirmations

  • I searched existing issues and this is not a duplicate.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

    Issue actions