
fix(extract): attribute C/C++ CALLS edges to the enclosing function#463

Merged
DeusData merged 2 commits into DeusData:main from KerseyFabrications:fix/c-calls-function-source
Jun 25, 2026

@KerseyFabrications

Contributor

A CALLS edge whose caller is a C/C++/CUDA/GLSL function was sourced to the
file's Module node instead of the calling Function. "Find callers of X"
returned a file path, outbound trace_path returned empty, and
(:Function)-[:CALLS]->(:Function) queries missed for these languages.

Root cause: the enclosing-function resolvers read only tree-sitter's name
field, but a function_definition node has none — the name lives in the
declarator chain (pointer/function/parenthesized/array declarators). So
func_node_name() (internal/cbm/helpers.c) and resolve_func_name_node()
(internal/cbm/extract_unified.c) returned NULL, the enclosing scope fell
back to the module QN, and the edge was attributed to the Module node. This
is the C counterpart to #220, which fixed the definition-naming path but not
the enclosing-call path.

Fix: descend the declarator chain to the innermost name node (mirroring
resolve_c_declarator_name in extract_defs.c, including qualified and operator
names) when a function_definition lacks a name field. Adds the regression
test c_caller_attribution asserting a C call's enclosing_func_qn is the
function, not the module.
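
To make the descent concrete, here is a minimal sketch on a toy AST. It is not the project's code: `ToyNode`, `toy_is_wrapper`, `toy_declarator_name`, and the depth limit of 8 are all hypothetical stand-ins, and the real resolver works on tree-sitter nodes and also handles qualified and operator names, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a tree-sitter node; every name here is hypothetical. */
typedef struct ToyNode {
    const char *kind;                 /* e.g. "function_declarator" */
    const char *text;                 /* identifier text, NULL for wrappers */
    const struct ToyNode *declarator; /* the "declarator" child, or NULL */
} ToyNode;

enum { TOY_DECLARATOR_DEPTH_LIMIT = 8 }; /* assumed value, not the project's */

/* True for node kinds that only wrap another declarator. */
static int toy_is_wrapper(const ToyNode *n) {
    static const char *const kinds[] = {
        "function_definition", "pointer_declarator", "function_declarator",
        "parenthesized_declarator", "array_declarator",
    };
    for (size_t i = 0; i < sizeof kinds / sizeof kinds[0]; i++) {
        if (strcmp(n->kind, kinds[i]) == 0) return 1;
    }
    return 0;
}

/* Follow `declarator` links until a non-wrapper node is reached.
 * Returns the identifier text, or NULL when the chain ends early,
 * ends on something other than an identifier, or exceeds the limit. */
static const char *toy_declarator_name(const ToyNode *def) {
    assert(def != NULL);
    const ToyNode *cur = def;
    for (int depth = 0; depth < TOY_DECLARATOR_DEPTH_LIMIT && cur; depth++) {
        if (!toy_is_wrapper(cur)) {
            return strcmp(cur->kind, "identifier") == 0 ? cur->text : NULL;
        }
        cur = cur->declarator;
    }
    return NULL;
}
```

For a definition shaped like `char *get_buf(void) { ... }`, the chain is function_definition, pointer_declarator, function_declarator, identifier, so the walk returns the identifier text. The depth limit keeps a malformed or unexpectedly deep tree from being walked without bound.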

Fixes #438

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>

@DeusData

Owner

Thanks @KerseyFabrications — the fix is correct and c_caller_attribution is exactly the right reproduce-first guard.

One small thing before merge: CBM_DECLARATOR_DEPTH_LIMIT is now #defined independently in both extract_unified.c and helpers.c (identical values today, but they have to stay in sync with the existing definition in extract_defs.c). Could you hoist it into a shared header (or reuse the existing one) so there's a single source of truth? Once that's deduplicated, this is good to go. 🙏

@DeusData

Owner

Thanks @KerseyFabrications — the fix is correct and the c_caller_attribution test is solid, so the behavior is good. The one open item from the earlier review is still here: the declarator-name walker is now duplicated in three places (helpers.c, extract_unified.c, and the existing extract_defs.c), and CBM_DECLARATOR_DEPTH_LIMIT is #defined twice.

Could you extract a single shared cbm_resolve_c_declarator_name_node() (in helpers.c, declared in a shared header) and route all three sites + the one depth-limit constant through it? That triplication is exactly the kind of drift that caused #438 in the first place. Once it's deduped, this is good to merge. 🙏

KerseyFabrications added a commit to KerseyFabrications/codebase-memory-mcp that referenced this pull request Jun 24, 2026
Addresses DeusData#463 review: the declarator-chain name resolver was copied into
helpers.c, extract_unified.c, and extract_defs.c, and CBM_DECLARATOR_DEPTH_LIMIT
was #defined twice -- the same triplication drift that caused DeusData#438.

- Add cbm_resolve_c_declarator_name_node() to helpers.{c,h} as the single
  source of truth, carrying is_c_terminal_name/resolve_qualified_name with it.
- Route the defs, calls, and unified extractors through it.
- Hoist CBM_DECLARATOR_DEPTH_LIMIT into helpers.h; extract_defs.c's
  DECLARATOR_DEPTH_LIMIT now derives from it.

Canonicalizes on the original extract_defs.c logic (operator/destructor aware)
so defs behavior is unchanged and calls/unified now agree with it.

Test: full suite green except an unrelated ASan RSS-budget check; clang-format clean.

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
@KerseyFabrications KerseyFabrications force-pushed the fix/c-calls-function-source branch from 084cb42 to adc8304 Compare June 24, 2026 16:42
@KerseyFabrications

Contributor Author

Done. Deduplicated in the latest push, then rebased.

  • Extracted a single cbm_resolve_c_declarator_name_node() into helpers.c (declared in helpers.h), carrying its is_c_terminal_name() / resolve_qualified_name() helpers with it.
  • Routed all three sites through it: the defs (extract_defs.c), calls (helpers.c), and unified (extract_unified.c) extractors. The two private copies are gone.
  • Hoisted CBM_DECLARATOR_DEPTH_LIMIT into helpers.h as the single source of truth; extract_defs.c's DECLARATOR_DEPTH_LIMIT enum now derives from it (it's still referenced by cpp_out_of_line_parent_class), so there's exactly one definition.

I canonicalized on the original extract_defs.c logic (operator/destructor-aware via resolve_qualified_name) rather than the inlined copies, so defs-extraction behavior is unchanged and the calls/unified paths now agree with it, closing the triplication that drifted into #438.
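
As a rough illustration of the single-source-of-truth layout described above, the sketch below folds the header and one consumer into a single translation unit so it compiles on its own. The include-guard name, the value 8, and `cbm_depth_ok()` are assumptions for illustration only; just the two macro/enum names come from this thread.

```c
#include <assert.h>

/* ---- helpers.h (sketch) ---- */
#ifndef CBM_HELPERS_H_SKETCH
#define CBM_HELPERS_H_SKETCH

/* The one definition. The value 8 is a placeholder, not the project's. */
#define CBM_DECLARATOR_DEPTH_LIMIT 8

#endif /* CBM_HELPERS_H_SKETCH */

/* ---- extract_defs.c (sketch) ---- */
/* Local alias derived from the shared macro instead of a second literal. */
enum { DECLARATOR_DEPTH_LIMIT = CBM_DECLARATOR_DEPTH_LIMIT };

/* Trivially true as written; it documents the invariant and would fail
 * the build if the alias were ever changed back to a hardcoded number
 * that differs from the shared macro. */
static_assert(DECLARATOR_DEPTH_LIMIT == CBM_DECLARATOR_DEPTH_LIMIT,
              "depth limit must come from helpers.h");

/* Hypothetical helper: is `depth` still inside the allowed walk range? */
static int cbm_depth_ok(int depth) {
    return depth >= 0 && depth < CBM_DECLARATOR_DEPTH_LIMIT;
}
```

Deriving the enum from the macro keeps existing references to `DECLARATOR_DEPTH_LIMIT` compiling while leaving only one place where the number is written.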

Full test suite is green (c_caller_attribution included), clang-format clean.

@mattall

mattall commented Jun 24, 2026


Tested this PR against a large C++ codebase (~40k symbols, heavy use of out-of-line method definitions). Clear net improvement, but one very common out-of-line form still falls back to the File node, so I wanted to flag it before merge with a minimal repro.

Aggregate (same tree, fresh full index; before = base@53ebeb4, after = this PR):

  • CALLS attributed to a function/method/class symbol: 5,927 → 11,666 (17.5% → 31.5% of all CALLS edges)
  • CALLS still sourced from the file node (File/Module): 82.5% → 68.4%
  • (total CALLS 33,928 → 36,996)

So it roughly doubles symbol-level attribution — but the majority of CALLS are still file-anchored, and the remainder is concentrated in one definition form.
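
For anyone who wants to re-derive the symbol-level percentages from the raw counts, a small sketch follows. The counts are copied from the bullets above; the function and constant names are made up for this example.

```c
#include <assert.h>

/* Share of `part` in `total`, as a percentage. */
static double share_pct(long part, long total) {
    assert(total > 0);
    return 100.0 * (double)part / (double)total;
}

/* Edge counts quoted in the comment above. */
static const long kSymbolBefore = 5927, kTotalBefore = 33928;
static const long kSymbolAfter = 11666, kTotalAfter = 36996;

/* Symbol-level share of all CALLS edges on the base commit. */
static double symbol_share_before(void) {
    return share_pct(kSymbolBefore, kTotalBefore);
}

/* Symbol-level share of all CALLS edges with this PR applied. */
static double symbol_share_after(void) {
    return share_pct(kSymbolAfter, kTotalAfter);
}

/* Growth factor in the absolute number of symbol-attributed edges. */
static double symbol_edge_growth(void) {
    return (double)kSymbolAfter / (double)kSymbolBefore;
}
```

The two shares round to the 17.5% and 31.5% quoted above, and the growth factor is a little under 2, which matches "roughly doubles".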

The form still falling back to File: out-of-line methods defined inside namespace { } blocks.

Same logical method, three ways:

| Definition form | Attributed to |
| --- | --- |
| In-class (inline in header or .cc) | ✅ the Method |
| Out-of-line at global scope (using namespace X;) | ✅ the Method |
| Out-of-line inside a namespace { } block | ❌ the File node |

Controlled check that isolates it: two out-of-line classes in the same directory — one whose .cc opens with using namespace X; at file scope, the other whose defs are wrapped in namespace X { namespace Y { ... } }. The first resolves every call to its methods; the second file-anchors every call. The only difference is the enclosing namespace-block context.

Minimal repro — the same definition written two ways:

// foo.h
namespace mylib { class Foo { public: void Bar(); void Baz(); }; }
// foo_global.cc  — RESOLVES: Foo::Bar's call to Baz() is attributed to Foo::Bar
using namespace mylib;
void Foo::Bar() { Baz(); }
// foo_block.cc  — FILE-ANCHORS: identical body, but the call attributes to the File node, not Foo::Bar
namespace mylib { void Foo::Bar() { Baz(); } }

It looks like the declarator walker folds the class qualifier (Foo::) but not the surrounding open-namespace context into enclosing_func_qn, so the nested-namespace definition's computed QN is missing the mylib:: prefix, mismatches the node, and re-triggers the calls_find_source() __file__ fallback from #554.

@KerseyFabrications

Contributor Author

Update: I went ahead and built this rather than just filing it, and tracing it end-to-end corrected two things in my earlier diagnosis:

  1. Namespace isn't actually in the QN. I'd assumed the namespace {} block left a mylib:: prefix on one side of the comparison. It doesn't: namespace context never enters the C++ QN scheme at all (namespace_name is null for these, and mylib appears in no QN). The real defect is narrower and hits both forms equally: the call-side enclosing-QN computation drops the class qualifier for out-of-line definitions, producing project.path.Bar instead of project.path.Foo.Bar.
  2. Resolution is an exact-QN match, not a short-name resolver. calls_find_source() does a direct find_by_qn(enclosing_qn) and falls straight back to __file__ on a miss; there is no short-name rescue on the source side. So the call never matches the Method node and file-anchors.
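
The exact-match-or-fallback behavior in point 2 can be sketched as below. This is a toy model, not the pipeline's code: `ToyGraphNode`, `toy_find_by_qn`, and `toy_calls_find_source` are hypothetical, and the real `calls_find_source()` and `find_by_qn()` have signatures I am not reproducing here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy graph node: a qualified name plus a label such as "Method". */
typedef struct {
    const char *qn;
    const char *label;
} ToyGraphNode;

/* Exact string match on the QN; no short-name or suffix matching. */
static const ToyGraphNode *toy_find_by_qn(const ToyGraphNode *nodes, size_t n,
                                          const char *qn) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].qn, qn) == 0) return &nodes[i];
    }
    return NULL;
}

/* Pick the source node of an edge: the enclosing symbol when its QN
 * matches a node exactly, otherwise the file node. */
static const ToyGraphNode *toy_calls_find_source(const ToyGraphNode *nodes,
                                                 size_t n,
                                                 const char *enclosing_qn,
                                                 const ToyGraphNode *file_node) {
    assert(file_node != NULL);
    const ToyGraphNode *hit =
        enclosing_qn ? toy_find_by_qn(nodes, n, enclosing_qn) : NULL;
    return hit ? hit : file_node;
}
```

With a Method node stored under project.path.Foo.Bar, a lookup for project.path.Bar misses and returns the file node, while project.path.Foo.Bar returns the Method. That is why a dropped class qualifier is enough to file-anchor the edge.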

Root cause is the #438 drift pattern again. The out-of-line class resolution lived in the defs extractor but had two divergent, incomplete copies on the call side: compute_func_qn (unified walk → CALLS) and cbm_enclosing_func_qn (cached path → USAGES/THROWS/CONFIGURES/type-assigns). Neither reconstructed the class for out-of-line defs.

The fix consolidates instead of adding a fourth copy: I promoted the class resolver to a shared helper and added cbm_cpp_out_of_line_method_qn(), which both call-side sites use, so the call-side QN now matches the defs-node QN. Side benefit: attribution is now correct for every edge type inside out-of-line methods, not just CALLS.
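
To illustrate the QN shape the shared helper has to produce, here is a string-level sketch. It is an assumption-heavy stand-in: the real cbm_cpp_out_of_line_method_qn() works on AST nodes and its signature is not shown in this thread, so `toy_out_of_line_method_qn` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Build "<module_qn>.<Class>.<method>" from a module QN and a C++
 * qualified name such as "Foo::Bar", replacing each "::" with ".".
 * Returns 0 on success, -1 if `out` is too small.
 * Limitation: a leading "::" (global qualifier) is not special-cased. */
static int toy_out_of_line_method_qn(const char *module_qn,
                                     const char *qualified_name,
                                     char *out, size_t out_size) {
    assert(module_qn != NULL && qualified_name != NULL && out != NULL);
    size_t pos = 0;

    /* Copy the module prefix, always leaving room for the terminator. */
    for (const char *p = module_qn; *p; p++) {
        if (pos + 1 >= out_size) return -1;
        out[pos++] = *p;
    }
    if (pos + 1 >= out_size) return -1;
    out[pos++] = '.';

    /* Copy the qualified name, folding "::" into a single '.'. */
    for (const char *p = qualified_name; *p; p++) {
        if (pos + 1 >= out_size) return -1;
        if (p[0] == ':' && p[1] == ':') {
            out[pos++] = '.';
            p++; /* skip the second ':' */
        } else {
            out[pos++] = *p;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

Given project.path and Foo::Bar this yields project.path.Foo.Bar, the class-scoped form the defs side stores. Given only Bar it yields project.path.Bar, the form the call side was producing before the fix.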

Validated with your controlled check as a test: indexed foo.h + a using namespace .cc + a namespace {} .cc, and both Baz() calls now source from the Method node rather than the file, plus an extraction-level unit test for the global/block/nested forms. Full suite green.

To keep this PR exactly the strict improvement you approved, I'm not adding the out-of-line fix here; #463 stays as the declarator-walker dedup. The namespace-block fix is a separate follow-up PR stacked on this branch (I'll link it here once it's up); it rebases cleanly onto main the moment #463 merges.

@KerseyFabrications

Contributor Author

PR #621 has been submitted with the next fixes. In draft until this is merged.

@DeusData

Owner

Really nice fix, @KerseyFabrications — and well diagnosed. C/C++ function_definition nodes carry the name down in the declarator chain rather than a name field, so the enclosing-function resolvers fell back to the file Module node and CALLS edges were misattributed. Walking the declarator chain in both cbm_enclosing_func_qn and resolve_func_name_node mirrors the existing resolve_c_declarator_name pattern from #220 cleanly, and the c_caller_attribution test is a solid guard. Thanks for closing #438 — merging.

@DeusData DeusData merged commit 1c3c557 into DeusData:main Jun 25, 2026
13 checks passed
KerseyFabrications added a commit to KerseyFabrications/codebase-memory-mcp that referenced this pull request Jun 26, 2026
…ethod

A call inside a C++ out-of-line method definition (void Foo::Bar() { ... })
attributes to the File node instead of the enclosing method, including when the
definition is wrapped in a namespace block (reported on DeusData#463).

Root cause is not namespace context (absent from the C++ QN scheme) but a dropped
class qualifier: the call-side enclosing-QN computation produced t.path.Bar instead
of t.path.Foo.Bar for out-of-line definitions (no enclosing class AST node / no
class scope on the walk stack), so the pipeline's exact-QN source match fell back
to __file__. Same for the global (using-namespace) and namespace-block forms.

The out-of-line class resolution existed in the defs extractor but had two
divergent, incomplete copies on the call side -- the drift that caused DeusData#438.
Consolidate instead of adding a fourth copy:
- cbm_cpp_out_of_line_parent_class: promoted from extract_defs.c into helpers.
- cbm_cpp_out_of_line_method_qn: new shared helper that builds the class-scoped QN,
  used by both compute_func_qn (CALLS, unified walk) and cbm_enclosing_func_qn
  (cached path: USAGES/THROWS/CONFIGURES/type-assigns).

Fixes attribution for every edge type inside out-of-line methods, not just CALLS.

Tests: cpp_out_of_line_enclosing_qn (extraction QN match for global/block/nested)
and pipeline_cpp_out_of_line_call_attribution (full index: both forms source the
CALLS edge from a Method node). Full suite green apart from an unrelated, pre-existing
ASan RSS-budget flake.

Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>

Development

Successfully merging this pull request may close these issues.

Cross-file CALLS edges resolve to Module node instead of caller Function node (v0.7.0)

3 participants