fix(extract): attribute C/C++ CALLS edges to the enclosing function#463
Conversation
|
Thanks @KerseyFabrications — the fix is correct and One small thing before merge: |
|
Thanks @KerseyFabrications — the fix is correct and the Could you extract a single shared |
Addresses DeusData#463 review: the declarator-chain name resolver was copied into helpers.c, extract_unified.c, and extract_defs.c, and CBM_DECLARATOR_DEPTH_LIMIT was #defined twice -- the same triplication drift that caused DeusData#438. - Add cbm_resolve_c_declarator_name_node() to helpers.{c,h} as the single source of truth, carrying is_c_terminal_name/resolve_qualified_name with it. - Route the defs, calls, and unified extractors through it. - Hoist CBM_DECLARATOR_DEPTH_LIMIT into helpers.h; extract_defs.c's DECLARATOR_DEPTH_LIMIT now derives from it. Canonicalizes on the original extract_defs.c logic (operator/destructor aware) so defs behavior is unchanged and calls/unified now agree with it. Test: full suite green except an unrelated ASan RSS-budget check; clang-format clean. Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
A CALLS edge whose caller is a C/C++/CUDA/GLSL function was sourced to the file's Module node instead of the calling Function. "Find callers of X" returned a file path, outbound trace_path returned empty, and (:Function)-[:CALLS]->(:Function) queries missed for these languages. Root cause: the enclosing-function resolvers read only tree-sitter's `name` field, but a `function_definition` node has none — the name lives in the declarator chain (pointer/function/parenthesized/array declarators). So func_node_name() (internal/cbm/helpers.c) and resolve_func_name_node() (internal/cbm/extract_unified.c) returned NULL, the enclosing scope fell back to the module QN, and the edge was attributed to the Module node. This is the C counterpart to DeusData#220, which fixed the definition-naming path but not the enclosing-call path. Fix: descend the declarator chain to the innermost name node (mirroring resolve_c_declarator_name in extract_defs.c, including qualified and operator names) when a function_definition lacks a `name` field. Adds the regression test c_caller_attribution asserting a C call's enclosing_func_qn is the function, not the module. Fixes DeusData#438 Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
Addresses DeusData#463 review: the declarator-chain name resolver was copied into helpers.c, extract_unified.c, and extract_defs.c, and CBM_DECLARATOR_DEPTH_LIMIT was #defined twice -- the same triplication drift that caused DeusData#438. - Add cbm_resolve_c_declarator_name_node() to helpers.{c,h} as the single source of truth, carrying is_c_terminal_name/resolve_qualified_name with it. - Route the defs, calls, and unified extractors through it. - Hoist CBM_DECLARATOR_DEPTH_LIMIT into helpers.h; extract_defs.c's DECLARATOR_DEPTH_LIMIT now derives from it. Canonicalizes on the original extract_defs.c logic (operator/destructor aware) so defs behavior is unchanged and calls/unified now agree with it. Test: full suite green except an unrelated ASan RSS-budget check; clang-format clean. Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
084cb42 to
adc8304
Compare
|
Done. Deduplicated in the latest push then rebased.
I canonicalized on the original Full test suite is green ( |
|
Tested this PR against a large C++ codebase (~40k symbols, heavy use of out-of-line method definitions). Clear net improvement, but one very common out-of-line form still falls back to the File node, so I wanted to flag it before merge with a minimal repro. Aggregate (same tree, fresh full index; before = base@53ebeb4, after = this PR):
So it roughly doubles symbol-level attribution — but the majority of CALLS are still file-anchored, and the remainder is concentrated in one definition form. The form still falling back to File: out-of-line methods defined inside Same logical method, three ways:
Controlled check that isolates it: two out-of-line classes in the same directory — one whose Minimal repro — the same definition written two ways: // foo.h
namespace mylib { class Foo { public: void Bar(); void Baz(); }; }// foo_global.cc — RESOLVES: Foo::Bar's call to Baz() is attributed to Foo::Bar
using namespace mylib;
void Foo::Bar() { Baz(); }// foo_block.cc — FILE-ANCHORS: identical body, but the call attributes to the File node, not Foo::Bar
namespace mylib { void Foo::Bar() { Baz(); } }It looks like the declarator walker folds the class qualifier ( |
|
Update: I went ahead and built this rather than just filing it, and tracing it end-to-end corrected two things in my earlier diagnosis:
Root cause is the #438 drift pattern again: the out-of-line class resolution lived in the defs extractor but had two divergent, incomplete copies on the call side: The fix consolidates instead of adding a fourth copy: promoted the class resolver to a shared helper plus a Validated with your controlled check as a test: indexed To keep this PR exactly the strict-improvement you approved, I'm not adding the out-of-line fix here — #463 stays as the declarator-walker dedup. The namespace-block fix is a separate follow-up PR stacked on this branch (I'll link it here once it's up); it rebases cleanly onto main the moment #463 merges. |
|
PR #621 has been submitted with the next fixes. In draft until this is merged. |
|
Really nice fix, @KerseyFabrications — and well diagnosed. C/C++ |
…ethod
A call inside a C++ out-of-line method definition (void Foo::Bar() { ... })
attributes to the File node instead of the enclosing method, including when the
definition is wrapped in a namespace block (reported on DeusData#463).
Root cause is not namespace context (absent from the C++ QN scheme) but a dropped
class qualifier: the call-side enclosing-QN computation produced t.path.Bar instead
of t.path.Foo.Bar for out-of-line definitions (no enclosing class AST node / no
class scope on the walk stack), so the pipeline's exact-QN source match fell back
to __file__. Same for the global (using-namespace) and namespace-block forms.
The out-of-line class resolution existed in the defs extractor but had two
divergent, incomplete copies on the call side -- the drift that caused DeusData#438.
Consolidate instead of adding a fourth copy:
- cbm_cpp_out_of_line_parent_class: promoted from extract_defs.c into helpers.
- cbm_cpp_out_of_line_method_qn: new shared helper that builds the class-scoped QN,
used by both compute_func_qn (CALLS, unified walk) and cbm_enclosing_func_qn
(cached path: USAGES/THROWS/CONFIGURES/type-assigns).
Fixes attribution for every edge type inside out-of-line methods, not just CALLS.
Tests: cpp_out_of_line_enclosing_qn (extraction QN match for global/block/nested)
and pipeline_cpp_out_of_line_call_attribution (full index: both forms source the
CALLS edge from a Method node). Full suite green apart from an unrelated, pre-existing
ASan RSS-budget flake.
Signed-off-by: Kris Kersey <kris@kerseyfabrications.com>
A CALLS edge whose caller is a C/C++/CUDA/GLSL function was sourced to the
file's Module node instead of the calling Function. "Find callers of X"
returned a file path, outbound trace_path returned empty, and
(:Function)-[:CALLS]->(:Function) queries missed for these languages.
Root cause: the enclosing-function resolvers read only tree-sitter's
namefield, but a
function_definitionnode has none — the name lives in thedeclarator chain (pointer/function/parenthesized/array declarators). So
func_node_name() (internal/cbm/helpers.c) and resolve_func_name_node()
(internal/cbm/extract_unified.c) returned NULL, the enclosing scope fell
back to the module QN, and the edge was attributed to the Module node. This
is the C counterpart to #220, which fixed the definition-naming path but not
the enclosing-call path.
Fix: descend the declarator chain to the innermost name node (mirroring
resolve_c_declarator_name in extract_defs.c, including qualified and operator
names) when a function_definition lacks a
namefield. Adds the regressiontest c_caller_attribution asserting a C call's enclosing_func_qn is the
function, not the module.
Fixes #438
Signed-off-by: Kris Kersey kris@kerseyfabrications.com