feat(search): index prose content for BM25 full-text search #617
Open
ShauryaaSharma wants to merge 1 commit into
This was referenced Jun 24, 2026
Section nodes (markdown) and Module nodes (YAML/JSON) previously exposed only their heading/name to BM25, so `search_graph` could not match the prose body or a config description. Index that text so content is searchable.

- store: add a `body` column to the `nodes_fts` FTS5 table; new `cbm_store_fts_rebuild()` drops and recreates the table (upgrading legacy 4-column databases) and backfills `body` from each node's docstring, guarded by `json_valid()` against malformed-JSON rows
- pipeline: both FTS backfill sites now call `cbm_store_fts_rebuild()`
- mcp: stop excluding Section/Module from BM25 results (they rank below code symbols, so existing result ordering is preserved)
- internal/cbm: capture the markdown section body beneath each heading (DeusData#518) and promote top-level description/summary/purpose values onto the file's Module node (DeusData#519), reusing the existing docstring property
- tests: 7 extraction cases + 3 store FTS cases

Closes DeusData#518
Closes DeusData#519

Signed-off-by: ShauryaaSharma <shauryasofficial27@gmail.com>
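The commit message does not say how Section and Module hits end up below code symbols; they may simply score lower under BM25. If the ordering were enforced explicitly, a two-key sort like the one below would do it. This is a minimal sketch: `struct hit`, `is_prose()` and `cmp_hits()` are hypothetical names, and the only fact borrowed from SQLite is that FTS5's `bm25()` assigns better matches numerically lower values.

```c
#include <stdlib.h> /* qsort */
#include <string.h>

/* Hypothetical search hit; field names are illustrative, not the PR's. */
struct hit {
    const char *label; /* node label, e.g. "Function", "Section", "Module" */
    double bm25;       /* FTS5 bm25(): numerically lower means a better match */
};

/* Assumed policy: markdown/config prose nodes form a lower tier. */
static int is_prose(const char *label) {
    return strcmp(label, "Section") == 0 || strcmp(label, "Module") == 0;
}

/* qsort() comparator: code symbols first, then best (lowest) bm25 first. */
static int cmp_hits(const void *a, const void *b) {
    const struct hit *x = a;
    const struct hit *y = b;
    int tx = is_prose(x->label);
    int ty = is_prose(y->label);
    if (tx != ty)
        return tx - ty; /* 0 (code) sorts before 1 (prose) */
    return (x->bm25 > y->bm25) - (x->bm25 < y->bm25);
}
```

Sorting on the tier first means a strong prose match can never outrank a weak code-symbol match. That keeps existing result ordering intact, at the cost of ranking highly relevant documentation below any matching symbol.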
What & why
`search_graph` BM25 only matched node names and headings, so it was blind to the prose that documentation- and config-heavy repos carry. Markdown `Section` nodes exposed only their heading; YAML/JSON `Module` nodes exposed only their file name. The section body and the `description` value were never indexed, and `Section`/`Module` were excluded from BM25 results entirely. This PR indexes that prose so content is searchable.
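To make the Section half concrete, here is a minimal sketch of pulling the prose beneath a markdown heading. `heading_level()` and `section_body()` are hypothetical helpers, not the PR's extractor; they assume unindented ATX headings such as `## Setup` and treat a `#` line inside a fenced code block as body text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ATX heading level (1-6) of the line at p, or 0.
 * Assumes headings start in column 0, e.g. "## Setup". */
static int heading_level(const char *p) {
    int level = 0;
    while (p[level] == '#' && level < 7)
        level++;
    if (level == 0 || level > 6)
        return 0;
    /* The #'s must be followed by a space, a tab or the end of the line. */
    char c = p[level];
    return (c == ' ' || c == '\t' || c == '\n' || c == '\0') ? level : 0;
}

/* Hypothetical helper: given a pointer to a heading line, return the start
 * of the prose beneath it and store its length in *len. The body ends at the
 * next heading of any level (or end of input); '#' lines inside fenced code
 * blocks are treated as body text, not headings. */
static const char *section_body(const char *heading, size_t *len) {
    const char *nl = strchr(heading, '\n');
    const char *start = nl ? nl + 1 : heading + strlen(heading);
    const char *line = start;
    int in_fence = 0;
    while (*line) {
        if (strncmp(line, "```", 3) == 0)
            in_fence = !in_fence; /* opening or closing fence */
        else if (!in_fence && heading_level(line) > 0)
            break; /* next section starts here */
        nl = strchr(line, '\n');
        line = nl ? nl + 1 : line + strlen(line);
    }
    *len = (size_t)(line - start);
    return start;
}
```

Stopping at the next heading of any level keeps each nested section's prose on its own Section node instead of indexing it a second time under the parent; whether the PR makes the same choice is not stated.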
Closes #518
Closes #519
Changes
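- store: add a `body` column to the `nodes_fts` FTS5 table; new `cbm_store_fts_rebuild()` drops and recreates it (upgrading legacy 4-column DBs) and backfills `body` from each node's docstring, guarded by `json_valid()`.
- pipeline: both FTS backfill sites now call `cbm_store_fts_rebuild()`.
- mcp: stop excluding `Section`/`Module` from BM25 results; they rank below code symbols, so code-symbol results still sort first.
- internal/cbm: capture the markdown section body beneath each heading (#518) and promote the top-level `description`/`summary`/`purpose` value onto the `Module` node (#519, META.yaml/frontmatter description values not indexed for BM25 search), reusing the existing docstring property.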
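The Module half could look roughly like this for flat YAML. It is a sketch under stated assumptions: `top_level_value()` and `module_docstring()` are hypothetical, only unindented `key: value` lines count as top level, block scalars (`|`, `>`) and JSON input are left out, and the description, summary, purpose precedence is a guess since the PR only names the three keys.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch for flat YAML: copy the value of a top-level
 * (column-0) "key: value" line into out. Returns 1 on success, 0 if the key
 * is absent, indented (nested), empty, or a block scalar (| or >). */
static int top_level_value(const char *yaml, const char *key,
                           char *out, size_t cap) {
    size_t klen = strlen(key);
    if (cap == 0)
        return 0;
    for (const char *line = yaml; *line;) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *v = line + klen + 1;
            const char *end = line + len;
            while (v < end && (*v == ' ' || *v == '\t'))
                v++;
            while (end > v && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
                end--;
            if (v == end || *v == '|' || *v == '>')
                return 0; /* nested map or block scalar: out of scope */
            /* Strip one pair of matching surrounding quotes. */
            if (end - v >= 2 && (*v == '"' || *v == '\'') && end[-1] == *v) {
                v++;
                end--;
            }
            size_t n = (size_t)(end - v);
            if (n >= cap)
                n = cap - 1; /* truncate to fit */
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        line = nl ? nl + 1 : line + len;
    }
    return 0;
}

/* Assumed precedence: description, then summary, then purpose. */
static int module_docstring(const char *yaml, char *out, size_t cap) {
    static const char *const keys[] = {"description", "summary", "purpose"};
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        if (top_level_value(yaml, keys[i], out, cap))
            return 1;
    return 0;
}
```

The returned string would then be stored as the Module node's docstring, which is the property the `body` backfill reads.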
Testing
7 extraction cases + 3 store FTS cases added. Verified end-to-end: bodies are extracted → indexed into `nodes_fts.body` → returned by BM25; the `json_valid()` guard tolerates malformed rows; legacy FTS tables upgrade on rebuild.
Notes
Backward compatible (additive column; legacy DBs upgrade on next index). No MCP tool changes, no new deps, no new `system()`/`popen()`/network calls. #518 and #519 share the FTS `body` infra (#519 can't work without it), so they ship together; happy to split if preferred.
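For the legacy upgrade path, these are the kinds of statements a drop-and-recreate rebuild could run. The table name `nodes`, its JSON `properties` column and the four non-`body` FTS columns are hypothetical; only `nodes_fts`, the added `body` column, the docstring source and the `json_valid()` guard come from this PR. SQLite rejects `ALTER TABLE ... ADD COLUMN` on virtual tables, so adding an FTS5 column plausibly requires recreating the table rather than altering it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical schema: names other than nodes_fts and body are illustrative;
 * the PR's actual legacy 4-column layout may differ. */
static const char *const fts_rebuild_sql[] = {
    /* 1. Remove whatever exists, including a legacy 4-column table. */
    "DROP TABLE IF EXISTS nodes_fts;",
    /* 2. Recreate with the extra body column. */
    "CREATE VIRTUAL TABLE nodes_fts USING fts5("
    "name, qualified_name, label, file_path, body);",
    /* 3. Backfill body from the docstring; json_valid() keeps json_extract()
     *    from raising a malformed-JSON error, so such rows get an empty body. */
    "INSERT INTO nodes_fts(rowid, name, qualified_name, label, file_path, body) "
    "SELECT id, name, qualified_name, label, file_path, "
    "CASE WHEN json_valid(properties) "
    "THEN coalesce(json_extract(properties, '$.docstring'), '') "
    "ELSE '' END "
    "FROM nodes;",
};

/* Hypothetical accessor: statements in execution order. */
static const char *const *fts_rebuild_statements(size_t *count) {
    *count = sizeof fts_rebuild_sql / sizeof fts_rebuild_sql[0];
    return fts_rebuild_sql;
}
```

Executing them in order (for example with `sqlite3_exec()`, ideally inside one transaction so a failed backfill does not leave the index missing) yields a fresh five-column table. This sketch keeps malformed rows with an empty `body`; skipping them with a `WHERE json_valid(properties)` filter would be the other obvious choice.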