Add MinHash fingerprinting and SIMILAR_TO edges for near-clone detection

Compute K=64 MinHash signatures from normalized AST node-type trigrams
during function extraction, then generate SIMILAR_TO edges via LSH
(b=32, r=2) for function pairs with Jaccard >= 0.95.
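
For readers unfamiliar with the technique, the signature step can be sketched roughly as follows. This is an illustrative sketch only, not the code in src/simhash/minhash.c: the names `MH_K`, `mix64`, `minhash_compute`, and `minhash_jaccard` are hypothetical, node types are assumed to be small integer IDs, and the splitmix64 finalizer stands in for whatever hash family the commit actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MH_K 64 /* signature length, matching K=64 in the commit message */

/* splitmix64 finalizer, used here as a stand-in seeded hash (assumption). */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Fill sig with, for each of the MH_K hash functions, the minimum hash
 * over all node-type trigrams. Returns 0 (sig untouched) when there are
 * fewer than 3 node types, 1 otherwise. */
static int minhash_compute(const uint16_t *types, size_t n, uint64_t sig[MH_K]) {
    assert(types != NULL || n == 0);
    if (n < 3) return 0;
    for (int k = 0; k < MH_K; k++) sig[k] = UINT64_MAX;
    for (size_t i = 0; i + 2 < n; i++) {
        /* Pack three consecutive 16-bit node-type IDs into one key. */
        uint64_t tri = ((uint64_t)types[i] << 32) |
                       ((uint64_t)types[i + 1] << 16) |
                       (uint64_t)types[i + 2];
        for (int k = 0; k < MH_K; k++) {
            uint64_t h = mix64(tri ^ mix64((uint64_t)k + 1));
            if (h < sig[k]) sig[k] = h;
        }
    }
    return 1;
}

/* Estimate Jaccard similarity as the fraction of matching signature slots. */
static double minhash_jaccard(const uint64_t a[MH_K], const uint64_t b[MH_K]) {
    int same = 0;
    for (int k = 0; k < MH_K; k++) same += (a[k] == b[k]);
    return (double)same / MH_K;
}
```

With K=64 the estimate moves in steps of 1/64, so a 0.95 threshold corresponds to at least 61 matching slots out of 64 in this sketch.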
- src/simhash/minhash.{h,c}: MinHash compute, Jaccard, hex encode/decode,
  LSH index with band hashing for O(n) candidate generation
- src/pipeline/pass_similarity.c: post-pass reads fingerprints from node
  properties, builds LSH index, emits SIMILAR_TO edges with jaccard and
  same_file metadata. Same-language only, max 10 edges per node.
- internal/cbm/cbm.h: fingerprint fields on CBMDefinition
- internal/cbm/extract_defs.c: compute_fingerprint() hook at 3 extraction
  sites after complexity, skip functions with < 10 AST body nodes
- pass_definitions.c + pass_parallel.c: serialize fingerprint to "fp" hex
  in properties_json for both sequential and parallel pipeline paths
- pipeline.c + pipeline_incremental.c: register pass_similarity in both
  full and incremental post-pass lists
- tests/test_simhash.c: 28 tests across 4 suites (core, LSH, edge gen,
  pipeline integration with generated Go project + incremental)
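
The banding scheme behind the LSH index can be sketched as below, assuming the 64 signature slots are split into b=32 consecutive bands of r=2 slots each. All names (`LSH_B`, `LSH_R`, `lsh_band_key`, `lsh_is_candidate`, `lsh_candidate_prob`) are hypothetical, and the FNV-1a style key mixing is an assumption rather than the commit's actual band hash. A real index would bucket functions by band key in a hash table; this sketch only shows the pairwise rule that bucketing implements.

```c
#include <assert.h>
#include <stdint.h>

#define MH_K  64 /* signature slots */
#define LSH_B 32 /* bands */
#define LSH_R 2  /* rows (slots) per band; LSH_B * LSH_R == MH_K */

/* Combine the slots of one band into a bucket key (FNV-1a style mixing,
 * salted with the band index so equal values in different bands differ). */
static uint64_t lsh_band_key(const uint64_t sig[MH_K], int band) {
    assert(band >= 0 && band < LSH_B);
    uint64_t key = 0xCBF29CE484222325ULL ^ (uint64_t)band;
    for (int r = 0; r < LSH_R; r++) {
        key ^= sig[band * LSH_R + r];
        key *= 0x100000001B3ULL; /* FNV 64-bit prime */
    }
    return key;
}

/* Two signatures form a candidate pair when every slot of at least one
 * band agrees. Bucketing by lsh_band_key finds the same pairs, plus rare
 * extras caused by key collisions. */
static int lsh_is_candidate(const uint64_t a[MH_K], const uint64_t b[MH_K]) {
    for (int band = 0; band < LSH_B; band++) {
        int all_equal = 1;
        for (int r = 0; r < LSH_R; r++) {
            if (a[band * LSH_R + r] != b[band * LSH_R + r]) all_equal = 0;
        }
        if (all_equal) return 1;
    }
    return 0;
}

/* Probability that a pair with true Jaccard s shares at least one band:
 * 1 - (1 - s^r)^b. */
static double lsh_candidate_prob(double s) {
    double band_match = 1.0;
    for (int r = 0; r < LSH_R; r++) band_match *= s;
    double miss = 1.0;
    for (int b = 0; b < LSH_B; b++) miss *= (1.0 - band_match);
    return 1.0 - miss;
}
```

Under these assumptions, bands this narrow make it very unlikely that a pair at Jaccard 0.95 is missed. The cost is many lower-similarity candidates, which is presumably why the post-pass still applies the Jaccard threshold to each candidate pair.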
README.md (13 additions, 0 deletions)

@@ -339,6 +339,19 @@ codebase-memory-mcp config set auto_index_limit 50000 # max files for auto-in
 codebase-memory-mcp config reset auto_index # reset to default
 ```
 
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CBM_CACHE_DIR` | `~/.cache/codebase-memory-mcp` | Override the database storage directory. All project indexes and config are stored here. |
+| `CBM_DIAGNOSTICS` | `false` | Set to `1` or `true` to enable periodic diagnostics output to `/tmp/cbm-diagnostics-<pid>.json`. |
+| `CBM_DOWNLOAD_URL` | *(GitHub releases)* | Override the download URL for updates. Used for testing or self-hosted deployments. |
+
+```bash
+# Store indexes in a custom directory
+export CBM_CACHE_DIR=~/my-projects/cbm-data
+```
+
 ## Custom File Extensions
 
 Map additional file extensions to supported languages via JSON config files. Useful for framework-specific extensions like `.blade.php` (Laravel) or `.mjs` (ES modules).