Commit 2bfdf80

devwhodevs and claude committed
chore: bump to v0.7.0 — warm sync
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 14bb4f7 commit 2bfdf80

3 files changed

Lines changed: 21 additions & 15 deletions

CLAUDE.md

Lines changed: 19 additions & 13 deletions
@@ -4,29 +4,30 @@ Local hybrid search CLI for Obsidian vaults. Rust, MIT licensed.

 ## Architecture

-Single binary with 19 modules behind a lib crate:
+Single binary with 20 modules behind a lib crate:

 - `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`
 - `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
 - `docid.rs` — deterministic 6-char hex IDs for files (SHA-256 of path, truncated). Shown in search results for quick reference
-- `embedder.rs` — downloads and runs `all-MiniLM-L6-v2` ONNX model (384-dim). SHA256-verified on download. Uses `ort` for inference, `tokenizers` for tokenization. Implements `ModelBackend` trait
+- `embedder.rs` — downloads and runs `all-MiniLM-L6-v2` ONNX model (384-dim). SHA256-verified on download. Uses `ort` for inference, `tokenizers` for tokenization. Implements `ModelBackend` trait. **Not `Send`** — all embedding is serial
 - `model.rs` — pluggable `ModelBackend` trait, model registry, and `parse_model_spec()`. Enables future model swapping without changing consumer code
 - `fts.rs` — FTS5 full-text search support. Re-exports `FtsResult` from store. BM25-ranked keyword search
 - `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph results. Supports lane weighting, `--explain` output with per-lane detail
-- `context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
+- `context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
 - `vecstore.rs` — sqlite-vec virtual table integration. Manages the `vec_chunks` vec0 table for vector storage and KNN search. Handles insert, delete, and search operations against the virtual table
 - `tags.rs` — tag registry module. Maintains a `tag_registry` table tracking known tags with source attribution. Supports fuzzy matching for tag suggestions during note creation
-- `links.rs` — link discovery module. Scans note content for potential wikilink targets using fuzzy basename matching and heading detection. Suggests links that could be added to improve vault connectivity
-- `placement.rs` — folder placement engine. Uses folder centroids (average embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low
-- `writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links, place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, and move_note operations with mtime-based conflict detection and crash recovery via temp file cleanup
-- `serve.rs` — MCP stdio server via rmcp SDK. Exposes 11 tools: 7 read (search, read, list, vault_map, who, project, context) + 4 write (create, append, update_metadata, move_note). EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads all resources at startup
+- `links.rs` — link discovery module. Three match types: exact basename, fuzzy (sliding window Levenshtein, 0.92 threshold), and first-name (People folder, suggestion-only at 650bp). Overlap resolution via type priority (exact > alias > fuzzy > first-name)
+- `placement.rs` — folder placement engine. Uses folder centroids (online mean of embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low. Includes placement correction detection (`detect_correction_from_frontmatter`) and frontmatter stripping for moved files
+- `writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links (exact + fuzzy), place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, move_note, archive, and unarchive operations with mtime-based conflict detection and crash recovery via temp file cleanup
+- `watcher.rs` — file watcher for `engraph serve`. OS thread producer (notify-debouncer-full, 2s debounce) sends `Vec<WatchEvent>` over tokio::mpsc to async consumer task. Two-pass batch processing: mutations (index_file/remove_file/rename_file) then edge rebuild. Move detection via content hash matching. Placement correction on file moves. Centroid adjustment on file add/remove. Startup reconciliation via `run_index_shared`
+- `serve.rs` — MCP stdio server via rmcp SDK. Exposes 13 tools: 7 read (search, read, list, vault_map, who, project, context) + 6 write (create, append, update_metadata, move_note, archive, unarchive). EngraphServer struct with Arc+Mutex wrapping for async handlers. Spawns file watcher on startup
 - `graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags
 - `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Writes/loads `vault.toml`
-- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`. `vec_chunks` virtual table (sqlite-vec) for KNN search. Handles incremental diffing via content hashes
-- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding (Rayon for parallel chunking, serial embedding since `Embedder` is not `Send`), serial writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation
+- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved). `vec_chunks` virtual table (sqlite-vec) for KNN search. Handles incremental diffing via content hashes
+- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan
 - `search.rs` — hybrid search orchestrator. Runs semantic (sqlite-vec KNN), keyword (FTS5 BM25), and graph expansion lanes, then fuses via RRF

-`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index`, `search` (with `--explain`), `status`, `clear`, `init`, `configure`, `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move), `serve` (MCP stdio server).
+`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index`, `search` (with `--explain`), `status`, `clear`, `init`, `configure`, `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move), `serve` (MCP stdio server with file watcher).

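The fusion step described for `fusion.rs` and `search.rs` can be sketched as weighted Reciprocal Rank Fusion. This is a minimal illustration, not engraph's actual code: the function name `rrf_fuse`, the `(weight, ranked ids)` lane shape, and the smoothing constant `k` (60 is a common choice) are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fuse several ranked lanes (e.g. semantic, FTS5, graph)
/// with weighted Reciprocal Rank Fusion.
/// Each lane is `(weight, ids ordered best-first)`; `k` is the RRF smoothing constant.
fn rrf_fuse(lanes: &[(f64, Vec<&str>)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (weight, ranked) in lanes {
        for (idx, id) in ranked.iter().enumerate() {
            // Ranks are 1-based; a document absent from a lane contributes nothing.
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += weight / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document that appears near the top of several lanes outranks one that tops only a single lane, which is the property that makes RRF useful for merging lanes whose raw scores are not comparable.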
 ## Key patterns

@@ -36,7 +37,10 @@ Single binary with 19 modules behind a lib crate:
 - **Smart chunking:** Break-point scoring algorithm assigns scores to potential split points (headings 50-100, code fences 80, thematic breaks 60, blank lines 20). Code fence protection prevents splitting inside code blocks
 - **Incremental indexing:** `diff_vault()` compares file content hashes in SQLite against disk. Changed files have their old chunks, vectors, and edges deleted, then are re-processed. FTS5 and sqlite-vec entries cleaned up alongside store entries
 - **sqlite-vec for vector search:** Vectors stored in a `vec_chunks` virtual table (vec0). KNN search via `vec_distance_cosine()`. Real deletes — no tombstone filtering needed during search
-- **Write pipeline:** 5-step process for creating/modifying notes: (1) resolve tags via fuzzy matching against tag registry, (2) discover potential wikilinks via basename matching, (3) suggest folder placement via centroid similarity, (4) atomic file write (temp + rename for crash safety), (5) immediate index update (embed + insert into sqlite-vec + FTS5 + edges)
+- **Write pipeline:** 5-step process for creating/modifying notes: (1) resolve tags via fuzzy matching against tag registry, (2) discover potential wikilinks via exact + fuzzy matching, (3) suggest folder placement via centroid similarity, (4) atomic file write (temp + rename for crash safety), (5) immediate index update (embed + insert into sqlite-vec + FTS5 + edges)
+- **Warm sync (file watcher):** OS thread watches vault via `notify-debouncer-full` (2s debounce). Events sent over `tokio::mpsc` to async consumer. Two-pass processing: mutations then edge rebuild. Move detection via content hash matching. Placement correction learning on file moves (centroid adjustment + frontmatter stripping). Startup reconciliation catches changes since last shutdown
+- **Fuzzy link matching:** Sliding window of N words over content, compared via `strsim::normalized_levenshtein` with 0.92 threshold. First-name matching for People notes (uniqueness check, 650bp confidence, suggestion-only). Overlap resolution: exact > alias > fuzzy > first-name
+- **Centroid updates:** Online mean math (`adjust_folder_centroid`). Incremented on file add, decremented on file remove. Full recompute during bulk indexing
 - **Docids:** Each file gets a deterministic 6-char hex ID. Displayed in search results
 - **Vault profiles:** `engraph init` auto-detects vault structure and writes `vault.toml`
 - **Pluggable models:** `ModelBackend` trait enables future model swapping
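The "online mean" behind the centroid updates above can be sketched as follows. The diff names `adjust_folder_centroid` but does not show its signature, so the `Centroid` type and its `add`/`remove` methods here are hypothetical stand-ins that only illustrate the arithmetic: adding a vector moves the mean by `(x - mean) / n`, and removing one inverts that step.

```rust
/// Hypothetical running mean of the embeddings in one folder.
#[derive(Debug, Clone, PartialEq)]
struct Centroid {
    mean: Vec<f32>,
    count: u32,
}

impl Centroid {
    fn new(dim: usize) -> Self {
        Self { mean: vec![0.0; dim], count: 0 }
    }

    /// File added: mean_new = mean + (x - mean) / n, where n is the new count.
    fn add(&mut self, x: &[f32]) {
        assert_eq!(x.len(), self.mean.len(), "dimension mismatch");
        self.count += 1;
        let n = self.count as f32;
        for (m, v) in self.mean.iter_mut().zip(x) {
            *m += (v - *m) / n;
        }
    }

    /// File removed: mean_new = (n * mean - x) / (n - 1), where n is the old count.
    fn remove(&mut self, x: &[f32]) {
        assert_eq!(x.len(), self.mean.len(), "dimension mismatch");
        match self.count {
            0 => {} // nothing to remove
            1 => {
                // Removing the last member resets the centroid.
                self.mean.iter_mut().for_each(|m| *m = 0.0);
                self.count = 0;
            }
            n => {
                let n = n as f32;
                for (m, v) in self.mean.iter_mut().zip(x) {
                    *m = (*m * n - v) / (n - 1.0);
                }
                self.count -= 1;
            }
        }
    }
}
```

Incremental updates like this accumulate floating-point drift over many adds and removes, which is one plausible reason to pair them with a full recompute during bulk indexing, as the bullet above describes.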
@@ -52,16 +56,18 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr
 - `ort` (2.0.0-rc.12) — ONNX Runtime Rust bindings. Pre-release API. Does not provide prebuilt binaries for all targets
 - `sqlite-vec` (0.1.8-alpha.1) — SQLite extension for vector search. Provides vec0 virtual tables with KNN via `vec_distance_cosine()`
 - `zerocopy` (0.7) — zero-copy serialization for vector data passed to sqlite-vec
-- `strsim` (0.11) — string similarity for fuzzy tag matching in the write pipeline
+- `strsim` (0.11) — string similarity for fuzzy tag matching and fuzzy link matching
 - `time` (0.3) — date/time handling for frontmatter timestamps
 - `tokenizers` (0.22) — HuggingFace tokenizer. Needs `fancy-regex` feature
 - `ignore` (0.4) — vault walking with `.gitignore` support
 - `rusqlite` (0.32) — bundled SQLite with FTS5 support
 - `rmcp` (1.2) — MCP server SDK for stdio transport
+- `notify` (7.0) — cross-platform filesystem notification (FSEvents on macOS, inotify on Linux)
+- `notify-debouncer-full` (0.4) — debouncing + best-effort inode-based rename tracking

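The fuzzy link matching that `strsim` is used for (a sliding window of N words compared against a note basename at a 0.92 threshold) might look roughly like the sketch below. To keep it dependency-free, it substitutes a small std-only edit-distance function for `strsim::normalized_levenshtein`; the function names, the case folding, and the choice of N as the basename's word count are assumptions, not details confirmed by the diff.

```rust
/// Std-only Levenshtein distance over chars (stand-in for the `strsim` crate).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: 1 - distance / longer length.
fn normalized_similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// Hypothetical matcher: slide a window of as many words as `basename` has
/// over `content` and keep windows scoring at or above `threshold`.
fn fuzzy_matches(content: &str, basename: &str, threshold: f64) -> Vec<(String, f64)> {
    let target = basename.to_lowercase();
    let n = target.split_whitespace().count();
    if n == 0 {
        return Vec::new();
    }
    let words: Vec<&str> = content.split_whitespace().collect();
    let mut hits = Vec::new();
    for window in words.windows(n) {
        let candidate = window.join(" ");
        let score = normalized_similarity(&candidate.to_lowercase(), &target);
        if score >= threshold {
            hits.push((candidate, score));
        }
    }
    hits
}
```

With a 20-character basename, a 0.92 threshold tolerates one edit (similarity 0.95) but rejects two (0.90), so the threshold mostly admits single typos.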
 ## Testing

-- Unit tests in each module (`cargo test --lib`) — 190 tests, no network required
+- Unit tests in each module (`cargo test --lib`) — 225 tests, no network required
 - 1 ignored smoke test (`test_embed_smoke`) — downloads ONNX model, verifies embedding
 - Integration tests (`cargo test --test integration -- --ignored`) — require model download

Cargo.lock

Lines changed: 1 addition & 1 deletion

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "engraph"
-version = "0.6.0"
+version = "0.7.0"
 edition = "2024"
 description = "Local semantic search for Obsidian vaults"
 license = "MIT"
