You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
-`chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
11
11
-`docid.rs` — deterministic 6-char hex IDs for files (SHA-256 of path, truncated). Shown in search results for quick reference
12
-
-`embedder.rs` — downloads and runs `all-MiniLM-L6-v2` ONNX model (384-dim). SHA256-verified on download. Uses `ort` for inference, `tokenizers` for tokenization. Implements `ModelBackend` trait
12
+
-`embedder.rs` — downloads and runs `all-MiniLM-L6-v2` ONNX model (384-dim). SHA256-verified on download. Uses `ort` for inference, `tokenizers` for tokenization. Implements `ModelBackend` trait. **Not `Send`** — all embedding is serial
13
13
-`model.rs` — pluggable `ModelBackend` trait, model registry, and `parse_model_spec()`. Enables future model swapping without changing consumer code
-`context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
16
+
-`context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
17
17
-`vecstore.rs` — sqlite-vec virtual table integration. Manages the `vec_chunks` vec0 table for vector storage and KNN search. Handles insert, delete, and search operations against the virtual table
18
18
-`tags.rs` — tag registry module. Maintains a `tag_registry` table tracking known tags with source attribution. Supports fuzzy matching for tag suggestions during note creation
19
-
-`links.rs` — link discovery module. Scans note content for potential wikilink targets using fuzzy basename matching and heading detection. Suggests links that could be added to improve vault connectivity
20
-
-`placement.rs` — folder placement engine. Uses folder centroids (average embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low
21
-
-`writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links, place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, and move_note operations with mtime-based conflict detection and crash recovery via temp file cleanup
22
-
-`serve.rs` — MCP stdio server via rmcp SDK. Exposes 11 tools: 7 read (search, read, list, vault_map, who, project, context) + 4 write (create, append, update_metadata, move_note). EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads all resources at startup
19
+
-`links.rs` — link discovery module. Three match types: exact basename, fuzzy (sliding window Levenshtein, 0.92 threshold), and first-name (People folder, suggestion-only at 650bp). Overlap resolution via type priority (exact > alias > fuzzy > first-name)
20
+
-`placement.rs` — folder placement engine. Uses folder centroids (online mean of embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low. Includes placement correction detection (`detect_correction_from_frontmatter`) and frontmatter stripping for moved files
21
+
-`writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links (exact + fuzzy), place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, move_note, archive, and unarchive operations with mtime-based conflict detection and crash recovery via temp file cleanup
22
+
-`watcher.rs` — file watcher for `engraph serve`. OS thread producer (notify-debouncer-full, 2s debounce) sends `Vec<WatchEvent>` over tokio::mpsc to async consumer task. Two-pass batch processing: mutations (index_file/remove_file/rename_file) then edge rebuild. Move detection via content hash matching. Placement correction on file moves. Centroid adjustment on file add/remove. Startup reconciliation via `run_index_shared`
23
+
-`serve.rs` — MCP stdio server via rmcp SDK. Exposes 13 tools: 7 read (search, read, list, vault_map, who, project, context) + 6 write (create, append, update_metadata, move_note, archive, unarchive). EngraphServer struct with Arc+Mutex wrapping for async handlers. Spawns file watcher on startup
23
24
-`graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags
-`indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding (Rayon for parallel chunking, serial embedding since `Embedder` is not `Send`), serial writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation
-**Incremental indexing:**`diff_vault()` compares file content hashes in SQLite against disk. Changed files have their old chunks, vectors, and edges deleted, then are re-processed. FTS5 and sqlite-vec entries cleaned up alongside store entries
38
39
-**sqlite-vec for vector search:** Vectors stored in a `vec_chunks` virtual table (vec0). KNN search via `vec_distance_cosine()`. Real deletes — no tombstone filtering needed during search
39
-
-**Write pipeline:** 5-step process for creating/modifying notes: (1) resolve tags via fuzzy matching against tag registry, (2) discover potential wikilinks via basename matching, (3) suggest folder placement via centroid similarity, (4) atomic file write (temp + rename for crash safety), (5) immediate index update (embed + insert into sqlite-vec + FTS5 + edges)
40
+
-**Write pipeline:** 5-step process for creating/modifying notes: (1) resolve tags via fuzzy matching against tag registry, (2) discover potential wikilinks via exact + fuzzy matching, (3) suggest folder placement via centroid similarity, (4) atomic file write (temp + rename for crash safety), (5) immediate index update (embed + insert into sqlite-vec + FTS5 + edges)
41
+
-**Warm sync (file watcher):** OS thread watches vault via `notify-debouncer-full` (2s debounce). Events sent over `tokio::mpsc` to async consumer. Two-pass processing: mutations then edge rebuild. Move detection via content hash matching. Placement correction learning on file moves (centroid adjustment + frontmatter stripping). Startup reconciliation catches changes since last shutdown
42
+
-**Fuzzy link matching:** Sliding window of N words over content, compared via `strsim::normalized_levenshtein` with 0.92 threshold. First-name matching for People notes (uniqueness check, 650bp confidence, suggestion-only). Overlap resolution: exact > alias > fuzzy > first-name
43
+
-**Centroid updates:** Online mean math (`adjust_folder_centroid`). Incremented on file add, decremented on file remove. Full recompute during bulk indexing
40
44
-**Docids:** Each file gets a deterministic 6-char hex ID. Displayed in search results
41
45
-**Vault profiles:**`engraph init` auto-detects vault structure and writes `vault.toml`
42
46
-**Pluggable models:**`ModelBackend` trait enables future model swapping
@@ -52,16 +56,18 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr
52
56
-`ort` (2.0.0-rc.12) — ONNX Runtime Rust bindings. Pre-release API. Does not provide prebuilt binaries for all targets
53
57
-`sqlite-vec` (0.1.8-alpha.1) — SQLite extension for vector search. Provides vec0 virtual tables with KNN via `vec_distance_cosine()`
54
58
-`zerocopy` (0.7) — zero-copy serialization for vector data passed to sqlite-vec
55
-
-`strsim` (0.11) — string similarity for fuzzy tag matching in the write pipeline
59
+
-`strsim` (0.11) — string similarity for fuzzy tag matching and fuzzy link matching
56
60
-`time` (0.3) — date/time handling for frontmatter timestamps
0 commit comments