You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CLAUDE.md
+13-10Lines changed: 13 additions & 10 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -4,29 +4,32 @@ Local knowledge graph + intelligence layer for Obsidian vaults. Rust CLI + MCP s
4
4
5
5
## Architecture
6
6
7
-
Single binary with 19 modules behind a lib crate:
7
+
Single binary with 22 modules behind a lib crate:
8
8
9
-
-`config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>` and `[models]` section for model overrides. `Config::save()` writes back to disk.
9
+
-`config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>`, `[models]` section for model overrides, `[obsidian]` section (CLI path, enabled flag), and `[agents]` section (registered AI agent names). `Config::save()` writes back to disk.
10
10
-`chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
11
11
-`docid.rs` — deterministic 6-char hex IDs for files (SHA-256 of path, truncated). Shown in search results for quick reference
12
12
-`llm.rs` — ML inference via llama.cpp (Rust bindings: `llama-cpp-2`). Three traits: `EmbedModel` (embeddings), `RerankModel` (cross-encoder scoring), `OrchestratorModel` (query intent + expansion). Three llama.cpp implementations: `LlamaEmbed` (embeddinggemma-300M GGUF on Metal GPU), `LlamaOrchestrator` (Qwen3-0.6B for query analysis + expansion), `LlamaRerank` (Qwen3-Reranker-0.6B for relevance scoring). Global `LlamaBackend` via `OnceLock`. Also: `MockLlm` for testing, `HfModelUri` for model download, `FlexTokenizer` (HuggingFace tokenizers + shimmytok GGUF fallback), `PromptFormat` for model-family prompt templates, `heuristic_orchestrate()` fast path, `LaneWeights` per query intent
-`context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
15
+
-`markdown.rs` — section parser. Heading detection (ATX `#` headings with level tracking), section extraction by heading text, frontmatter splitting (YAML block between `---` fences). Powers section-level reading and editing
16
+
-`obsidian.rs` — Obsidian CLI wrapper. Process detection (checks if Obsidian is running), circuit breaker state machine (Closed/Degraded/Open) for resilient CLI delegation, async subprocess execution with timeout. Falls back gracefully when Obsidian is unavailable
17
+
-`health.rs` — vault health diagnostics. Orphan detection (notes with no incoming or outgoing wikilinks), broken link detection (wikilinks pointing to nonexistent notes), stale note detection (notes not modified within configurable threshold), tag hygiene (unused/rare tags). Returns structured health report
18
+
-`context.rs` — context engine. Seven functions: `read` (full note content + metadata), `read_section` (targeted section extraction by heading), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
16
19
-`vecstore.rs` — sqlite-vec virtual table integration. Manages the `vec_chunks` vec0 table for vector storage and KNN search. Handles insert, delete, and search operations against the virtual table
17
20
-`tags.rs` — tag registry module. Maintains a `tag_registry` table tracking known tags with source attribution. Supports fuzzy matching for tag suggestions during note creation
18
21
-`links.rs` — link discovery module. Three match types: exact basename, fuzzy (sliding window Levenshtein, 0.92 threshold), and first-name (People folder, suggestion-only at 650bp). Overlap resolution via type priority (exact > alias > fuzzy > first-name)
19
22
-`placement.rs` — folder placement engine. Uses folder centroids (online mean of embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low. Includes placement correction detection (`detect_correction_from_frontmatter`) and frontmatter stripping for moved files
20
-
-`writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links (exact + fuzzy), place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, move_note, archive, and unarchive operations with mtime-based conflict detection and crash recovery via temp file cleanup
21
-
-`watcher.rs` — file watcher for `engraph serve`. OS thread producer (notify-debouncer-full, 2s debounce) sends `Vec<WatchEvent>` over tokio::mpsc to async consumer task. Two-pass batch processing: mutations (index_file/remove_file/rename_file) then edge rebuild. Move detection via content hash matching. Placement correction on file moves. Centroid adjustment on file add/remove. Startup reconciliation via `run_index_shared`
22
-
-`serve.rs` — MCP stdio server via rmcp SDK. Exposes 13 tools: 7 read (search, read, list, vault_map, who, project, context) + 6 write (create, append, update_metadata, move_note, archive, unarchive). EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads intelligence models (orchestrator + reranker) when enabled, wires into `search_with_intelligence`. Spawns file watcher on startup
23
+
-`writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links (exact + fuzzy), place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, move_note, archive, unarchive, edit (section-level replace/prepend/append), rewrite (full content with frontmatter preservation), edit_frontmatter (granular set/remove/add_tag/remove_tag/add_alias/remove_alias ops), and delete (soft archive or hard permanent) operations with mtime-based conflict detection and crash recovery via temp file cleanup
24
+
-`watcher.rs` — file watcher for `engraph serve`. OS thread producer (notify-debouncer-full, 2s debounce) sends `Vec<WatchEvent>` over tokio::mpsc to async consumer task. Two-pass batch processing: mutations (index_file/remove_file/rename_file) then edge rebuild. Move detection via content hash matching. Placement correction on file moves. Centroid adjustment on file add/remove. Startup reconciliation via `run_index_shared`. `recent_writes` map coordination with MCP server to prevent double re-indexing of files written through the write pipeline
25
+
-`serve.rs` — MCP stdio server via rmcp SDK. Exposes 19 tools: 8 read (search, read, read_section, list, vault_map, who, project, context) + 10 write (create, append, update_metadata, move_note, archive, unarchive, edit, rewrite, edit_frontmatter, delete) + 1 diagnostic (health). `edit_frontmatter` replaces `update_metadata` for granular frontmatter mutations. EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads intelligence models (orchestrator + reranker) when enabled, wires into `search_with_intelligence`. Spawns file watcher on startup. CLI events table provides audit log for write operations. `recent_writes` map prevents double re-indexing of MCP-written files
23
26
-`graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags
-`store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration
27
+
-`profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Content-based role detection for people/daily/archive folders by content patterns (not just names). Writes/loads `vault.toml`
28
+
-`store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache), `cli_events` (audit log for CLI operations). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration. Enhanced `resolve_file()` with fuzzy Levenshtein matching as final fallback
26
29
-`indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan. Dimension migration on model change.
27
30
-`search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 3-lane retrieval per expansion → RRF pass 1 → reranker 4th lane → RRF pass 2. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent.
28
31
29
-
`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state), `clear`, `init` (intelligence onboarding prompt), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move), `serve` (MCP stdio server with file watcher + intelligence).
32
+
`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).
30
33
31
34
## Key patterns
32
35
@@ -70,7 +73,7 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr
70
73
71
74
## Testing
72
75
73
-
- Unit tests in each module (`cargo test --lib`) — 270 tests, no network required
76
+
- Unit tests in each module (`cargo test --lib`) — 318 tests, no network required
74
77
- Integration tests (`cargo test --test integration -- --ignored`) — require GGUF model download
75
78
- Build requires CMake (for llama.cpp C++ compilation)
0 commit comments