
Commit 697df6e: Merge branch 'feat/v1.0-intelligence'
2 parents: 7a239d5 + 8b8ced3

3 files changed: 75 additions & 35 deletions

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -1,5 +1,32 @@
 # Changelog

+## [1.0.0] - 2026-03-25
+
+### Added
+- **Candle runtime** — replaced ONNX (`ort`) with candle (pure Rust ML framework). Loads GGUF quantized models. Metal acceleration on macOS.
+- **Research orchestrator** — LLM-based query classification (exact/conceptual/relationship/exploratory) with adaptive lane weights. A single LLM call returns intent + 2-4 query expansions.
+- **Cross-encoder reranker** — 4th RRF lane using qwen3-reranker for relevance scoring. Two-pass fusion: 3-lane retrieval → reranker scores top 30 → 4-lane RRF.
+- **Query expansion** — each search runs multiple expanded queries through all retrieval lanes, merged via deduplication.
+- **Heuristic orchestrator** — fast-path intent classification via pattern matching (docids, ticket IDs, "who" queries) when intelligence is disabled. Zero latency.
+- **Intelligence onboarding** — opt-in prompt during `engraph init` and first `engraph index`. Downloads ~1.3GB of optional models.
+- **`engraph configure` command** — `--enable-intelligence`, `--disable-intelligence`, `--model embed|rerank|expand <uri>` for model overrides.
+- **Dimension migration** — auto-detects embedding dimension changes and triggers a re-index.
+- **LLM result cache** — SQLite cache for orchestrator results (keyed by query SHA-256).
+- **Model override support** — configurable embedding, reranker, and expansion model URIs for multilingual support.
+
+### Changed
+- Embedding model: `all-MiniLM-L6-v2` (ONNX, 384-dim, 23MB) → `embeddinggemma-300M` (GGUF, 256-dim, ~300MB)
+- Search pipeline: hardcoded 3-lane weights → adaptive per-query-intent weights
+- `--explain` output now shows query intent and a 4-lane breakdown (semantic, FTS, graph, rerank)
+- `status` command shows the intelligence enabled/disabled state
+- `run_search` accepts a `Config` parameter (no redundant config load)
+
+### Removed
+- `ort` (ONNX Runtime) dependency
+- `ndarray` dependency
+- `src/embedder.rs` and `src/model.rs` (replaced by `src/llm.rs`)
+- `ModelBackend` trait (replaced by `EmbedModel`)
+
 ## [0.7.0] - 2026-03-25

 ### Added
CLAUDE.md

Lines changed: 15 additions & 15 deletions
@@ -4,15 +4,14 @@ Local hybrid search CLI for Obsidian vaults. Rust, MIT licensed.

 ## Architecture

-Single binary with 20 modules behind a lib crate:
+Single binary with 19 modules behind a lib crate:

-- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`
+- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>` and `[models]` section for model overrides. `Config::save()` writes back to disk.
 - `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
 - `docid.rs` — deterministic 6-char hex IDs for files (SHA-256 of path, truncated). Shown in search results for quick reference
-- `embedder.rs` — downloads and runs `all-MiniLM-L6-v2` ONNX model (384-dim). SHA-256-verified on download. Uses `ort` for inference, `tokenizers` for tokenization. Implements `ModelBackend` trait. **Not `Send`** — all embedding is serial
-- `model.rs` — pluggable `ModelBackend` trait, model registry, and `parse_model_spec()`. Enables future model swapping without changing consumer code
+- `llm.rs` — candle model management. Three traits: `EmbedModel` (embeddings), `RerankModel` (cross-encoder scoring), `OrchestratorModel` (query intent + expansion). Three candle implementations: `CandleEmbed` (custom bidirectional transformer from GGUF for embeddinggemma), `CandleOrchestrator` (quantized_qwen3 for query analysis), `CandleRerank` (quantized_qwen3 for relevance scoring). Also: `MockLlm` for testing, `HfModelUri` for model download, `PromptFormat` for model-family prompt templates, `heuristic_orchestrate()` fast path, `LaneWeights` per query intent
 - `fts.rs` — FTS5 full-text search support. Re-exports `FtsResult` from store. BM25-ranked keyword search
-- `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph results. Supports lane weighting, `--explain` output with per-lane detail
+- `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph + reranker results. Supports per-lane weighting, `--explain` output with intent + per-lane detail
 - `context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
 - `vecstore.rs` — sqlite-vec virtual table integration. Manages the `vec_chunks` vec0 table for vector storage and KNN search. Handles insert, delete, and search operations against the virtual table
 - `tags.rs` — tag registry module. Maintains a `tag_registry` table tracking known tags with source attribution. Supports fuzzy matching for tag suggestions during note creation
@@ -23,15 +22,15 @@ Single binary with 20 modules behind a lib crate:
 - `serve.rs` — MCP stdio server via rmcp SDK. Exposes 13 tools: 7 read (search, read, list, vault_map, who, project, context) + 6 write (create, append, update_metadata, move_note, archive, unarchive). EngraphServer struct with Arc+Mutex wrapping for async handlers. Spawns file watcher on startup
 - `graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags
 - `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Writes/loads `vault.toml`
-- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved). `vec_chunks` virtual table (sqlite-vec) for KNN search. Handles incremental diffing via content hashes
-- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan
-- `search.rs` — hybrid search orchestrator. Runs semantic (sqlite-vec KNN), keyword (FTS5 BM25), and graph expansion lanes, then fuses via RRF
+- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration
+- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan. Dimension migration on model change.
+- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 3-lane retrieval per expansion → RRF pass 1 → reranker 4th lane → RRF pass 2. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent.

 `main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index`, `search` (with `--explain`), `status`, `clear`, `init`, `configure`, `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move), `serve` (MCP stdio server with file watcher).

 ## Key patterns

-- **3-lane hybrid search:** Queries run through three lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), and graph (wikilink expansion). Results are fused via Reciprocal Rank Fusion (RRF) with configurable lane weights (semantic 1.0, FTS 1.0, graph 0.8)
+- **4-lane hybrid search:** Queries run through up to four lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), and cross-encoder reranking. A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: 3-lane retrieval → reranker scores top 30 → 4-lane fusion. When intelligence is off, falls back to heuristic intent classification with 3-lane search (v0.7 behavior)
 - **Vault graph:** `edges` table stores bidirectional wikilink edges and mention edges. Built during indexing after all files are written. People detection scans for person name/alias mentions using notes from the configured People folder
 - **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0.8x for 1-hop, 0.5x for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
 - **Smart chunking:** Break-point scoring algorithm assigns scores to potential split points (headings 50-100, code fences 80, thematic breaks 60, blank lines 20). Code fence protection prevents splitting inside code blocks
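The weighted RRF described in the hybrid-search pattern above can be sketched in a few lines. This is an illustrative implementation, assuming `k = 60` from the original RRF paper and the documented default lane weights; `fusion.rs` may differ in detail:

```rust
use std::collections::HashMap;

// Weighted Reciprocal Rank Fusion sketch: each lane contributes
// weight / (k + rank) per document, and documents appearing in several
// lanes accumulate score. k = 60 is the constant from the RRF paper,
// not necessarily engraph's value.
fn rrf_fuse(lanes: &[(f32, Vec<&str>)], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<&str, f32> = HashMap::new();
    for (weight, ranking) in lanes {
        for (rank, doc) in ranking.iter().enumerate() {
            // rank is 0-based; RRF uses 1-based ranks
            *scores.entry(*doc).or_insert(0.0) += weight / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<(String, f32)> =
        scores.into_iter().map(|(d, s)| (d.to_string(), s)).collect();
    // Highest fused score first
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

A document ranked first in two lanes beats one ranked first in a single lane, which is the property that lets the keyword and graph lanes rescue results that pure vector similarity misses.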
@@ -47,18 +46,20 @@ Single binary with 20 modules behind a lib crate:

 ## Data directory

-`~/.engraph/` — hardcoded via `Config::data_dir()`. Contains `engraph.db` (SQLite with FTS5 + sqlite-vec + edges), `models/` (ONNX model + tokenizer), `vault.toml` (vault profile), `config.toml` (user config).
+`~/.engraph/` — hardcoded via `Config::data_dir()`. Contains `engraph.db` (SQLite with FTS5 + sqlite-vec + edges + llm_cache), `models/` (GGUF models + tokenizers), `vault.toml` (vault profile), `config.toml` (user config with intelligence toggle + model overrides).

 Single vault only. Re-indexing a different vault path triggers a confirmation prompt.

 ## Dependencies to be aware of

-- `ort` (2.0.0-rc.12) — ONNX Runtime Rust bindings. Pre-release API. Does not provide prebuilt binaries for all targets
+- `candle-core` (0.9) — HuggingFace pure Rust ML framework. GGUF model loading, tensor ops. `metal` feature for macOS GPU acceleration
+- `candle-nn` (0.9) — neural network building blocks (RmsNorm, rotary embeddings, etc.)
+- `candle-transformers` (0.9) — pre-built transformer model architectures. Used: `quantized_qwen3` for orchestrator + reranker
 - `sqlite-vec` (0.1.8-alpha.1) — SQLite extension for vector search. Provides vec0 virtual tables with KNN via `vec_distance_cosine()`
 - `zerocopy` (0.7) — zero-copy serialization for vector data passed to sqlite-vec
 - `strsim` (0.11) — string similarity for fuzzy tag matching and fuzzy link matching
 - `time` (0.3) — date/time handling for frontmatter timestamps
-- `tokenizers` (0.22) — HuggingFace tokenizer. Needs `fancy-regex` feature
+- `tokenizers` (0.22) — HuggingFace tokenizer. Needs `fancy-regex` feature. Used for all three GGUF models
 - `ignore` (0.4) — vault walking with `.gitignore` support
 - `rusqlite` (0.32) — bundled SQLite with FTS5 support
 - `rmcp` (1.2) — MCP server SDK for stdio transport
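The `zerocopy` bullet hints at how embeddings reach sqlite-vec: as little-endian `f32` byte blobs. A plain-std roundtrip sketch of that layout follows — the real code avoids the copy by using `zerocopy` views instead:

```rust
// Sketch of the vector-blob layout assumed by sqlite-vec: each f32 is
// serialized as 4 little-endian bytes, concatenated. Illustrative only;
// engraph uses `zerocopy` to reinterpret the slice without copying.

fn vec_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

fn blob_to_vec(b: &[u8]) -> Vec<f32> {
    b.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```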
@@ -67,9 +68,8 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr

 ## Testing

-- Unit tests in each module (`cargo test --lib`) — 225 tests, no network required
-- 1 ignored smoke test (`test_embed_smoke`) — downloads ONNX model, verifies embedding
-- Integration tests (`cargo test --test integration -- --ignored`) — require model download
+- Unit tests in each module (`cargo test --lib`) — 271 tests, no network required
+- Integration tests (`cargo test --test integration -- --ignored`) — require GGUF model download

 ## CI/CD

README.md

Lines changed: 33 additions & 20 deletions
@@ -16,11 +16,11 @@ engraph turns your markdown vault into a searchable knowledge graph that AI agen

 Plain vector search treats your notes as isolated documents. But knowledge isn't flat — your notes link to each other, share tags, reference the same people and projects. engraph understands these connections.

-- **3-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). Finds things that pure vector search misses.
+- **4-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent.
 - **MCP server for AI agents** — `engraph serve` exposes 13 tools (search, read, context bundles, note creation) that Claude, Cursor, or any MCP client can call directly.
 - **Real-time sync** — file watcher keeps the index fresh as you edit in Obsidian. No manual re-indexing needed.
 - **Smart write pipeline** — AI agents can create notes with automatic tag resolution, wikilink discovery, and folder placement based on semantic similarity.
-- **Fully local** — ONNX embeddings (`all-MiniLM-L6-v2`, 23MB), SQLite storage, no network required after initial model download.
+- **Fully local** — pure Rust ML via [candle](https://github.com/huggingface/candle) with GGUF models (~300MB mandatory, ~1.3GB optional for intelligence). Metal-accelerated on macOS. No API keys, no cloud.

 ## What problem it solves

@@ -57,8 +57,8 @@ Your vault (markdown files)
 Claude / Cursor / any MCP client
 ```

-1. **Index** — walks your vault, chunks markdown by headings, embeds with a local ONNX model, stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
-2. **Search** — runs three lanes in parallel (semantic KNN, BM25 keyword, graph expansion), fuses results via RRF
+1. **Index** — walks your vault, chunks markdown by headings, embeds with a local GGUF model (candle), stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
+2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to four lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking), fused via RRF
 3. **Serve** — starts an MCP server that AI agents connect to, with a file watcher that re-indexes changes in real time

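Step 1's heading-based chunking rests on the break-point scoring described in `CLAUDE.md`. The sketch below is illustrative: the score bands (headings 50-100, code fences 80, thematic breaks 60, blank lines 20) come from that description, but the per-heading-level mapping and helper names are assumptions, and the real chunker additionally refuses to split inside code fences:

```rust
// Illustrative break-point scoring: each candidate line gets a score and the
// splitter prefers the highest-scoring line. Score bands follow the docs;
// the exact per-level heading mapping here is an assumption.

fn break_score(line: &str) -> u32 {
    let t = line.trim_end();
    if let Some(rest) = t.strip_prefix('#') {
        // H1 scores 100, H2 90, ... floored at 50
        let level = 1 + rest.chars().take_while(|&c| c == '#').count() as u32;
        return 100u32.saturating_sub((level - 1) * 10).max(50);
    }
    if t.starts_with("```") {
        return 80; // code fence boundary
    }
    if t == "---" || t == "***" {
        return 60; // thematic break
    }
    if t.is_empty() {
        return 20; // blank line
    }
    0
}

// Index of the best candidate split point, if any
fn best_split(lines: &[&str]) -> Option<usize> {
    (0..lines.len()).max_by_key(|&i| break_score(lines[i]))
}
```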
## Quick start
@@ -80,7 +80,7 @@ cargo install --git https://github.com/devwhodevs/engraph

 ```bash
 engraph index ~/path/to/vault
-# Downloads embedding model on first run (~23MB)
+# Downloads embedding model on first run (~300MB)
 # Incremental — only re-embeds changed files on subsequent runs
 ```

@@ -130,8 +130,11 @@ Now Claude can search your vault, read notes, build context bundles, and create
 engraph search "project deadlines" --explain
 ```
 ```
-1. [0.03] 01-Projects/Q2 Planning.md > ## Milestones #abc123
-   Semantic: 0.018 | FTS: 0.015 | Graph: 0.008
+Intent: Exploratory
+
+--- Explain ---
+1. [0.04] 01-Projects/Q2 Planning.md > ## Milestones #abc123
+   Semantic: 0.018 | FTS: 0.015 | Graph: 0.008 | Rerank: 0.014
    Q2 deliverables: auth rewrite by April 15, API v2 by May 1...
 ```

@@ -181,35 +184,37 @@ engraph resolves tags against the registry (fuzzy matching), discovers potential

 | | engraph | Basic RAG (vector-only) | Obsidian search |
 |---|---|---|---|
-| Search method | Semantic + BM25 + graph (3-lane RRF) | Vector similarity only | Keyword only |
+| Search method | 4-lane RRF (semantic + BM25 + graph + reranker) | Vector similarity only | Keyword only |
+| Query understanding | LLM orchestrator classifies intent, adapts weights | None | None |
 | Understands note links | Yes (wikilink graph traversal) | No | Limited (backlinks panel) |
 | AI agent access | MCP server (13 tools) | Custom API needed | No |
 | Write capability | Create/append/move with smart filing | No | Manual |
 | Real-time sync | File watcher, 2s debounce | Manual re-index | N/A |
-| Runs locally | Yes, fully offline | Depends | Yes |
+| Runs locally | Yes, pure Rust + Metal acceleration | Depends | Yes |
 | Setup | One binary, one command | Framework + code | Built-in |

 engraph is not a replacement for Obsidian — it's the intelligence layer that sits between your vault and your AI tools.

 ## Current capabilities

-- 3-lane hybrid search (semantic + FTS5 + graph expansion) with RRF fusion
+- 4-lane hybrid search (semantic + FTS5 + graph + cross-encoder reranker) with two-pass RRF fusion
+- LLM research orchestrator: query intent classification + query expansion + adaptive lane weights
+- Pure Rust ML via candle (GGUF models, Metal acceleration on macOS)
+- Intelligence opt-in: heuristic fallback when disabled, LLM-powered when enabled
 - MCP server with 13 tools (7 read, 6 write) via stdio
 - Real-time file watching with 2s debounce and startup reconciliation
 - Write pipeline: tag resolution, fuzzy link discovery, semantic folder placement
 - Context engine: topic bundles, person bundles, project bundles with token budgets
 - Vault graph: bidirectional wikilink + mention edges with multi-hop expansion
 - Placement correction learning from user file moves
-- Fuzzy link matching (Levenshtein) + first-name matching for People notes
-- Smart chunking with break-point scoring
-- Vault profile auto-detection (PARA, folders, flat)
-- 225 unit tests, CI on macOS + Ubuntu
+- Configurable model overrides for multilingual support
+- 271 unit tests, CI on macOS + Ubuntu

 ## Roadmap

-- [ ] Research orchestrator — query classification and adaptive lane weighting
+- [x] ~~Research orchestrator — query classification and adaptive lane weighting~~ (v1.0)
+- [x] ~~LLM reranker — optional local model for result quality~~ (v1.0)
 - [ ] Temporal search — find notes by time period, detect trends
-- [ ] LLM reranker — optional local model for result quality
 - [ ] HTTP/REST API — complement MCP with a standard web API
 - [ ] Multi-vault — search across multiple vaults
 - [ ] Vault health monitor — surface orphan notes, broken links, stale content
@@ -222,26 +227,34 @@ Optional config at `~/.engraph/config.toml`:
 vault_path = "~/Documents/MyVault"
 top_n = 10
 exclude = [".obsidian/", "node_modules/", ".git/"]
+
+# Enable LLM-powered intelligence (query expansion + reranking)
+intelligence = true
+
+# Override models for multilingual or custom use
+[models]
+# embed = "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/qwen3-embedding-0.6b-q8_0.gguf"
+# rerank = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf"
 ```

-All data stored in `~/.engraph/` — single SQLite database (~10MB typical), ONNX model, and vault profile.
+All data stored in `~/.engraph/` — single SQLite database (~10MB typical), GGUF models, and vault profile.
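The commented `[models]` entries above use an `hf:` URI scheme. A hedged sketch of how such a URI might decompose into owner, repo, and file — the actual parser (`HfModelUri` in `llm.rs`) may behave differently:

```rust
// Illustrative parser for "hf:owner/repo/file.gguf" model URIs.
// The struct and function names are assumptions, not engraph's API.

#[derive(Debug, PartialEq)]
struct ModelUri {
    owner: String,
    repo: String,
    file: String,
}

fn parse_hf_uri(uri: &str) -> Option<ModelUri> {
    let rest = uri.strip_prefix("hf:")?;
    // Split into at most 3 parts; the file component may contain no '/'
    let mut parts = rest.splitn(3, '/');
    let owner = parts.next()?.to_string();
    let repo = parts.next()?.to_string();
    let file = parts.next()?.to_string();
    if owner.is_empty() || repo.is_empty() || file.is_empty() {
        return None;
    }
    Some(ModelUri { owner, repo, file })
}
```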

 ## Development

 ```bash
-cargo test --lib                 # 225 unit tests, no network
+cargo test --lib                 # 271 unit tests, no network
 cargo clippy -- -D warnings
 cargo fmt --check

-# Integration tests (downloads ONNX model)
+# Integration tests (downloads GGUF model)
 cargo test --test integration -- --ignored
 ```

 ## Contributing

 Contributions welcome. Please open an issue first to discuss what you'd like to change.

-The codebase is 20 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.
+The codebase is 19 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.

 ## License
