
Commit 697df6e: Merge branch 'feat/v1.0-intelligence'
2 parents: 7a239d5 + 8b8ced3

3 files changed: 75 additions & 35 deletions

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -1,5 +1,32 @@
 # Changelog

+## [1.0.0] - 2026-03-25
+
+### Added
+- **Candle runtime** — replaced ONNX (`ort`) with candle (pure Rust ML framework). Loads GGUF quantized models. Metal acceleration on macOS.
+- **Research orchestrator** — LLM-based query classification (exact/conceptual/relationship/exploratory) with adaptive lane weights. A single LLM call returns intent + 2-4 query expansions.
+- **Cross-encoder reranker** — 4th RRF lane using qwen3-reranker for relevance scoring. Two-pass fusion: 3-lane retrieval → reranker scores top 30 → 4-lane RRF.
+- **Query expansion** — each search runs multiple expanded queries through all retrieval lanes, merged via deduplication.
+- **Heuristic orchestrator** — fast-path intent classification via pattern matching (docids, ticket IDs, "who" queries) when intelligence is disabled. Zero latency.
+- **Intelligence onboarding** — opt-in prompt during `engraph init` and first `engraph index`. Downloads ~1.3GB of optional models.
+- **`engraph configure` command** — `--enable-intelligence`, `--disable-intelligence`, `--model embed|rerank|expand <uri>` for model overrides.
+- **Dimension migration** — auto-detects embedding dimension changes and triggers a re-index.
+- **LLM result cache** — SQLite cache for orchestrator results (keyed by query SHA-256).
+- **Model override support** — configurable embedding, reranker, and expansion model URIs for multilingual support.
+
+### Changed
+- Embedding model: `all-MiniLM-L6-v2` (ONNX, 384-dim, 23MB) → `embeddinggemma-300M` (GGUF, 256-dim, ~300MB)
+- Search pipeline: hardcoded 3-lane weights → adaptive per-query-intent weights
+- `--explain` output now shows query intent and a 4-lane breakdown (semantic, FTS, graph, rerank)
+- `status` command shows the intelligence enabled/disabled state
+- `run_search` accepts a `Config` parameter (no redundant config load)
+
+### Removed
+- `ort` (ONNX Runtime) dependency
+- `ndarray` dependency
+- `src/embedder.rs` and `src/model.rs` (replaced by `src/llm.rs`)
+- `ModelBackend` trait (replaced by `EmbedModel`)
+
 ## [0.7.0] - 2026-03-25

 ### Added
CLAUDE.md

Lines changed: 15 additions & 15 deletions
@@ -4,15 +4,14 @@ Local hybrid search CLI for Obsidian vaults. Rust, MIT licensed.

 ## Architecture

-Single binary with 20 modules behind a lib crate:
+Single binary with 19 modules behind a lib crate:

-- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`
+- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>` and `[models]` section for model overrides. `Config::save()` writes back to disk.
 - `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
 - `docid.rs` — deterministic 6-char hex IDs for files (SHA-256 of path, truncated). Shown in search results for quick reference
-- `embedder.rs` — downloads and runs `all-MiniLM-L6-v2` ONNX model (384-dim). SHA-256-verified on download. Uses `ort` for inference, `tokenizers` for tokenization. Implements `ModelBackend` trait. **Not `Send`** — all embedding is serial
-- `model.rs` — pluggable `ModelBackend` trait, model registry, and `parse_model_spec()`. Enables future model swapping without changing consumer code
+- `llm.rs` — candle model management. Three traits: `EmbedModel` (embeddings), `RerankModel` (cross-encoder scoring), `OrchestratorModel` (query intent + expansion). Three candle implementations: `CandleEmbed` (custom bidirectional transformer from GGUF for embeddinggemma), `CandleOrchestrator` (quantized_qwen3 for query analysis), `CandleRerank` (quantized_qwen3 for relevance scoring). Also: `MockLlm` for testing, `HfModelUri` for model download, `PromptFormat` for model-family prompt templates, `heuristic_orchestrate()` fast path, `LaneWeights` per query intent
 - `fts.rs` — FTS5 full-text search support. Re-exports `FtsResult` from store. BM25-ranked keyword search
-- `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph results. Supports lane weighting, `--explain` output with per-lane detail
+- `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph + reranker results. Supports per-lane weighting, `--explain` output with intent + per-lane detail
 - `context.rs` — context engine. Six functions: `read` (full note content + metadata), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal`
 - `vecstore.rs` — sqlite-vec virtual table integration. Manages the `vec_chunks` vec0 table for vector storage and KNN search. Handles insert, delete, and search operations against the virtual table
 - `tags.rs` — tag registry module. Maintains a `tag_registry` table tracking known tags with source attribution. Supports fuzzy matching for tag suggestions during note creation
@@ -23,15 +22,15 @@ Single binary with 20 modules behind a lib crate:
 - `serve.rs` — MCP stdio server via rmcp SDK. Exposes 13 tools: 7 read (search, read, list, vault_map, who, project, context) + 6 write (create, append, update_metadata, move_note, archive, unarchive). EngraphServer struct with Arc+Mutex wrapping for async handlers. Spawns file watcher on startup
 - `graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags
 - `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Writes/loads `vault.toml`
-- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved). `vec_chunks` virtual table (sqlite-vec) for KNN search. Handles incremental diffing via content hashes
-- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan
-- `search.rs` — hybrid search orchestrator. Runs semantic (sqlite-vec KNN), keyword (FTS5 BM25), and graph expansion lanes, then fuses via RRF
+- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration
+- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan. Dimension migration on model change.
+- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 3-lane retrieval per expansion → RRF pass 1 → reranker 4th lane → RRF pass 2. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent.

 `main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index`, `search` (with `--explain`), `status`, `clear`, `init`, `configure`, `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move), `serve` (MCP stdio server with file watcher).

 ## Key patterns

-- **3-lane hybrid search:** Queries run through three lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), and graph (wikilink expansion). Results are fused via Reciprocal Rank Fusion (RRF) with configurable lane weights (semantic 1.0, FTS 1.0, graph 0.8)
+- **4-lane hybrid search:** Queries run through up to four lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), and cross-encoder reranking. A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: 3-lane retrieval → reranker scores top 30 → 4-lane fusion. When intelligence is off, falls back to heuristic intent classification with 3-lane search (v0.7 behavior)
 - **Vault graph:** `edges` table stores bidirectional wikilink edges and mention edges. Built during indexing after all files are written. People detection scans for person name/alias mentions using notes from the configured People folder
 - **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0.8x for 1-hop, 0.5x for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
 - **Smart chunking:** Break-point scoring algorithm assigns scores to potential split points (headings 50-100, code fences 80, thematic breaks 60, blank lines 20). Code fence protection prevents splitting inside code blocks
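The weighted RRF described in the hybrid-search pattern above can be sketched in a few lines. This is an illustrative implementation, assuming `k = 60` from the original RRF paper and the documented default lane weights; `fusion.rs` may differ in detail:

```rust
use std::collections::HashMap;

// Weighted Reciprocal Rank Fusion sketch: each lane contributes
// weight / (k + rank) per document, and documents appearing in several
// lanes accumulate score. k = 60 is the constant from the RRF paper,
// not necessarily engraph's value.
fn rrf_fuse(lanes: &[(f32, Vec<&str>)], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<&str, f32> = HashMap::new();
    for (weight, ranking) in lanes {
        for (rank, doc) in ranking.iter().enumerate() {
            // rank is 0-based; RRF uses 1-based ranks
            *scores.entry(*doc).or_insert(0.0) += weight / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<(String, f32)> =
        scores.into_iter().map(|(d, s)| (d.to_string(), s)).collect();
    // Highest fused score first
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

A document ranked first in two lanes beats one ranked first in a single lane, which is the property that lets the keyword and graph lanes rescue results that pure vector similarity misses.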
@@ -47,18 +46,20 @@ Single binary with 20 modules behind a lib crate:

 ## Data directory

-`~/.engraph/` — hardcoded via `Config::data_dir()`. Contains `engraph.db` (SQLite with FTS5 + sqlite-vec + edges), `models/` (ONNX model + tokenizer), `vault.toml` (vault profile), `config.toml` (user config).
+`~/.engraph/` — hardcoded via `Config::data_dir()`. Contains `engraph.db` (SQLite with FTS5 + sqlite-vec + edges + llm_cache), `models/` (GGUF models + tokenizers), `vault.toml` (vault profile), `config.toml` (user config with intelligence toggle + model overrides).

 Single vault only. Re-indexing a different vault path triggers a confirmation prompt.

 ## Dependencies to be aware of

-- `ort` (2.0.0-rc.12) — ONNX Runtime Rust bindings. Pre-release API. Does not provide prebuilt binaries for all targets
+- `candle-core` (0.9) — HuggingFace pure Rust ML framework. GGUF model loading, tensor ops. `metal` feature for macOS GPU acceleration
+- `candle-nn` (0.9) — neural network building blocks (RmsNorm, rotary embeddings, etc.)
+- `candle-transformers` (0.9) — pre-built transformer model architectures. Used: `quantized_qwen3` for orchestrator + reranker
 - `sqlite-vec` (0.1.8-alpha.1) — SQLite extension for vector search. Provides vec0 virtual tables with KNN via `vec_distance_cosine()`
 - `zerocopy` (0.7) — zero-copy serialization for vector data passed to sqlite-vec
 - `strsim` (0.11) — string similarity for fuzzy tag matching and fuzzy link matching
 - `time` (0.3) — date/time handling for frontmatter timestamps
-- `tokenizers` (0.22) — HuggingFace tokenizer. Needs `fancy-regex` feature
+- `tokenizers` (0.22) — HuggingFace tokenizer. Needs `fancy-regex` feature. Used for all three GGUF models
 - `ignore` (0.4) — vault walking with `.gitignore` support
 - `rusqlite` (0.32) — bundled SQLite with FTS5 support
 - `rmcp` (1.2) — MCP server SDK for stdio transport
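The `zerocopy` bullet hints at how embeddings reach sqlite-vec: as little-endian `f32` byte blobs. A plain-std roundtrip sketch of that layout follows — the real code avoids the copy by using `zerocopy` views instead:

```rust
// Sketch of the vector-blob layout assumed by sqlite-vec: each f32 is
// serialized as 4 little-endian bytes, concatenated. Illustrative only;
// engraph uses `zerocopy` to reinterpret the slice without copying.

fn vec_to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

fn blob_to_vec(b: &[u8]) -> Vec<f32> {
    b.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```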
@@ -67,9 +68,8 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr

 ## Testing

-- Unit tests in each module (`cargo test --lib`) — 225 tests, no network required
-- 1 ignored smoke test (`test_embed_smoke`) — downloads ONNX model, verifies embedding
-- Integration tests (`cargo test --test integration -- --ignored`) — require model download
+- Unit tests in each module (`cargo test --lib`) — 271 tests, no network required
+- Integration tests (`cargo test --test integration -- --ignored`) — require GGUF model download

 ## CI/CD

README.md

Lines changed: 33 additions & 20 deletions
@@ -16,11 +16,11 @@ engraph turns your markdown vault into a searchable knowledge graph that AI agen

 Plain vector search treats your notes as isolated documents. But knowledge isn't flat — your notes link to each other, share tags, reference the same people and projects. engraph understands these connections.

-- **3-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). Finds things that pure vector search misses.
+- **4-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent.
 - **MCP server for AI agents** — `engraph serve` exposes 13 tools (search, read, context bundles, note creation) that Claude, Cursor, or any MCP client can call directly.
 - **Real-time sync** — file watcher keeps the index fresh as you edit in Obsidian. No manual re-indexing needed.
 - **Smart write pipeline** — AI agents can create notes with automatic tag resolution, wikilink discovery, and folder placement based on semantic similarity.
-- **Fully local** — ONNX embeddings (`all-MiniLM-L6-v2`, 23MB), SQLite storage, no network required after initial model download.
+- **Fully local** — pure Rust ML via [candle](https://github.com/huggingface/candle) with GGUF models (~300MB mandatory, ~1.3GB optional for intelligence). Metal-accelerated on macOS. No API keys, no cloud.

 ## What problem it solves

@@ -57,8 +57,8 @@ Your vault (markdown files)
 Claude / Cursor / any MCP client
 ```

-1. **Index** — walks your vault, chunks markdown by headings, embeds with a local ONNX model, stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
-2. **Search** — runs three lanes in parallel (semantic KNN, BM25 keyword, graph expansion), fuses results via RRF
+1. **Index** — walks your vault, chunks markdown by headings, embeds with a local GGUF model (candle), stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
+2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to four lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking), fused via RRF
 3. **Serve** — starts an MCP server that AI agents connect to, with a file watcher that re-indexes changes in real time

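Step 1's heading-based chunking rests on the break-point scoring described in `CLAUDE.md`. The sketch below is illustrative: the score bands (headings 50-100, code fences 80, thematic breaks 60, blank lines 20) come from that description, but the per-heading-level mapping and helper names are assumptions, and the real chunker additionally refuses to split inside code fences:

```rust
// Illustrative break-point scoring: each candidate line gets a score and the
// splitter prefers the highest-scoring line. Score bands follow the docs;
// the exact per-level heading mapping here is an assumption.

fn break_score(line: &str) -> u32 {
    let t = line.trim_end();
    if let Some(rest) = t.strip_prefix('#') {
        // H1 scores 100, H2 90, ... floored at 50
        let level = 1 + rest.chars().take_while(|&c| c == '#').count() as u32;
        return 100u32.saturating_sub((level - 1) * 10).max(50);
    }
    if t.starts_with("```") {
        return 80; // code fence boundary
    }
    if t == "---" || t == "***" {
        return 60; // thematic break
    }
    if t.is_empty() {
        return 20; // blank line
    }
    0
}

// Index of the best candidate split point, if any
fn best_split(lines: &[&str]) -> Option<usize> {
    (0..lines.len()).max_by_key(|&i| break_score(lines[i]))
}
```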
## Quick start
@@ -80,7 +80,7 @@ cargo install --git https://github.com/devwhodevs/engraph

 ```bash
 engraph index ~/path/to/vault
-# Downloads embedding model on first run (~23MB)
+# Downloads embedding model on first run (~300MB)
 # Incremental — only re-embeds changed files on subsequent runs
 ```

@@ -130,8 +130,11 @@ Now Claude can search your vault, read notes, build context bundles, and create
 engraph search "project deadlines" --explain
 ```
 ```
-1. [0.03] 01-Projects/Q2 Planning.md > ## Milestones #abc123
-   Semantic: 0.018 | FTS: 0.015 | Graph: 0.008
+Intent: Exploratory
+
+--- Explain ---
+1. [0.04] 01-Projects/Q2 Planning.md > ## Milestones #abc123
+   Semantic: 0.018 | FTS: 0.015 | Graph: 0.008 | Rerank: 0.014
    Q2 deliverables: auth rewrite by April 15, API v2 by May 1...
 ```

@@ -181,35 +184,37 @@ engraph resolves tags against the registry (fuzzy matching), discovers potential

 | | engraph | Basic RAG (vector-only) | Obsidian search |
 |---|---|---|---|
-| Search method | Semantic + BM25 + graph (3-lane RRF) | Vector similarity only | Keyword only |
+| Search method | 4-lane RRF (semantic + BM25 + graph + reranker) | Vector similarity only | Keyword only |
+| Query understanding | LLM orchestrator classifies intent, adapts weights | None | None |
 | Understands note links | Yes (wikilink graph traversal) | No | Limited (backlinks panel) |
 | AI agent access | MCP server (13 tools) | Custom API needed | No |
 | Write capability | Create/append/move with smart filing | No | Manual |
 | Real-time sync | File watcher, 2s debounce | Manual re-index | N/A |
-| Runs locally | Yes, fully offline | Depends | Yes |
+| Runs locally | Yes, pure Rust + Metal acceleration | Depends | Yes |
 | Setup | One binary, one command | Framework + code | Built-in |

 engraph is not a replacement for Obsidian — it's the intelligence layer that sits between your vault and your AI tools.

 ## Current capabilities

-- 3-lane hybrid search (semantic + FTS5 + graph expansion) with RRF fusion
+- 4-lane hybrid search (semantic + FTS5 + graph + cross-encoder reranker) with two-pass RRF fusion
+- LLM research orchestrator: query intent classification + query expansion + adaptive lane weights
+- Pure Rust ML via candle (GGUF models, Metal acceleration on macOS)
+- Intelligence opt-in: heuristic fallback when disabled, LLM-powered when enabled
 - MCP server with 13 tools (7 read, 6 write) via stdio
 - Real-time file watching with 2s debounce and startup reconciliation
 - Write pipeline: tag resolution, fuzzy link discovery, semantic folder placement
 - Context engine: topic bundles, person bundles, project bundles with token budgets
 - Vault graph: bidirectional wikilink + mention edges with multi-hop expansion
 - Placement correction learning from user file moves
-- Fuzzy link matching (Levenshtein) + first-name matching for People notes
-- Smart chunking with break-point scoring
-- Vault profile auto-detection (PARA, folders, flat)
-- 225 unit tests, CI on macOS + Ubuntu
+- Configurable model overrides for multilingual support
+- 271 unit tests, CI on macOS + Ubuntu

 ## Roadmap

-- [ ] Research orchestrator — query classification and adaptive lane weighting
+- [x] ~~Research orchestrator — query classification and adaptive lane weighting~~ (v1.0)
+- [x] ~~LLM reranker — optional local model for result quality~~ (v1.0)
 - [ ] Temporal search — find notes by time period, detect trends
-- [ ] LLM reranker — optional local model for result quality
 - [ ] HTTP/REST API — complement MCP with a standard web API
 - [ ] Multi-vault — search across multiple vaults
 - [ ] Vault health monitor — surface orphan notes, broken links, stale content
@@ -222,26 +227,34 @@ Optional config at `~/.engraph/config.toml`:
 vault_path = "~/Documents/MyVault"
 top_n = 10
 exclude = [".obsidian/", "node_modules/", ".git/"]
+
+# Enable LLM-powered intelligence (query expansion + reranking)
+intelligence = true
+
+# Override models for multilingual or custom use
+[models]
+# embed = "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/qwen3-embedding-0.6b-q8_0.gguf"
+# rerank = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf"
 ```

-All data stored in `~/.engraph/` — single SQLite database (~10MB typical), ONNX model, and vault profile.
+All data stored in `~/.engraph/` — single SQLite database (~10MB typical), GGUF models, and vault profile.
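The commented `[models]` entries above use an `hf:` URI scheme. A hedged sketch of how such a URI might decompose into owner, repo, and file — the actual parser (`HfModelUri` in `llm.rs`) may behave differently:

```rust
// Illustrative parser for "hf:owner/repo/file.gguf" model URIs.
// The struct and function names are assumptions, not engraph's API.

#[derive(Debug, PartialEq)]
struct ModelUri {
    owner: String,
    repo: String,
    file: String,
}

fn parse_hf_uri(uri: &str) -> Option<ModelUri> {
    let rest = uri.strip_prefix("hf:")?;
    // Split into at most 3 parts; the file component may contain no '/'
    let mut parts = rest.splitn(3, '/');
    let owner = parts.next()?.to_string();
    let repo = parts.next()?.to_string();
    let file = parts.next()?.to_string();
    if owner.is_empty() || repo.is_empty() || file.is_empty() {
        return None;
    }
    Some(ModelUri { owner, repo, file })
}
```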

 ## Development

 ```bash
-cargo test --lib                 # 225 unit tests, no network
+cargo test --lib                 # 271 unit tests, no network
 cargo clippy -- -D warnings
 cargo fmt --check

-# Integration tests (downloads ONNX model)
+# Integration tests (downloads GGUF model)
 cargo test --test integration -- --ignored
 ```

 ## Contributing

 Contributions welcome. Please open an issue first to discuss what you'd like to change.

-The codebase is 20 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.
+The codebase is 19 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.

 ## License
