[diskann-garnet] Implement BIN and Q8 quantizers#1050
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for BIN (spherical 1-bit) and Q8 (MinMax 8-bit) quantization in diskann-garnet, introducing a type-erased quantizer interface and a “dynamic” accessor/strategy that can switch between full-precision and quantized operation as the quantizer becomes available. It also extends the vectorset tool to allow specifying quantization and distance metric during ingestion, and adds RFC documentation describing the bootstrap/backfill approach.
Changes:
- Add new
diskann-garnetquantizer implementations (BIN + Q8) behind a new type-erasedGarnetQuantizertrait, plus FFI entrypoints for training/backfill. - Rework
diskann-garnetprovider/accessor/strategy to support dynamic switching and reranking behavior. - Update tooling/docs:
vectorsetCLI options for quantizer/metric, new RFC, and small provider/common helpers.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| vectorset/src/main.rs | Adds CLI flags for quantizer/metric and forwards them into VADD ingestion. |
| vectorset/src/loader.rs | Refactors dataset loader locking/state handling and removes unused fields. |
| rfcs/00000-quantizer-bootstrap.md | Adds an RFC describing quantization bootstrapping/backfill phases and provider responsibilities. |
| diskann-providers/src/common/mod.rs | Re-exports MinMax distance helper types used by Garnet quantization. |
| diskann-providers/src/common/minmax_repr.rs | Adds from_bytes convenience for interpreting raw bytes as MinMax elements. |
| diskann-garnet/src/test_utils.rs | Updates tests to match the new GarnetProvider::new signature (quant type parameter). |
| diskann-garnet/src/quantization.rs | New quantizer implementations and type-erased distance/query computer adapters. |
| diskann-garnet/src/provider.rs | Major provider/accessor/search strategy updates for dynamic quantization, training readiness, and backfill. |
| diskann-garnet/src/lib.rs | Updates index creation for quant types, changes insert() return type, and adds FFI for training/backfill. |
| diskann-garnet/src/garnet.rs | Adds (currently unused) exists_* helpers with explicit dead_code expectations. |
| diskann-garnet/src/fsm.rs | Adds visit_used, reuse locking, and total_used tracking to support safe backfill. |
| diskann-garnet/src/dyn_index.rs | Switches type-erased index operations to use the new dynamic strategy and exposes train/backfill hooks. |
| diskann-garnet/src/alloc.rs | Makes AlignToEight public for cross-module aligned byte container usage. |
| diskann-garnet/Cargo.toml | Adds rand and adjusts tokio features for new code usage. |
| Cargo.lock | Locks new dependency resolution (notably rand). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jack! It will be great to have quantization land in diskann-garnet.
After seeing the 1000-foot view, the things that stand out to me are:
- The somewhat awkward representation of quantizer (
Option<Box<dyn>>inGarnetProviderwithRwLock<Option<_>>as the implementation). - Heavy use of conditionals throughout vector retrieval checking for quantization, especially with respect to the start point cache. This also necessitated the working-set check and drain if we switched to quantization part-way through insert.
- Race conditions in the free space map and quantization.
One idea I had regarding the first two points is to lean more heavily into trait objects with something like
trait Vector {
// The number of bytes needed per vector
fn bytes(&self) -> usize;
fn is_quantized(&self) -> bool;
// Ther term to provide to `Garnet` for data retrieval.
fn term(&self) -> Term;
fn cached(&self, id: u32) -> Option<&[u8]>;
fn distance_computer(&self) -> Box<dyn _>;
fn query_computer(&self, ...) -> Box<dyn _>;
}The idea being that both full-precision and quantized vectors can live behind the same facade. Then in the Provider, we can have:
// Naming and exact structure are just meant to describe the idea
struct GarnetProvider {
primary: Arc<dyn Vector>,
quant: Option<Arc<dyn Vector>>,
...
}
The Accessor can then be
struct Accessor<'a> {
provider: &'a GarnetProvider,
vectors: &'a dyn Vector,
}
With most of the non-reranking vector needs going through Accessor::vectors. When no quantization is active, then vectors = GarnetProvider::primary, otherwise, vectors = GarnetProvider::quant. Then all the vector retrieval code stays pretty much the same without needing to constantly branch on whether or not the vectors are quantized. Further, the issue with the working set goes away because vectors that start their insert as full-precision stay as full-precision for the duration of the insert.
14b6fe9 to
b5a3759
Compare
ff815c7 to
8c7e374
Compare
hildebrandmw
left a comment
There was a problem hiding this comment.
I'm still worried for the potential for race conditions between backfill and the free-space map. Basically, FreeSpaceMap::next_id can be running concurrently with quantization - the check at the beginning is not sufficient for mutual exclusion. This can result in IDs being sent out that never get quantized which will explode when accessing these vectors in the future. I understand the pressure to start experimenting but would appreciate some cycles in a followup working through this interaction if it is deemed sufficiently unlikely to block development.
a2821d8 to
025fa7d
Compare
025fa7d to
ab9dddd
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1050 +/- ##
==========================================
- Coverage 89.46% 88.87% -0.60%
==========================================
Files 482 485 +3
Lines 91082 92112 +1030
==========================================
+ Hits 81491 81866 +375
- Misses 9591 10246 +655
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ab9dddd to
cb1cb19
Compare
|
I've added a lot more documentation so that it's clear what the atomics are doing. They aren't really interacting very much. Garnet may invoke training several times due to concurrent inserts, so that call must ensure training only gets actually executed successfully once. Once training is completed, a quantizer will exist and |
fcf77d0 to
fedbe85
Compare
fedbe85 to
9dc0f78
Compare
hildebrandmw
left a comment
There was a problem hiding this comment.
Sorry - a few more issues I found. I think it's worth keeping next_id as an atomic but introducing a proper RwLock around the quantization-enabled + max backfill ID. I think that shores up the rest of the potential concurrency problems.
In general, I'd say try to avoid spreading signals across multiple atomics because weird things can happen that are very hard to reason about.
4f83294 to
56dd391
Compare
56dd391 to
aa35e5d
Compare
aa35e5d to
9542b47
Compare
9542b47 to
c16371e
Compare
This implements the
BINandQ8quantizers for diskann-garnet.This PR includes the following:
Poly<[u8], AlignOfEight>instead ofVec<T>. This means that the same type can be used for both quantized and full precision vectors.GarnetQuantizertrait is inroduced which allows us to type-erase the different quantizers instead of parameterizing the index/provider types.insert()now returns a success flag which can also signal that an associated quantizer is ready for training.build_quant_table()is called by Garnet asynchronously afterinsert()signals readiness for training to build quantization tables, andbackfill_quant_vectorsis called when training is complete to quantize previously inserted vectors. These new calls allow Garnet to control the training and backfill threads.DynamicAccessorbecause it dynamically switches between full precision only and quantized only operation depending on availability and readiness of the associated quantizer.visit_used()call was added to the FSM to facilitate walking all stored vectors.Disk persistence will be done in a follow along with more extensive testing.
Because of the FFI changes, the version has been bumped to 2.0.0.