[diskann-garnet] Implement BIN and Q8 quantizers by metajack · Pull Request #1050 · microsoft/DiskANN

metajack · 2026-05-11T14:30:27Z

This implements the BIN and Q8 quantizers for diskann-garnet.

This PR includes the following:

Vectors are now stored as Poly<[u8], AlignOfEight> instead of Vec<T>. This means that the same type can be used for both quantized and full precision vectors.
A new GarnetQuantizer trait is introduced, which allows us to type-erase the different quantizers instead of parameterizing the index/provider types.
Two new FFI calls were added to diskann-garnet. insert() now returns a success flag which can also signal that an associated quantizer is ready for training. build_quant_table() is called by Garnet asynchronously after insert() signals readiness for training to build quantization tables, and backfill_quant_vectors() is called when training is complete to quantize previously inserted vectors. These new calls allow Garnet to control the training and backfill threads.
The accessor is now called DynamicAccessor because it dynamically switches between full precision only and quantized only operation depending on availability and readiness of the associated quantizer.
A new visit_used() call was added to the FSM to facilitate walking all stored vectors.
The FSM is now lockable, which prevents it from reusing IDs during quantization backfill. This means that once backfilling starts, new inserts will be assigned IDs above any that need quantization.
vectorset now supports quantization and distance metric settings. A concurrency bug in multi-threaded inserts, which caused IDs to be assigned incorrectly, was also fixed.
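
The insert/train/backfill handshake described in the list above could be driven from the host side roughly as follows. This is a minimal sketch, not the actual diskann-garnet FFI: `InsertStatus`, `MockIndex`, `ingest`, the training threshold, and every signature are assumptions made for illustration. Only the call names `insert`, `build_quant_table`, and `backfill_quant_vectors` come from the description, and in Garnet the training and backfill steps run on separate threads rather than inline.

```rust
/// Hypothetical result of `insert()`; the real FFI return type is not shown here.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum InsertStatus {
    Failed,
    Ok,
    /// The insert succeeded and the quantizer can now be trained.
    OkReadyForTraining,
}

/// Stand-in for the index that tracks only what the handshake needs.
pub struct MockIndex {
    vectors: Vec<Vec<f32>>,
    quantized: Vec<bool>,
    train_threshold: usize, // assumed trigger, not taken from the PR
    trained: bool,
}

impl MockIndex {
    pub fn new(train_threshold: usize) -> Self {
        Self { vectors: Vec::new(), quantized: Vec::new(), train_threshold, trained: false }
    }

    pub fn insert(&mut self, v: Vec<f32>) -> InsertStatus {
        if v.is_empty() {
            return InsertStatus::Failed;
        }
        self.vectors.push(v);
        // Once a quantizer exists, new vectors are quantized at insert time.
        self.quantized.push(self.trained);
        if !self.trained && self.vectors.len() >= self.train_threshold {
            InsertStatus::OkReadyForTraining
        } else {
            InsertStatus::Ok
        }
    }

    /// Returns true only for the call that actually performs training.
    pub fn build_quant_table(&mut self) -> bool {
        if self.trained {
            return false;
        }
        self.trained = true;
        true
    }

    /// Quantizes vectors inserted before training; returns how many were touched.
    pub fn backfill_quant_vectors(&mut self) -> usize {
        if !self.trained {
            return 0;
        }
        let mut count = 0;
        for q in self.quantized.iter_mut().filter(|q| !**q) {
            *q = true;
            count += 1;
        }
        count
    }

    pub fn all_quantized(&self) -> bool {
        self.trained && self.quantized.iter().all(|q| *q)
    }
}

/// Host-side driver, run sequentially here for simplicity.
pub fn ingest(index: &mut MockIndex, data: Vec<Vec<f32>>) -> usize {
    let mut backfilled = 0;
    for v in data {
        if index.insert(v) == InsertStatus::OkReadyForTraining && index.build_quant_table() {
            backfilled += index.backfill_quant_vectors();
        }
    }
    backfilled
}
```

With a threshold of 3 and five inserted vectors, the third insert triggers training, three vectors are backfilled, and the last two are quantized on insert.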

Disk persistence will be done in a follow-up, along with more extensive testing.

Because of the FFI changes, the version has been bumped to 2.0.0.

Copilot

Pull request overview

This PR adds support for BIN (spherical 1-bit) and Q8 (MinMax 8-bit) quantization in diskann-garnet, introducing a type-erased quantizer interface and a “dynamic” accessor/strategy that can switch between full-precision and quantized operation as the quantizer becomes available. It also extends the vectorset tool to allow specifying quantization and distance metric during ingestion, and adds RFC documentation describing the bootstrap/backfill approach.
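
To make the two schemes concrete, here is a minimal sketch of per-vector MinMax 8-bit quantization and a packed 1-bit sign code. It is a simplification under stated assumptions: it does not use the diskann-quantization crate, all function and type names are hypothetical, and the spherical 1-bit quantizer in the PR may apply preprocessing learned during training that is omitted here.

```rust
/// Per-vector MinMax 8-bit code: each value maps to a u8 in [0, 255].
pub struct MinMaxCode {
    pub min: f32,
    pub scale: f32,
    pub codes: Vec<u8>,
}

pub fn quantize_minmax(v: &[f32]) -> MinMaxCode {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against constant or empty vectors, where max - min is not positive.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
        .collect();
    MinMaxCode { min, scale, codes }
}

pub fn dequantize_minmax(c: &MinMaxCode) -> Vec<f32> {
    c.codes.iter().map(|&q| c.min + q as f32 * c.scale).collect()
}

/// 1-bit code: one sign bit per dimension, packed eight dimensions per byte.
pub fn quantize_sign_bits(v: &[f32]) -> Vec<u8> {
    let mut out = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Hamming distance between two packed sign codes of equal length.
pub fn hamming(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

The 8-bit code stores one byte per dimension plus two floats of metadata, while the 1-bit code stores one bit per dimension, which is why both can share a plain byte-slice container.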

Changes:

Add new diskann-garnet quantizer implementations (BIN + Q8) behind a new type-erased GarnetQuantizer trait, plus FFI entrypoints for training/backfill.
Rework diskann-garnet provider/accessor/strategy to support dynamic switching and reranking behavior.
Update tooling/docs: vectorset CLI options for quantizer/metric, new RFC, and small provider/common helpers.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
vectorset/src/main.rs	Adds CLI flags for quantizer/metric and forwards them into `VADD` ingestion.
vectorset/src/loader.rs	Refactors dataset loader locking/state handling and removes unused fields.
rfcs/00000-quantizer-bootstrap.md	Adds an RFC describing quantization bootstrapping/backfill phases and provider responsibilities.
diskann-providers/src/common/mod.rs	Re-exports MinMax distance helper types used by Garnet quantization.
diskann-providers/src/common/minmax_repr.rs	Adds `from_bytes` convenience for interpreting raw bytes as MinMax elements.
diskann-garnet/src/test_utils.rs	Updates tests to match the new `GarnetProvider::new` signature (quant type parameter).
diskann-garnet/src/quantization.rs	New quantizer implementations and type-erased distance/query computer adapters.
diskann-garnet/src/provider.rs	Major provider/accessor/search strategy updates for dynamic quantization, training readiness, and backfill.
diskann-garnet/src/lib.rs	Updates index creation for quant types, changes `insert()` return type, and adds FFI for training/backfill.
diskann-garnet/src/garnet.rs	Adds (currently unused) `exists_*` helpers with explicit dead_code expectations.
diskann-garnet/src/fsm.rs	Adds `visit_used`, reuse locking, and `total_used` tracking to support safe backfill.
diskann-garnet/src/dyn_index.rs	Switches type-erased index operations to use the new dynamic strategy and exposes train/backfill hooks.
diskann-garnet/src/alloc.rs	Makes `AlignToEight` public for cross-module aligned byte container usage.
diskann-garnet/Cargo.toml	Adds `rand` and adjusts `tokio` features for new code usage.
Cargo.lock	Locks new dependency resolution (notably `rand`).

hildebrandmw

Thanks Jack! It will be great to have quantization land in diskann-garnet.

After seeing the 1000-foot view, the things that stand out to me are:

The somewhat awkward representation of quantizer (Option<Box<dyn>> in GarnetProvider with RwLock<Option<_>> as the implementation).
Heavy use of conditionals throughout vector retrieval checking for quantization, especially with respect to the start point cache. This also necessitated the working-set check and drain if we switched to quantization part-way through insert.
Race conditions in the free space map and quantization.

One idea I had regarding the first two points is to lean more heavily into trait objects with something like

trait Vector {
    // The number of bytes needed per vector
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
    // The term to provide to `Garnet` for data retrieval.
    fn term(&self) -> Term;
    fn cached(&self, id: u32) -> Option<&[u8]>;
    fn distance_computer(&self) -> Box<dyn _>;
    fn query_computer(&self, ...) -> Box<dyn _>;
}

The idea being that both full-precision and quantized vectors can live behind the same facade. Then in the Provider, we can have:

// Naming and exact structure are just meant to describe the idea
struct GarnetProvider {
    primary: Arc<dyn Vector>,
    quant: Option<Arc<dyn Vector>>,
    ...
}

The Accessor can then be

struct Accessor<'a> {
    provider: &'a GarnetProvider,
    vectors: &'a dyn Vector,
}

Most of the non-reranking vector needs would then go through Accessor::vectors. When no quantization is active, vectors = GarnetProvider::primary; otherwise, vectors = GarnetProvider::quant. All the vector retrieval code then stays pretty much the same without needing to constantly branch on whether or not the vectors are quantized. Further, the issue with the working set goes away because vectors that start their insert as full-precision stay as full-precision for the duration of the insert.
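
A compilable reduction of this idea might look like the following. It is only a sketch of the suggestion, not code from the PR: the trait is trimmed to two methods (the `term`, `cached`, and computer methods are left out), `FullPrecision` and `Quantized8` are hypothetical implementors, and the question of how `quant` gets installed while the provider is shared is ignored.

```rust
use std::sync::Arc;

/// Trimmed version of the proposed facade.
pub trait Vector: Send + Sync {
    /// The number of bytes needed per vector.
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
}

/// Hypothetical full-precision representation (f32 per dimension).
pub struct FullPrecision {
    pub dim: usize,
}

/// Hypothetical 8-bit representation (one byte per dimension, metadata ignored).
pub struct Quantized8 {
    pub dim: usize,
}

impl Vector for FullPrecision {
    fn bytes(&self) -> usize {
        self.dim * std::mem::size_of::<f32>()
    }
    fn is_quantized(&self) -> bool {
        false
    }
}

impl Vector for Quantized8 {
    fn bytes(&self) -> usize {
        self.dim
    }
    fn is_quantized(&self) -> bool {
        true
    }
}

pub struct GarnetProvider {
    pub primary: Arc<dyn Vector>,
    pub quant: Option<Arc<dyn Vector>>,
}

pub struct Accessor<'a> {
    pub provider: &'a GarnetProvider,
    pub vectors: &'a dyn Vector,
}

impl GarnetProvider {
    /// The representation is chosen once, when the accessor is created, so a
    /// single operation sees one representation for its whole duration.
    pub fn accessor(&self) -> Accessor<'_> {
        let vectors: &dyn Vector = match &self.quant {
            Some(q) => &**q,
            None => &*self.primary,
        };
        Accessor { provider: self, vectors }
    }
}
```

Retrieval code written against `Accessor::vectors` then never needs to ask which representation it is reading, apart from reranking, which would still reach for `provider.primary`.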

hildebrandmw

I'm still worried about the potential for race conditions between backfill and the free-space map. Basically, FreeSpaceMap::next_id can be running concurrently with quantization - the check at the beginning is not sufficient for mutual exclusion. This can result in IDs being sent out that never get quantized, which will explode when accessing these vectors in the future. I understand the pressure to start experimenting but would appreciate some cycles in a followup working through this interaction if it is deemed sufficiently unlikely to block development.

JordanMaples

lgtm

codecov-commenter · 2026-05-27T19:52:18Z

Codecov Report

❌ Patch coverage is 43.34305% with 583 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.87%. Comparing base (e2dc9a0) to head (c16371e).
⚠️ Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
diskann-garnet/src/provider.rs	45.16%	244 Missing ⚠️
diskann-garnet/src/quantization.rs	0.00%	161 Missing ⚠️
vectorset/src/main.rs	0.00%	55 Missing ⚠️
diskann-garnet/src/fsm.rs	65.95%	48 Missing ⚠️
diskann-garnet/src/lib.rs	41.09%	43 Missing ⚠️
vectorset/src/loader.rs	0.00%	17 Missing ⚠️
diskann-garnet/src/dyn_index.rs	30.76%	9 Missing ⚠️
diskann-garnet/src/garnet.rs	96.55%	3 Missing ⚠️
diskann-providers/src/common/minmax_repr.rs	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1050      +/-   ##
==========================================
- Coverage   89.46%   88.87%   -0.60%     
==========================================
  Files         482      485       +3     
  Lines       91082    92112    +1030     
==========================================
+ Hits        81491    81866     +375     
- Misses       9591    10246     +655

Flag	Coverage Δ
miri	`88.87% <43.34%> (-0.60%)`	⬇️
unittests	`88.52% <43.34%> (-0.59%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
diskann-garnet/src/labels.rs	`100.00% <100.00%> (ø)`
diskann-garnet/src/test_utils.rs	`99.36% <100.00%> (+0.05%)`	⬆️
diskann-quantization/src/num.rs	`100.00% <ø> (ø)`
diskann-garnet/src/garnet.rs	`96.69% <96.55%> (-0.95%)`	⬇️
diskann-providers/src/common/minmax_repr.rs	`97.11% <0.00%> (-0.78%)`	⬇️
diskann-garnet/src/dyn_index.rs	`65.71% <30.76%> (-8.48%)`	⬇️
vectorset/src/loader.rs	`0.00% <0.00%> (ø)`
diskann-garnet/src/lib.rs	`62.10% <41.09%> (-5.04%)`	⬇️
diskann-garnet/src/fsm.rs	`81.77% <65.95%> (-5.16%)`	⬇️
vectorset/src/main.rs	`0.00% <0.00%> (ø)`
... and 2 more

... and 27 files with indirect coverage changes

metajack · 2026-05-28T15:43:19Z

I've added a lot more documentation so that it's clear what the atomics are doing. They aren't really interacting very much.

Garnet may invoke training several times due to concurrent inserts, so that call must ensure training is only executed successfully once. training_started handles this.

Once training is completed, a quantizer will exist and insert() will automatically quantize any new vectors. Once we start backfilling, we want all new vectors to have new IDs so that we constrain the backfill range. To achieve this, we have a new lock on ID reuse in the FSM. This gets locked when backfill starts and only gets unlocked once the final backfill task is complete. To track completion, we have an atomic counter backfills_completed that increments as each backfill task completes its assigned range of backfills. Once that counter reaches task_counts, we quantize the start points, unlock the ID ranges, and then set the index to operate in quantized mode (`all_quantized`).
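
The two signals described here, train-once and last-task-finalizes, can be sketched with standard atomics. This is an illustration only: the field names echo the comment above, but the struct, method names, and memory orderings are assumptions, and the real finalization work (quantizing start points, unlocking ID reuse) is reduced to a comment.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

pub struct BackfillState {
    training_started: AtomicBool,
    backfills_completed: AtomicUsize,
    all_quantized: AtomicBool,
    task_count: usize,
}

impl BackfillState {
    pub fn new(task_count: usize) -> Self {
        Self {
            training_started: AtomicBool::new(false),
            backfills_completed: AtomicUsize::new(0),
            all_quantized: AtomicBool::new(false),
            task_count,
        }
    }

    /// Returns true for exactly one caller, even if several race to start training.
    pub fn try_start_training(&self) -> bool {
        self.training_started
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Called by each backfill task when its range is done. Returns true only
    /// for the task that finishes last, which is the one that finalizes.
    pub fn finish_backfill_task(&self) -> bool {
        let done = self.backfills_completed.fetch_add(1, Ordering::AcqRel) + 1;
        if done == self.task_count {
            // Real code would quantize the start points and unlock ID reuse here.
            self.all_quantized.store(true, Ordering::Release);
            true
        } else {
            false
        }
    }

    pub fn is_all_quantized(&self) -> bool {
        self.all_quantized.load(Ordering::Acquire)
    }
}
```

Because `fetch_add` returns the previous value atomically, exactly one task observes `done == task_count`, so finalization cannot run twice.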

hildebrandmw

Sorry - a few more issues I found. I think it's worth keeping next_id as an atomic but introducing a proper RwLock around the quantization-enabled + max backfill ID. I think that shores up the rest of the potential concurrency problems.

In general, I'd say try to avoid spreading signals across multiple atomics because weird things can happen that are very hard to reason about.
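
One way to read this suggestion, sketched under assumptions: keep `next_id` atomic, but put the "quantization enabled" flag and the backfill bound together behind one `RwLock`, and hold the read lock across ID allocation. `IdAllocator`, `QuantState`, and both method names are hypothetical and are not the PR's actual types. The sketch also ignores ID reuse from the free list and whether the vector for an allocated ID has been written by the time backfill visits it.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::RwLock;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct QuantState {
    pub enabled: bool,
    /// IDs below this bound are the ones backfill is responsible for.
    pub backfill_end: u32,
}

pub struct IdAllocator {
    next_id: AtomicU32,
    state: RwLock<QuantState>,
}

impl IdAllocator {
    pub fn new() -> Self {
        Self {
            next_id: AtomicU32::new(0),
            state: RwLock::new(QuantState { enabled: false, backfill_end: 0 }),
        }
    }

    /// Allocates an ID and reports whether the insert must quantize the vector
    /// itself. The read lock is held across the allocation, so `begin_backfill`
    /// cannot slip in between reading the flag and taking the ID.
    pub fn allocate(&self) -> (u32, bool) {
        let state = self.state.read().unwrap();
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        (id, state.enabled)
    }

    /// Takes the write lock, which waits for in-flight allocations, then
    /// snapshots the bound and enables quantization as one step.
    pub fn begin_backfill(&self) -> u32 {
        let mut state = self.state.write().unwrap();
        state.backfill_end = self.next_id.load(Ordering::Relaxed);
        state.enabled = true;
        state.backfill_end
    }
}
```

Under this scheme every allocated ID is either below `backfill_end`, and so covered by backfill, or was handed out with `enabled == true`, and so quantized by its own insert. The two signals can no longer be observed in an inconsistent combination.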

metajack requested review from a team and Copilot May 11, 2026 14:30

Copilot started reviewing on behalf of metajack May 11, 2026 14:31

Copilot AI reviewed May 11, 2026

hildebrandmw reviewed May 11, 2026

metajack force-pushed the jackmoffitt/quant-bootstrap branch from 14b6fe9 to b5a3759 Compare May 11, 2026 20:16

metajack self-assigned this May 12, 2026

metajack linked an issue May 12, 2026 that may be closed by this pull request

Implement quantization bootstrap in diskann-garnet #1055

Closed

harsha-simhadri reviewed May 12, 2026

Comment thread rfcs/00000-quantizer-bootstrap.md Outdated

harsha-simhadri reviewed May 13, 2026

metajack force-pushed the jackmoffitt/quant-bootstrap branch 3 times, most recently from ff815c7 to 8c7e374 Compare May 22, 2026 20:15

hildebrandmw approved these changes May 22, 2026

Comment thread diskann-garnet/src/dyn_index.rs Outdated

Comment thread diskann-garnet/src/provider.rs Outdated

Comment thread diskann-garnet/src/provider.rs Outdated

Comment thread diskann-garnet/src/provider.rs

Comment thread diskann-garnet/src/fsm.rs Outdated

metajack force-pushed the jackmoffitt/quant-bootstrap branch 6 times, most recently from a2821d8 to 025fa7d Compare May 27, 2026 16:44

JordanMaples approved these changes May 27, 2026

metajack force-pushed the jackmoffitt/quant-bootstrap branch from 025fa7d to ab9dddd Compare May 27, 2026 19:28

metajack force-pushed the jackmoffitt/quant-bootstrap branch from ab9dddd to cb1cb19 Compare May 28, 2026 15:37

metajack force-pushed the jackmoffitt/quant-bootstrap branch 5 times, most recently from fcf77d0 to fedbe85 Compare May 29, 2026 21:22

metajack force-pushed the jackmoffitt/quant-bootstrap branch from fedbe85 to 9dc0f78 Compare May 29, 2026 21:28

hildebrandmw reviewed May 29, 2026

Comment thread diskann-garnet/src/fsm.rs Outdated

Comment thread diskann-garnet/src/provider.rs Outdated

Comment thread diskann-garnet/src/fsm.rs

Comment thread diskann-garnet/src/fsm.rs Outdated

metajack force-pushed the jackmoffitt/quant-bootstrap branch 2 times, most recently from 4f83294 to 56dd391 Compare June 1, 2026 19:51

hildebrandmw approved these changes Jun 1, 2026

metajack force-pushed the jackmoffitt/quant-bootstrap branch from 56dd391 to aa35e5d Compare June 1, 2026 20:28

hildebrandmw approved these changes Jun 1, 2026

Jack Moffitt added 2 commits June 1, 2026 15:47

Implement BIN and Q8 quantizers

19de0e1

address feedback

f7c6046

metajack force-pushed the jackmoffitt/quant-bootstrap branch from aa35e5d to 9542b47 Compare June 1, 2026 21:29

hildebrandmw approved these changes Jun 1, 2026

new FFI for i8

c16371e

metajack force-pushed the jackmoffitt/quant-bootstrap branch from 9542b47 to c16371e Compare June 1, 2026 22:08

hildebrandmw approved these changes Jun 1, 2026

metajack merged commit 77f9e9d into main Jun 2, 2026
25 of 26 checks passed

metajack deleted the jackmoffitt/quant-bootstrap branch June 2, 2026 14:24

metajack linked an issue Jun 2, 2026 that may be closed by this pull request

Add support for i8 vectors #1088

Closed

7 participants
