
[diskann-garnet] Implement BIN and Q8 quantizers #1050

Merged
metajack merged 3 commits into main from jackmoffitt/quant-bootstrap on Jun 2, 2026

Conversation

@metajack
Contributor

@metajack metajack commented May 11, 2026

This implements the BIN and Q8 quantizers for diskann-garnet.

This PR includes the following:

  • Vectors are now stored as Poly<[u8], AlignOfEight> instead of Vec<T>. This means that the same type can be used for both quantized and full precision vectors.
  • A new GarnetQuantizer trait is introduced, which allows us to type-erase the different quantizers instead of parameterizing the index/provider types.
  • Two new FFI calls were added to diskann-garnet. insert() now returns a success flag which can also signal that an associated quantizer is ready for training. build_quant_table() is called by Garnet asynchronously after insert() signals readiness for training, to build quantization tables, and backfill_quant_vectors() is called when training is complete to quantize previously inserted vectors. These new calls allow Garnet to control the training and backfill threads.
  • The accessor is now called DynamicAccessor because it dynamically switches between full-precision-only and quantized-only operation depending on availability and readiness of the associated quantizer.
  • A new visit_used() call was added to the FSM to facilitate walking all stored vectors.
  • The FSM is now lockable, which prevents it from reusing IDs during quantization backfill. This means that once backfilling starts, new inserts will be assigned IDs above any that need quantization.
  • vectorset now supports quantization and distance metric settings, and a concurrency bug with multi-threaded inserts was fixed which caused IDs to be assigned incorrectly.
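
As a rough illustration of the type-erasure point in the list above, here is a minimal sketch of a provider that holds a quantizer behind a trait object instead of a type parameter. The names `ErasedQuantizer`, `SignBits`, `FixedRangeBytes`, and `SketchProvider` are hypothetical and are not the PR's actual `GarnetQuantizer` API; the two toy quantizers only stand in for BIN and Q8.

```rust
/// Hypothetical, simplified stand-in for a type-erased quantizer trait.
pub trait ErasedQuantizer: Send + Sync {
    /// Number of bytes one quantized vector occupies.
    fn quantized_len(&self) -> usize;
    /// Compress a full-precision vector into `out` (length `quantized_len()`).
    fn quantize(&self, full: &[f32], out: &mut [u8]);
}

/// Toy 1-bit quantizer: a bit is set when the component is non-negative.
pub struct SignBits {
    pub dim: usize,
}

impl ErasedQuantizer for SignBits {
    fn quantized_len(&self) -> usize {
        self.dim.div_ceil(8)
    }
    fn quantize(&self, full: &[f32], out: &mut [u8]) {
        out.fill(0);
        for (i, &x) in full.iter().take(self.dim).enumerate() {
            if x >= 0.0 {
                out[i / 8] |= 1 << (i % 8);
            }
        }
    }
}

/// Toy 8-bit quantizer: one byte per component over a fixed [-1, 1] range.
pub struct FixedRangeBytes {
    pub dim: usize,
}

impl ErasedQuantizer for FixedRangeBytes {
    fn quantized_len(&self) -> usize {
        self.dim
    }
    fn quantize(&self, full: &[f32], out: &mut [u8]) {
        for (o, &x) in out.iter_mut().zip(full) {
            *o = ((x.clamp(-1.0, 1.0) + 1.0) * 127.5).round() as u8;
        }
    }
}

/// The provider stores a trait object, so its own type carries no
/// quantizer parameter and both storage forms are plain bytes.
pub struct SketchProvider {
    pub quantizer: Option<Box<dyn ErasedQuantizer>>,
}

impl SketchProvider {
    /// Bytes to store: quantized if a quantizer exists, raw f32 bytes otherwise.
    pub fn encode(&self, full: &[f32]) -> Vec<u8> {
        match &self.quantizer {
            Some(q) => {
                let mut out = vec![0u8; q.quantized_len()];
                q.quantize(full, &mut out);
                out
            }
            None => full.iter().flat_map(|x| x.to_le_bytes()).collect(),
        }
    }
}
```

Because every path yields a byte buffer, the same storage type can hold either representation, which is the property the first bullet relies on.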

Disk persistence will be done in a follow-up, along with more extensive testing.

Because of the FFI changes, the version has been bumped to 2.0.0.
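
The insert, train, and backfill flow described above could look roughly like the following from the host's side. This is a sequential mock under stated assumptions: `InsertStatus`, `MockIndex`, the training threshold, and `ingest` are all hypothetical, the real calls cross an FFI boundary, and Garnet runs training and backfill asynchronously rather than inline.

```rust
/// Hypothetical status returned by insert(); the real encoding is not shown in the PR text.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InsertStatus {
    Failed = 0,
    Inserted = 1,
    /// The insert succeeded and enough vectors exist to train the quantizer.
    InsertedReadyToTrain = 2,
}

/// Mock index that only tracks counts; it stores no vectors.
pub struct MockIndex {
    pub stored: usize,
    pub train_threshold: usize, // assumed trigger for training readiness
    pub trained: bool,
    pub backfilled: usize,
}

impl MockIndex {
    pub fn new(train_threshold: usize) -> Self {
        Self { stored: 0, train_threshold, trained: false, backfilled: 0 }
    }

    pub fn insert(&mut self, vector: &[f32]) -> InsertStatus {
        if vector.is_empty() {
            return InsertStatus::Failed;
        }
        self.stored += 1;
        if !self.trained && self.stored >= self.train_threshold {
            InsertStatus::InsertedReadyToTrain
        } else {
            InsertStatus::Inserted
        }
    }

    /// Returns true only when this call actually performed training.
    pub fn build_quant_table(&mut self) -> bool {
        if self.trained {
            return false;
        }
        self.trained = true;
        true
    }

    /// Quantizes everything inserted before training finished.
    pub fn backfill_quant_vectors(&mut self) {
        self.backfilled = self.stored;
    }
}

/// Host-side driver: reacts to the readiness signal from insert().
/// Returns how many times training actually ran.
pub fn ingest(index: &mut MockIndex, vectors: &[Vec<f32>]) -> usize {
    let mut trainings = 0;
    for v in vectors {
        if index.insert(v) == InsertStatus::InsertedReadyToTrain && index.build_quant_table() {
            trainings += 1;
            index.backfill_quant_vectors();
        }
    }
    trainings
}
```

Keeping the decision to train on the host side matches the stated goal of letting Garnet own the training and backfill threads.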

@metajack metajack requested review from a team and Copilot May 11, 2026 14:30
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for BIN (spherical 1-bit) and Q8 (MinMax 8-bit) quantization in diskann-garnet, introducing a type-erased quantizer interface and a “dynamic” accessor/strategy that can switch between full-precision and quantized operation as the quantizer becomes available. It also extends the vectorset tool to allow specifying quantization and distance metric during ingestion, and adds RFC documentation describing the bootstrap/backfill approach.
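
For readers unfamiliar with MinMax 8-bit quantization, the general technique can be sketched as below: each vector keeps its own minimum and scale, and every component is mapped to one byte. This is a generic illustration only; `MinMaxVec`, `minmax_encode`, and `minmax_decode` are hypothetical names and do not reflect the byte layout or API of the MinMax types in the DiskANN crates.

```rust
/// Per-vector MinMax encoding: each component maps to a byte in 0..=255.
pub struct MinMaxVec {
    pub min: f32,
    /// (max - min) / 255, or 0 when every component is equal.
    pub scale: f32,
    pub codes: Vec<u8>,
}

/// Quantize a vector using its own minimum and maximum as the range.
pub fn minmax_encode(v: &[f32]) -> MinMaxVec {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 255.0;
    let codes = v
        .iter()
        .map(|&x| {
            if scale == 0.0 {
                0
            } else {
                ((x - min) / scale).round().clamp(0.0, 255.0) as u8
            }
        })
        .collect();
    MinMaxVec { min, scale, codes }
}

/// Reconstruct an approximation of the original vector.
pub fn minmax_decode(q: &MinMaxVec) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + q.scale * c as f32).collect()
}
```

The reconstruction error per component is at most about half of `scale`, which is why a full-precision reranking pass can still be useful after a quantized search.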

Changes:

  • Add new diskann-garnet quantizer implementations (BIN + Q8) behind a new type-erased GarnetQuantizer trait, plus FFI entrypoints for training/backfill.
  • Rework diskann-garnet provider/accessor/strategy to support dynamic switching and reranking behavior.
  • Update tooling/docs: vectorset CLI options for quantizer/metric, new RFC, and small provider/common helpers.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| vectorset/src/main.rs | Adds CLI flags for quantizer/metric and forwards them into VADD ingestion. |
| vectorset/src/loader.rs | Refactors dataset loader locking/state handling and removes unused fields. |
| rfcs/00000-quantizer-bootstrap.md | Adds an RFC describing quantization bootstrapping/backfill phases and provider responsibilities. |
| diskann-providers/src/common/mod.rs | Re-exports MinMax distance helper types used by Garnet quantization. |
| diskann-providers/src/common/minmax_repr.rs | Adds from_bytes convenience for interpreting raw bytes as MinMax elements. |
| diskann-garnet/src/test_utils.rs | Updates tests to match the new GarnetProvider::new signature (quant type parameter). |
| diskann-garnet/src/quantization.rs | New quantizer implementations and type-erased distance/query computer adapters. |
| diskann-garnet/src/provider.rs | Major provider/accessor/search strategy updates for dynamic quantization, training readiness, and backfill. |
| diskann-garnet/src/lib.rs | Updates index creation for quant types, changes insert() return type, and adds FFI for training/backfill. |
| diskann-garnet/src/garnet.rs | Adds (currently unused) exists_* helpers with explicit dead_code expectations. |
| diskann-garnet/src/fsm.rs | Adds visit_used, reuse locking, and total_used tracking to support safe backfill. |
| diskann-garnet/src/dyn_index.rs | Switches type-erased index operations to use the new dynamic strategy and exposes train/backfill hooks. |
| diskann-garnet/src/alloc.rs | Makes AlignToEight public for cross-module aligned byte container usage. |
| diskann-garnet/Cargo.toml | Adds rand and adjusts tokio features for new code usage. |
| Cargo.lock | Locks new dependency resolution (notably rand). |


Comment thread vectorset/src/main.rs Outdated
Comment thread diskann-garnet/src/quantization.rs Outdated
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread rfcs/00000-quantizer-bootstrap.md Outdated
Comment thread rfcs/00000-quantizer-bootstrap.md Outdated
Comment thread rfcs/00000-quantizer-bootstrap.md Outdated
Comment thread rfcs/00000-quantizer-bootstrap.md Outdated
Comment thread rfcs/00000-quantizer-bootstrap.md Outdated
Contributor

@hildebrandmw hildebrandmw left a comment

Thanks Jack! It will be great to have quantization land in diskann-garnet.

After seeing the 1000-foot view, the things that stand out to me are:

  • The somewhat awkward representation of quantizer (Option<Box<dyn>> in GarnetProvider with RwLock<Option<_>> as the implementation).
  • Heavy use of conditionals throughout vector retrieval checking for quantization, especially with respect to the start point cache. This also necessitated the working-set check and drain if we switched to quantization part-way through insert.
  • Race conditions in the free space map and quantization.

One idea I had regarding the first two points is to lean more heavily into trait objects with something like

trait Vector {
    // The number of bytes needed per vector
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
    // The term to provide to `Garnet` for data retrieval.
    fn term(&self) -> Term;
    fn cached(&self, id: u32) -> Option<&[u8]>;
    fn distance_computer(&self) -> Box<dyn _>;
    fn query_computer(&self, ...) -> Box<dyn _>;
}

The idea being that both full-precision and quantized vectors can live behind the same facade. Then in the Provider, we can have:

// Naming and exact structure are just meant to describe the idea
struct GarnetProvider {
    primary: Arc<dyn Vector>,
    quant: Option<Arc<dyn Vector>>,
    ...
}

The Accessor can then be

struct Accessor<'a> {
    provider: &'a GarnetProvider,
    vectors: &'a dyn Vector,
}

With most of the non-reranking vector needs going through Accessor::vectors. When no quantization is active, then vectors = GarnetProvider::primary, otherwise, vectors = GarnetProvider::quant. Then all the vector retrieval code stays pretty much the same without needing to constantly branch on whether or not the vectors are quantized. Further, the issue with the working set goes away because vectors that start their insert as full-precision stay as full-precision for the duration of the insert.
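
To make the facade idea concrete, here is a small compilable sketch under stated assumptions. `VectorView`, `FullPrecision`, `OneBit`, `FacadeProvider`, and `FacadeAccessor` are hypothetical names, the trait is cut down to two methods, and the accessor snapshots an `Arc` rather than borrowing, so that an accessor created before quantization is enabled keeps seeing full-precision vectors for its whole lifetime. The `RwLock` is only there so the sketch can enable quantization at runtime.

```rust
use std::sync::{Arc, RwLock};

/// Cut-down version of the facade: both representations answer the same questions.
pub trait VectorView: Send + Sync {
    /// The number of bytes needed per vector.
    fn bytes(&self) -> usize;
    fn is_quantized(&self) -> bool;
}

pub struct FullPrecision {
    pub dim: usize,
}

impl VectorView for FullPrecision {
    fn bytes(&self) -> usize {
        self.dim * std::mem::size_of::<f32>()
    }
    fn is_quantized(&self) -> bool {
        false
    }
}

pub struct OneBit {
    pub dim: usize,
}

impl VectorView for OneBit {
    fn bytes(&self) -> usize {
        self.dim.div_ceil(8)
    }
    fn is_quantized(&self) -> bool {
        true
    }
}

pub struct FacadeProvider {
    primary: Arc<dyn VectorView>,
    // Assumption: interior mutability so quantization can be switched on later.
    quant: RwLock<Option<Arc<dyn VectorView>>>,
}

impl FacadeProvider {
    pub fn new(dim: usize) -> Self {
        Self { primary: Arc::new(FullPrecision { dim }), quant: RwLock::new(None) }
    }

    pub fn enable_quantization(&self, view: Arc<dyn VectorView>) {
        *self.quant.write().unwrap() = Some(view);
    }

    /// The choice between primary and quantized is made once, here.
    pub fn accessor(&self) -> FacadeAccessor {
        let vectors = self
            .quant
            .read()
            .unwrap()
            .clone()
            .unwrap_or_else(|| Arc::clone(&self.primary));
        FacadeAccessor { vectors }
    }
}

pub struct FacadeAccessor {
    vectors: Arc<dyn VectorView>,
}

impl FacadeAccessor {
    /// Retrieval code calls this without branching on quantization.
    pub fn vectors(&self) -> &dyn VectorView {
        self.vectors.as_ref()
    }
}
```

An insert that obtained its accessor before `enable_quantization` runs never observes a switch part-way through, which is the working-set property described above.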

Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/quantization.rs
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/fsm.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from 14b6fe9 to b5a3759 Compare May 11, 2026 20:16
@metajack metajack self-assigned this May 12, 2026
@metajack metajack linked an issue May 12, 2026 that may be closed by this pull request
Comment thread rfcs/00000-quantizer-bootstrap.md Outdated
Comment thread vectorset/src/main.rs
Comment thread vectorset/src/main.rs
Comment thread vectorset/src/loader.rs
Comment thread diskann-garnet/src/alloc.rs Outdated
Comment thread diskann-garnet/src/lib.rs Outdated
Comment thread diskann-garnet/src/quantization.rs
Comment thread diskann-garnet/src/quantization.rs Outdated
Comment thread diskann-garnet/src/quantization.rs
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/provider.rs Outdated
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch 3 times, most recently from ff815c7 to 8c7e374 Compare May 22, 2026 20:15
Contributor

@hildebrandmw hildebrandmw left a comment

I'm still worried about the potential for race conditions between backfill and the free-space map. Basically, FreeSpaceMap::next_id can be running concurrently with quantization; the check at the beginning is not sufficient for mutual exclusion. This can result in IDs being handed out that never get quantized, which will explode when accessing these vectors in the future. I understand the pressure to start experimenting, but I would appreciate some cycles in a follow-up working through this interaction if it is deemed sufficiently unlikely to block development.

Comment thread diskann-garnet/src/dyn_index.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/provider.rs
Comment thread diskann-garnet/src/fsm.rs Outdated
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch 6 times, most recently from a2821d8 to 025fa7d Compare May 27, 2026 16:44
Contributor

@JordanMaples JordanMaples left a comment

lgtm

@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from 025fa7d to ab9dddd Compare May 27, 2026 19:28
@codecov-commenter

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 43.34305% with 583 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.87%. Comparing base (e2dc9a0) to head (c16371e).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-garnet/src/provider.rs | 45.16% | 244 Missing ⚠️ |
| diskann-garnet/src/quantization.rs | 0.00% | 161 Missing ⚠️ |
| vectorset/src/main.rs | 0.00% | 55 Missing ⚠️ |
| diskann-garnet/src/fsm.rs | 65.95% | 48 Missing ⚠️ |
| diskann-garnet/src/lib.rs | 41.09% | 43 Missing ⚠️ |
| vectorset/src/loader.rs | 0.00% | 17 Missing ⚠️ |
| diskann-garnet/src/dyn_index.rs | 30.76% | 9 Missing ⚠️ |
| diskann-garnet/src/garnet.rs | 96.55% | 3 Missing ⚠️ |
| diskann-providers/src/common/minmax_repr.rs | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1050      +/-   ##
==========================================
- Coverage   89.46%   88.87%   -0.60%     
==========================================
  Files         482      485       +3     
  Lines       91082    92112    +1030     
==========================================
+ Hits        81491    81866     +375     
- Misses       9591    10246     +655     
| Flag | Coverage Δ |
| --- | --- |
| miri | 88.87% <43.34%> (-0.60%) ⬇️ |
| unittests | 88.52% <43.34%> (-0.59%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-garnet/src/labels.rs | 100.00% <100.00%> (ø) |
| diskann-garnet/src/test_utils.rs | 99.36% <100.00%> (+0.05%) ⬆️ |
| diskann-quantization/src/num.rs | 100.00% <ø> (ø) |
| diskann-garnet/src/garnet.rs | 96.69% <96.55%> (-0.95%) ⬇️ |
| diskann-providers/src/common/minmax_repr.rs | 97.11% <0.00%> (-0.78%) ⬇️ |
| diskann-garnet/src/dyn_index.rs | 65.71% <30.76%> (-8.48%) ⬇️ |
| vectorset/src/loader.rs | 0.00% <0.00%> (ø) |
| diskann-garnet/src/lib.rs | 62.10% <41.09%> (-5.04%) ⬇️ |
| diskann-garnet/src/fsm.rs | 81.77% <65.95%> (-5.16%) ⬇️ |
| vectorset/src/main.rs | 0.00% <0.00%> (ø) |

... and 2 more

... and 27 files with indirect coverage changes

@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from ab9dddd to cb1cb19 Compare May 28, 2026 15:37
@metajack
Contributor Author

I've added a lot more documentation so that it's clear what the atomics are doing. They aren't really interacting very much.

Garnet may invoke training several times due to concurrent inserts, so that call must ensure training is actually executed successfully only once. training_started handles this.
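
A common way to get this "only one caller proceeds" behavior is a compare-and-exchange on an atomic flag. The sketch below assumes that shape; `TrainingGate`, `try_begin`, `reset`, and `count_winners` are hypothetical names, and the `reset` path for a failed training attempt is an assumption rather than something the PR text describes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Guards a one-time operation that many threads may request concurrently.
pub struct TrainingGate {
    started: AtomicBool,
}

impl TrainingGate {
    pub const fn new() -> Self {
        Self { started: AtomicBool::new(false) }
    }

    /// Returns true for exactly one caller until `reset` is called.
    pub fn try_begin(&self) -> bool {
        self.started
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Assumption: a failed training attempt reopens the gate so a later call can retry.
    pub fn reset(&self) {
        self.started.store(false, Ordering::Release);
    }
}

/// Spawns `callers` threads that all try to begin training and
/// returns how many of them were allowed through.
pub fn count_winners(callers: usize) -> usize {
    let gate = TrainingGate::new();
    std::thread::scope(|s| {
        let handles: Vec<_> = (0..callers)
            .map(|_| s.spawn(|| gate.try_begin()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .filter(|&won| won)
            .count()
    })
}
```

Callers that lose the exchange can return immediately, so repeated training requests from concurrent inserts are cheap.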

Once training is completed, a quantizer will exist and insert() will automatically quantize any new vectors. Once we start backfilling, we want all new vectors to have new IDs so that we constrain the backfill range. To achieve this, we have a new lock on ID reuse in the FSM. This gets locked when backfill starts and only gets unlocked once the final backfill task is complete. To track completion, we have an atomic counter, backfills_completed, that increments as each backfill task completes its assigned range of backfills. Once that counter reaches task_counts, we quantize the start points, unlock the ID ranges, and then set the index to operate in quantized mode (all_quantized).
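
The completion tracking described here can be sketched as follows, assuming each backfill task owns a contiguous ID range and the last task to finish performs the final switch. `BackfillTracker`, `task_done`, and `split_ranges` are hypothetical names, and the finalization work is reduced to a comment plus a flag.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

pub struct BackfillTracker {
    task_count: usize,
    completed: AtomicUsize,
    all_quantized: AtomicBool,
}

impl BackfillTracker {
    pub fn new(task_count: usize) -> Self {
        Self {
            task_count,
            completed: AtomicUsize::new(0),
            all_quantized: AtomicBool::new(false),
        }
    }

    /// Called by each task after it finishes its range.
    /// Returns true only for the task that finished last.
    pub fn task_done(&self) -> bool {
        let finished = self.completed.fetch_add(1, Ordering::AcqRel) + 1;
        if finished == self.task_count {
            // Placeholder for the final steps: quantize the start points
            // and unlock ID reuse before flipping the mode flag.
            self.all_quantized.store(true, Ordering::Release);
            true
        } else {
            false
        }
    }

    pub fn is_all_quantized(&self) -> bool {
        self.all_quantized.load(Ordering::Acquire)
    }
}

/// Splits the ID space 0..max_id into `tasks` contiguous ranges.
pub fn split_ranges(max_id: u32, tasks: u32) -> Vec<Range<u32>> {
    assert!(tasks > 0, "at least one backfill task is required");
    let chunk = max_id.div_ceil(tasks);
    (0..tasks)
        .map(|t| {
            let start = t.saturating_mul(chunk).min(max_id);
            let end = (t + 1).saturating_mul(chunk).min(max_id);
            start..end
        })
        .collect()
}
```

Since `fetch_add` returns the previous value, exactly one task observes the count reaching `task_count`, so the finalization runs once regardless of the order in which tasks finish.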

@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch 5 times, most recently from fcf77d0 to fedbe85 Compare May 29, 2026 21:22
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from fedbe85 to 9dc0f78 Compare May 29, 2026 21:28
Contributor

@hildebrandmw hildebrandmw left a comment

Sorry - a few more issues I found. I think it's worth keeping next_id as an atomic but introducing a proper RwLock around the quantization-enabled + max backfill ID. I think that shores up the rest of the potential concurrency problems.

In general, I'd say try to avoid spreading signals across multiple atomics because weird things can happen that are very hard to reason about.
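
One possible reading of this suggestion is sketched below: `next_id` stays atomic, while the "backfill active" flag and the maximum backfill ID live together behind a single `RwLock`. The allocator holds the read lock for its whole decision, so backfill cannot begin between the check and the allocation. `IdAllocator`, `BackfillState`, and the free list are hypothetical and much simpler than the real free-space map.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Mutex, RwLock};

/// Both signals live under one lock so they can never be observed out of sync.
#[derive(Default)]
struct BackfillState {
    active: bool,
    /// Exclusive upper bound of the IDs that backfill must quantize.
    max_id: u32,
}

pub struct IdAllocator {
    next_id: AtomicU32,
    free: Mutex<Vec<u32>>,
    backfill: RwLock<BackfillState>,
}

impl IdAllocator {
    pub fn new() -> Self {
        Self {
            next_id: AtomicU32::new(0),
            free: Mutex::new(Vec::new()),
            backfill: RwLock::new(BackfillState::default()),
        }
    }

    pub fn allocate(&self) -> u32 {
        // The read guard is held until return, so begin_backfill() must wait
        // for every in-flight allocation before taking its snapshot.
        let state = self.backfill.read().unwrap();
        if !state.active {
            if let Some(id) = self.free.lock().unwrap().pop() {
                return id;
            }
        }
        self.next_id.fetch_add(1, Ordering::Relaxed)
    }

    pub fn release(&self, id: u32) {
        self.free.lock().unwrap().push(id);
    }

    /// Disables ID reuse and returns the exclusive bound of the backfill range.
    pub fn begin_backfill(&self) -> u32 {
        let mut state = self.backfill.write().unwrap();
        state.active = true;
        state.max_id = self.next_id.load(Ordering::Relaxed);
        state.max_id
    }

    /// Re-enables ID reuse once every backfill task has finished.
    pub fn end_backfill(&self) {
        self.backfill.write().unwrap().active = false;
    }
}
```

With this shape, any ID handed out after `begin_backfill` returns is at or above the snapshot bound, and any ID below the bound was handed out before backfill started.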

Comment thread diskann-garnet/src/fsm.rs Outdated
Comment thread diskann-garnet/src/provider.rs Outdated
Comment thread diskann-garnet/src/fsm.rs
Comment thread diskann-garnet/src/fsm.rs Outdated
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch 2 times, most recently from 4f83294 to 56dd391 Compare June 1, 2026 19:51
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from 56dd391 to aa35e5d Compare June 1, 2026 20:28
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from aa35e5d to 9542b47 Compare June 1, 2026 21:29
@metajack metajack force-pushed the jackmoffitt/quant-bootstrap branch from 9542b47 to c16371e Compare June 1, 2026 22:08
@metajack metajack merged commit 77f9e9d into main Jun 2, 2026
25 of 26 checks passed
@metajack metajack deleted the jackmoffitt/quant-bootstrap branch June 2, 2026 14:24
@metajack metajack linked an issue Jun 2, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

  • Add support for i8 vectors
  • Implement quantization bootstrap in diskann-garnet

7 participants