Simplify fsst_compress_offsets_overflow_i32 test with empty compressor#8158
Open
joseph-isaacs wants to merge 2 commits into
Open
Simplify fsst_compress_offsets_overflow_i32 test with empty compressor#8158joseph-isaacs wants to merge 2 commits into
joseph-isaacs wants to merge 2 commits into
Conversation
The fsst_compress_offsets_overflow_i32 regression compressed ~2.5 GiB of random, incompressible data with a trained compressor to push cumulative FSST output past i32::MAX. Profiling the debug build (samply) showed ~90% of runtime in fsst::Compressor::compress_into and ~18 s in per-byte RNG pool generation. Use an empty FSST compressor instead: with no symbols every byte is emitted as a two-byte escape, so output is deterministically 2x the input. That crosses the i32::MAX boundary with only ~1.1 GiB of input (no random data needed), which is the cheapest possible way to reach the boundary since escapes are FSST's worst-case expansion. The test now also asserts the actual compressed byte size exceeds i32::MAX so it cannot silently stop covering the regression. Measured (debug, single run): 307 s -> 186 s (~1.65x), peak memory ~5 GiB -> ~3.4 GiB, RNG generation eliminated. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
AdamGS
reviewed
May 29, 2026
| /// The input is built with [`VarBinBuilder<i64>`] so the input itself does not panic, which | ||
| /// confirms the overflow is on the FSST output side. After the fix the test must succeed | ||
| /// with the row count preserved. | ||
| /// We force the output past [`i32::MAX`] with an empty FSST compressor: it has no symbols, so |
Contributor
There was a problem hiding this comment.
is this test the best documented part of the codebase?
Merging this PR will not alter performance
|
Give fsst_compress_offsets_overflow_i32 priority 100 so nextest schedules it at the start of the workspace run, mirroring the existing compress_large_int override. The test still takes ~3 minutes; dispatching it first lets its latency overlap with the rest of the suite instead of trailing at the end as the long pole. Verified locally with nextest 0.9.137: the filter selects exactly this test, and a controlled --test-threads=1 experiment confirmed priority=100 moves a normally-last test to first in dispatch order. Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Refactor the
fsst_compress_offsets_overflow_i32test to use an empty FSST compressor instead of training one on random data. This simplification:Reduces memory allocation from ~5 GiB to ~3.4 GiB by eliminating the random data pool and using deterministic escape-based expansion (2x input size) instead of incompressible random data (~1x input size).
Removes randomness by replacing
StdRngand random pool generation with a single reused buffer of repeated bytes, making the test deterministic and faster.Maintains regression coverage by exercising the same output-offset overflow path. The empty compressor's escape factor (2x + 7 bytes) is the worst-case FSST expansion, so this is the cheapest way to reach the i32::MAX boundary while still crossing it.
Adds explicit assertion that compressed output actually exceeds i32::MAX, preventing silent test degradation if FSST's escape behavior changes.
The test still allocates ~1.1 GiB of input and ~2.25 GiB of output, so it remains gated to CI and respects
VORTEX_SKIP_SLOW_TESTS.Testing
Existing test infrastructure covers this change. The test is gated to CI runs and validates that the regression (i32 offset overflow in FSST output) is properly exercised by asserting the compressed byte count exceeds i32::MAX.
https://claude.ai/code/session_01212r9Sii7DxuVZ1UJ9oqx1