Summary
TestVerticalShardingFuzz (integration/query_fuzz_test.go:665) compares results from an unsharded Cortex (c1) against one running with -frontend.query-vertical-shard-size=2 (c2). The fuzzer occasionally generates a query of the shape:
(<expr-with-partial-time-coverage>) or vector(<scalar>)
For at least one shard, the LHS sub-expression is empty at certain timestamps, while in the unsharded engine the LHS has data (value 0). The or then falls through to vector(...) in the sharded path but not in the unsharded path, producing a per-timestamp value mismatch that the test's comparer flags as a results mismatch.
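A toy, per-step model of this divergence may help. It assumes, as the hypothesis below describes, that the LHS's only output series has the empty label set {}, so or keeps the LHS sample whenever one exists and otherwise takes vector(...)'s value. The step type and orVector function are hypothetical and ignore everything else PromQL does.

```go
package main

import "fmt"

// step is one evaluation timestamp of a range query (hypothetical toy model).
// present reports whether the LHS produced its {} series at this step.
type step struct {
	present bool
	value   float64
}

// orVector models "lhs or vector(s)" for an LHS whose only series has the
// empty label set {}: the LHS sample wins whenever it exists, otherwise the
// vector(s) fallback supplies the value.
func orVector(lhs []step, s float64) []float64 {
	out := make([]float64, len(lhs))
	for i, st := range lhs {
		if st.present {
			out[i] = st.value
		} else {
			out[i] = s
		}
	}
	return out
}

func main() {
	const fallback = 2.009985
	// Unsharded: the LHS has a 0 sample at the second step.
	unsharded := []step{{true, 1}, {true, 0}, {true, 1}}
	// Sharded: the same step is missing from the merged LHS.
	sharded := []step{{true, 1}, {false, 0}, {true, 1}}

	fmt.Println(orVector(unsharded, fallback)) // [1 0 1]
	fmt.Println(orVector(sharded, fallback))   // [1 2.009985 1]
}
```

The two outputs have the same number of samples and differ only at the step where the LHS went missing, which is exactly the shape the strict comparer reports.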
This is consistent with prior known semantic-divergence bugs in vertical sharding around functions whose behavior depends on what data the shard sees — see #5203, #5204, #5205 (closed) for the same pattern with absent, absent_over_time, and scalar. The Thanos query analyzer at vendor/github.com/thanos-io/thanos/pkg/querysharding/analyzer.go:111 already marks those three as non-shardable, but it does not include vector.
Hypothesis (not independently proven by replay): when the LHS is itself a complex expression involving stddev_over_time and a binary equality (==), the unsharded engine produces a {} series with value 0 at the failing timestamp, while in the sharded engine — for reasons that may involve range-vector window edges, aggregation push-down, or the shard merger collapsing empty samples — the LHS is absent at that timestamp, causing or vector(...) to fire. The end-user-visible symptom is clear; the precise upstream mechanism is not.
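Most recent occurrence
ubuntu-24.04-arm, arm64, build tag integration_query_fuzz
dependabot/go_modules/go-dependencies-97d45e6f13 (the PR's diff is unrelated to query sharding).
Failure excerpt
c1 (res1, unsharded) has a 0 at timestamp 1779304415.517; c2 (res2, vertical-shard-size=2) has vector(...)'s value 2.009985… there. atan2(0.6747*0.5485, -0.1738) ≈ 2.0099, confirming that the right-hand side of the or is what fires in the sharded path at that timestamp.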
Empirical flake rate
Per the prior agent's 18-day scan (~245 CI runs across master + PRs):
TestVerticalShardingFuzz: 1 observed failure in the scan window (≈0.4% of PR CI runs hitting integration_query_fuzz). Rare relative to TestParquetFuzz/TestExpandedPostingsCacheFuzz, but with a clear known-shape root cause that warrants a fix at the test generator rather than waiting for it to recur.
Arch: arm64 in the one observed sample. The broader integration_query_fuzz job has a 67/33 arm64/amd64 skew; the prevailing hypothesis (slower arm64 + amd64-only SIMD in parquet-go widening race windows) does not obviously apply to a semantic-divergence flake. It may be coincidence at n=1.
Sample prior failures
No further occurrences of this exact shape were captured in the 18-day window; if more samples are needed for confidence, leaving the test as-is and grepping for the vector( substring in future failure logs will identify them.
Root cause
The test (integration/query_fuzz_test.go:778) runs the unsharded vs. sharded comparison through runQueryFuzzTestCases and picks the result comparer via shouldUseSampleNumComparer (line 1976), which checks only for topk/bottomk. Queries containing or vector(...) therefore fall through to the strict comparer (line 1966).
isValidQuery (integration/query_fuzz_test.go:1983-2020) filters generated queries by string match for limitk, limit_ratio, --, and (when skipBackwardIncompat=true) stddev/stdvar/quantile/predict_linear/atan2. TestVerticalShardingFuzz calls runQueryFuzzTestCases(..., false) at line 778, so skipBackwardIncompat=false — meaning stddev_over_time and atan2 are not filtered. There is no filter for vector(.
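To make the routing concrete, here is a hypothetical reconstruction of the filter and comparer choice described in the two paragraphs above. It is based only on their prose, so the real functions in query_fuzz_test.go may differ, and the example query is illustrative rather than the actual failing fuzzer output.

```go
package main

import (
	"fmt"
	"strings"
)

// isValidQuerySketch approximates the described string-match filter; it is
// a hypothetical reconstruction, not the code in query_fuzz_test.go.
func isValidQuerySketch(q string, skipBackwardIncompat bool) bool {
	for _, bad := range []string{"limitk", "limit_ratio", "--"} {
		if strings.Contains(q, bad) {
			return false
		}
	}
	if skipBackwardIncompat {
		for _, fn := range []string{"stddev", "stdvar", "quantile", "predict_linear", "atan2"} {
			if strings.Contains(q, fn) {
				return false
			}
		}
	}
	return true
}

// usesSampleNumComparerSketch mirrors the described topk/bottomk-only check.
func usesSampleNumComparerSketch(q string) bool {
	return strings.Contains(q, "topk") || strings.Contains(q, "bottomk")
}

func main() {
	// Hypothetical query with the failing shape.
	q := `(stddev_over_time(x[5m]) == 0) or vector(1)`
	fmt.Println(isValidQuerySketch(q, false))   // true: TestVerticalShardingFuzz keeps it
	fmt.Println(isValidQuerySketch(q, true))    // false
	fmt.Println(usesSampleNumComparerSketch(q)) // false: strict comparer is used
}
```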
The upstream Thanos query-sharding analyzer (vendor/github.com/thanos-io/thanos/pkg/querysharding/analyzer.go:103-117) marks queries containing absent, absent_over_time, or scalar as non-shardable because their semantics depend on what data the shard sees. vector(...) is not in that list. As a constant generator, vector() is itself shard-safe; but when used as the RHS of or, it forms an LHS-data-dependent shape analogous to absent — and the analyzer doesn't detect this.
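A crude, string-based approximation of that analyzer check shows why the failing shape is still treated as shardable. The function list comes from the paragraph above; the regexp scan and shardableSketch are hypothetical stand-ins for walking the parsed PromQL expression, which is what a real analyzer would presumably do.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonShardable lists the functions the text says the Thanos analyzer treats
// as non-shardable; vector is deliberately absent, as in the analyzer.
var nonShardable = map[string]bool{
	"absent":           true,
	"absent_over_time": true,
	"scalar":           true,
}

// callRE finds identifiers immediately followed by "(" (a crude stand-in for
// an AST walk; keywords such as "by (" also match but are never in the list).
var callRE = regexp.MustCompile(`([a-zA-Z_][a-zA-Z0-9_]*)\s*\(`)

func shardableSketch(q string) bool {
	for _, m := range callRE.FindAllStringSubmatch(q, -1) {
		if nonShardable[m[1]] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(shardableSketch(`absent(up) or vector(1)`))       // false
	fmt.Println(shardableSketch(`sum(rate(x[5m])) or vector(1)`)) // true: or vector(...) slips through
}
```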
Cortex's wrapper pkg/querysharding/util.go:90-123 (NewDisableBinaryExpressionAnalyzer) only disables binary expressions with VectorMatching.On == false; it does not special-case or vector(...).
The exact upstream mechanism by which the LHS goes missing in one sharded path but not unsharded — at one timestamp — is unverified. Candidates: stddev_over_time window-edge floating-point differences when the input series set per shard differs; merger behavior for empty matrices in or; or a per-step evaluation gap. This issue does not pin down which.
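As an illustration of the first candidate only: floating-point addition is not associative, so summing the same samples in a different order (for example, because a shard sees a different input set or window) can change the last bit of a result, which is enough to flip an exact == comparison. Whether this is what happened here is unverified, and sumInOrder is a hypothetical helper, not stddev_over_time's implementation.

```go
package main

import "fmt"

// sumInOrder adds xs strictly left to right, as a single accumulator would
// over one evaluation window.
func sumInOrder(xs []float64) float64 {
	s := 0.0
	for _, x := range xs {
		s += x
	}
	return s
}

func main() {
	forward := sumInOrder([]float64{0.1, 0.2, 0.3})
	reordered := sumInOrder([]float64{0.3, 0.2, 0.1})
	fmt.Println(forward, reordered, forward == reordered) // 0.6000000000000001 0.6 false
}
```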
Proposed fix
Primary (test-side, low-risk, ~6 lines):
Extend isValidQuery in integration/query_fuzz_test.go:1983-2020 to skip queries containing vector( regardless of skipBackwardIncompat. Justification: vector(...) as a generator is rare in production, and the divergence with vertical sharding is genuine (engine-level) — the unit-level fuzz test isn't the right vehicle for catching it.
```go
if strings.Contains(queryStr, "vector(") {
	// The or-vector fallback fires at different timestamps in sharded vs.
	// unsharded engines when the LHS has partial time coverage; see #<this issue>.
	return false
}
```
Secondary (engine-side, larger, possibly out of scope here):
Add vector to the analyzer's non-shardable function list at vendor/github.com/thanos-io/thanos/pkg/querysharding/analyzer.go:111. This would need an upstream Thanos change and may be wider in impact than warranted — vector() is shardable on its own; only its use as a fallback RHS to or is problematic. A more precise analyzer change (mark or vector(...) non-shardable) is the right long-term fix but is upstream work.
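The more precise analyzer rule could look roughly like the following sketch, which walks an expression tree and flags any or whose right-hand side is a vector(...) call. The Expr, Call, BinaryExpr, and Selector types are hypothetical toys rather than the Prometheus parser's AST, and a parenthesized RHS is not unwrapped.

```go
package main

import "fmt"

// Toy expression types; hypothetical, not the Prometheus parser's AST.
type Expr interface{}

type Call struct {
	Func string
	Args []Expr
}

type BinaryExpr struct {
	Op       string
	LHS, RHS Expr
}

type Selector struct{ Name string }

// hasOrVectorFallback reports whether any or node in the tree has a
// vector(...) call as its right-hand side.
func hasOrVectorFallback(e Expr) bool {
	switch n := e.(type) {
	case *BinaryExpr:
		if n.Op == "or" {
			if c, ok := n.RHS.(*Call); ok && c.Func == "vector" {
				return true
			}
		}
		return hasOrVectorFallback(n.LHS) || hasOrVectorFallback(n.RHS)
	case *Call:
		for _, a := range n.Args {
			if hasOrVectorFallback(a) {
				return true
			}
		}
	}
	return false
}

func main() {
	fallback := &BinaryExpr{Op: "or", LHS: &Selector{Name: "x"}, RHS: &Call{Func: "vector", Args: []Expr{1.0}}}
	plain := &BinaryExpr{Op: "or", LHS: &Selector{Name: "x"}, RHS: &Selector{Name: "y"}}
	fmt.Println(hasOrVectorFallback(fallback)) // true
	fmt.Println(hasOrVectorFallback(plain))    // false
}
```

Flagging only this shape keeps vector() shardable on its own, which is the narrower behavior the paragraph above argues for.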
Why not other approaches
Switch to sampleNumComparer for queries containing or vector(: the values at the failing step still differ (one 0 sample unsharded vs. one 2.0099… sample sharded), but the sample counts are equal, so a count-based comparer would pass. The test currently uses the strict comparer for all non-topk cases, and relaxing it for or vector( would mask real bugs of the same shape.
Relax comparer to ignore positions where values differ but counts match: too permissive; this is the test's signal channel for real correctness regressions.
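… (#5203/#5204/#5205) and Thanos hasn't enumerated all of them. Filter-at-test plus a tracking issue for the analyzer fix is the practical path.
Disable TestVerticalShardingFuzz entirely: loses coverage of the many shapes vertical sharding does handle correctly.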
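The masking concern with the first option can be shown with two hypothetical comparers over the failing step's values: a strict value comparison reports the divergence, while a count-only comparison accepts it. Neither function is the test's actual comparer or sampleNumComparer.

```go
package main

import "fmt"

// strictEqual is a hypothetical value-by-value comparer.
func strictEqual(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// sampleCountEqual is a hypothetical comparer that only checks totals.
func sampleCountEqual(a, b []float64) bool { return len(a) == len(b) }

func main() {
	unsharded := []float64{0}      // res1 at the failing step
	sharded := []float64{2.009985} // res2 at the same step
	fmt.Println(strictEqual(unsharded, sharded))      // false: mismatch reported
	fmt.Println(sampleCountEqual(unsharded, sharded)) // true: mismatch hidden
}
```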
Acceptance criteria
After the fix, TestVerticalShardingFuzz runs 1000 instant + 1000 range cases without any or vector( query reaching the comparer (verifiable by grepping logs or instrumenting isValidQuery to count skipped queries).
No regression in TestVerticalShardingFuzz coverage of or-without-vector (e.g. … or {series="x"} still exercised — promqlsmith generates or between vector selectors too).
Linked tracking issue (or note) for the upstream analyzer to detect or vector(...) and mark as non-shardable. This issue can be closed once the test-side filter lands; the upstream fix is a separate follow-up.
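The first two criteria could be checked with counting along these lines. filterQueries is a hypothetical helper applying the proposed vector( rule; it reports how many queries were skipped and shows that an or between plain selectors is kept.

```go
package main

import (
	"fmt"
	"strings"
)

// filterQueries applies the proposed vector( rule and reports how many
// queries it skipped (hypothetical instrumentation, not an existing hook).
func filterQueries(qs []string) (kept []string, skipped int) {
	for _, q := range qs {
		if strings.Contains(q, "vector(") {
			skipped++
			continue
		}
		kept = append(kept, q)
	}
	return kept, skipped
}

func main() {
	kept, skipped := filterQueries([]string{
		`sum(x) or vector(1)`,
		`sum(x) or {series="x"}`,
		`rate(y[5m])`,
	})
	fmt.Println(len(kept), skipped) // 2 1
}
```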
Honest caveat: only one failure sample is available, so the root-cause attribution to or vector(...) is well-supported by the failure shape but the deeper mechanism (why the LHS goes missing per-shard at exactly that one timestamp) is unverified. The proposed test-side fix is robust regardless of which mechanism is at play, because it removes the trigger shape.