Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence

## Summary

`TestVerticalShardingFuzz` (`integration/query_fuzz_test.go:665`) compares results from an unsharded Cortex (`c1`) against one running with `-frontend.query-vertical-shard-size=2` (`c2`). The fuzzer occasionally generates a query of the shape

```
(<expr-with-partial-time-coverage>) or vector(<scalar>)
```

For at least one shard, the LHS sub-expression is empty at certain timestamps, while in the unsharded engine the LHS has data (value `0`). The `or` then falls through to `vector(...)` in the sharded path but not in the unsharded path, producing a per-timestamp value mismatch that the test's `comparer` flags as a results mismatch.

This is consistent with prior known semantic-divergence bugs in vertical sharding around functions whose behavior depends on what data the shard sees — see #5203, #5204, #5205 (closed) for the same pattern with `absent`, `absent_over_time`, and `scalar`. The Thanos query analyzer at `vendor/github.com/thanos-io/thanos/pkg/querysharding/analyzer.go:111` already marks those three as non-shardable, but it does **not** include `vector`.

Hypothesis (not independently proven by replay): when the LHS is itself a complex expression involving `stddev_over_time` and a binary equality (`==`), the unsharded engine produces a `{}` series with value `0` at the failing timestamp, while in the sharded engine — for reasons that may involve range-vector window edges, aggregation push-down, or the shard merger collapsing empty samples — the LHS is absent at that timestamp, causing `or vector(...)` to fire. The end-user-visible symptom is clear; the precise upstream mechanism is not.

## Most recent occurrence

- Run: https://github.com/cortexproject/cortex/actions/runs/26184234190
- Date: 2026-05-20T19:24:57Z
- Arch: `ubuntu-24.04-arm`, `arm64`, build tag `integration_query_fuzz`
- Branch: `dependabot/go_modules/go-dependencies-97d45e6f13` (the PR's diff is unrelated to query sharding).

## Failure excerpt

```
query_fuzz_test.go:1967: case 1194 results mismatch.
    range query: (
        sum without (job, series) (
          (stddev_over_time({__name__="test_series_a"}[4m]) == -{__name__="test_series_a",series=~".*"})
        )
      or
        vector(((0.6746652128584755 * 0.5484783823211438) atan2 -0.17384075643661537))
    )
    res1 len: 1 data: {} =>
    2.009985198006908 @[1779304385.517]
    0                  @[1779304415.517]
    2.009985198006908 @[1779304445.517]
    2.009985198006908 @[1779304475.517]
    … (all other timestamps 2.009985198006908)
    res2 len: 1 data: {} =>
    2.009985198006908 @[1779304385.517]
    2.009985198006908 @[1779304415.517]
    2.009985198006908 @[1779304445.517]
    … (all timestamps 2.009985198006908)
--- FAIL: TestVerticalShardingFuzz (15.23s)
```

`c1` (`res1`, unsharded) has a `0` at timestamp `1779304415.517`; `c2` (`res2`, vertical-shard-size=2) has `vector(...)`'s value `2.009985…` there. `atan2(0.6747*0.5485, -0.1738) ≈ 2.0099`, confirming the right-hand side of the `or` is what fires in the sharded path at that timestamp.

## Empirical flake rate

Per the prior agent's 18-day scan (~245 CI runs across master + PRs):

- `TestVerticalShardingFuzz`: 1 observed failure in the scan window (≈0.4% of PR CI runs hitting `integration_query_fuzz`). Rare relative to `TestParquetFuzz`/`TestExpandedPostingsCacheFuzz`, but with a clear known-shape root cause that warrants a fix at the test generator rather than waiting for it to recur.
- Arch: arm64 in the one observed sample. The broader `integration_query_fuzz` job has a 67/33 arm64/amd64 skew; the prevailing hypothesis (slower arm64 + amd64-only SIMD in `parquet-go` widening race windows) does not obviously apply to a semantic-divergence flake. It may be coincidence at n=1.

## Sample prior failures

- arm64, 2026-05-20: https://github.com/cortexproject/cortex/actions/runs/26184234190 (only sample collected during this investigation)

No further occurrences of this exact shape were captured in the 18-day window; if more samples are needed for confidence, leaving the test as-is and grepping for the `vector(` substring in future failure logs will identify them.

## Root cause

1. The test (`integration/query_fuzz_test.go:778`) runs the unsharded vs sharded comparison through `runQueryFuzzTestCases` and decides shape via `shouldUseSampleNumComparer` (line 1976) — which checks only for `topk`/`bottomk`. Queries containing `or vector(...)` fall through to the strict `comparer` (line 1966).
2. `isValidQuery` (`integration/query_fuzz_test.go:1983-2020`) filters generated queries by string match for `limitk`, `limit_ratio`, `--`, and (when `skipBackwardIncompat=true`) `stddev`/`stdvar`/`quantile`/`predict_linear`/`atan2`. `TestVerticalShardingFuzz` calls `runQueryFuzzTestCases(..., false)` at line 778, so `skipBackwardIncompat=false` — meaning `stddev_over_time` and `atan2` are *not* filtered. There is no filter for `vector(`.
3. The upstream Thanos query-sharding analyzer (`vendor/github.com/thanos-io/thanos/pkg/querysharding/analyzer.go:103-117`) marks queries containing `absent`, `absent_over_time`, or `scalar` as non-shardable because their semantics depend on what data the shard sees. `vector(...)` is not in that list. As a constant generator, `vector()` is itself shard-safe; but when used as the RHS of `or`, it forms an LHS-data-dependent shape analogous to `absent` — and the analyzer doesn't detect this.
4. Cortex's wrapper `pkg/querysharding/util.go:90-123` (`NewDisableBinaryExpressionAnalyzer`) only disables binary expressions with `VectorMatching.On == false`; it does not special-case `or vector(...)`.
5. The exact upstream mechanism by which the LHS goes missing in one sharded path but not unsharded — at one timestamp — is unverified. Candidates: `stddev_over_time` window-edge floating-point differences when the input series set per shard differs; merger behavior for empty matrices in `or`; or a per-step evaluation gap. This issue does not pin down which.

## Proposed fix

**Primary** (test-side, low-risk, ~6 lines):

Extend `isValidQuery` in `integration/query_fuzz_test.go:1983-2020` to skip queries containing `vector(` regardless of `skipBackwardIncompat`. Justification: `vector(...)` as a generator is rare in production, and the divergence with vertical sharding is genuine (engine-level) — the unit-level fuzz test isn't the right vehicle for catching it.

```go
if strings.Contains(queryStr, "vector(") {
    // or-vector fallback fires at different timestamps in sharded vs unsharded
    // engines when the LHS has partial time coverage; see #<this issue>.
    return false
}
```

**Secondary** (engine-side, larger, possibly out of scope here):

Add `vector` to the analyzer's non-shardable function list at `vendor/github.com/thanos-io/thanos/pkg/querysharding/analyzer.go:111`. This would need an upstream Thanos change and may be wider in impact than warranted — `vector()` is shardable on its own; only its use as a fallback RHS to `or` is problematic. A more precise analyzer change (mark `or vector(...)` non-shardable) is the right long-term fix but is upstream work.

## Why not other approaches

- **Switch to `sampleNumComparer` for queries containing `or vector(`**: same shape would still mismatch — the unsharded result has one `0` sample, the sharded result has one `2.0099…` sample, totals are equal but the test currently uses strict comparer for non-`topk` cases and changing that for `or vector(` would mask real bugs of the same shape.
- **Relax `comparer` to ignore positions where values differ but counts match**: too permissive; this is the test's signal channel for real correctness regressions.
- **Fix the upstream sharding semantics fully**: correct in principle but out of scope of "make CI green" — the divergence is a known class of bugs (#5203/#5204/#5205) and Thanos hasn't enumerated all of them. Filter-at-test plus a tracking issue for the analyzer fix is the practical path.
- **Disable `TestVerticalShardingFuzz` entirely**: loses coverage of the many shapes vertical sharding does handle correctly.

## Acceptance criteria

- [ ] After fix, `TestVerticalShardingFuzz` runs 1000 instant + 1000 range cases without `or vector(` queries reaching the comparer (verifiable by grepping logs or instrumenting `isValidQuery` to count skipped queries).
- [ ] No regression in `TestVerticalShardingFuzz` coverage of `or`-without-`vector` (e.g. `… or {series="x"}` still exercised — promqlsmith generates `or` between vector selectors too).
- [ ] Linked tracking issue (or note) for the upstream analyzer to detect `or vector(...)` and mark as non-shardable. This issue can be closed once the test-side filter lands; the upstream fix is a separate follow-up.

---

*Honest caveat: only one failure sample is available, so the root-cause attribution to `or vector(...)` is well-supported by the failure shape but the deeper mechanism (why the LHS goes missing per-shard at exactly that one timestamp) is unverified. The proposed test-side fix is robust regardless of which mechanism is at play, because it removes the trigger shape.*


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence #7547

Summary

Most recent occurrence

Failure excerpt

Empirical flake rate

Sample prior failures

Root cause

Proposed fix

Why not other approaches

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Flaky test: TestVerticalShardingFuzz — sharded vs unsharded "or vector(…)" semantic divergence #7547

Description

Summary

Most recent occurrence

Failure excerpt

Empirical flake rate

Sample prior failures

Root cause

Proposed fix

Why not other approaches

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions