Skip to content

Flaky test: TestPrometheusCompatibilityQueryFuzz / TestExperimentalPromQLFuncsWithPrometheus — error-string comparator sensitive to non-deterministic series-list ordering #7546

@sandy2008

Description

@sandy2008

Summary

Both TestPrometheusCompatibilityQueryFuzz (integration/query_fuzz_test.go:1604-1703) and TestExperimentalPromQLFuncsWithPrometheus (integration/query_fuzz_test.go:163-266) execute random PromQL via runQueryFuzzTestCases against two engines (Cortex vs. upstream Prometheus) and compare results. When a generated query triggers a vector-matching duplicate (e.g. many-to-many on a bool binop with a group by (...) or or subtree), both engines correctly return the same error type and the same logical message — but the trailing [X, Y] series-list inside the message renders in an order that depends on Go map iteration. The comparator at integration/query_fuzz_test.go:1957 is cmp.Equal(tc.err1, tc.err2) (an exact-string equality on *errors.errorString), so any byte-level divergence — including a reordered two-element list inside the message — flips the case to error mismatch and fails the test.

This is a test-side issue: both engines are doing the right thing; the comparator is too strict.

Most recent occurrence

Failure excerpt

    query_fuzz_test.go:1958: case 1457 error mismatch.
        range query: (
            {__name__="test_series_b"}
          <= bool
            group by (series, __name__) (
              (
                  label_replace({__name__="test_series_a"}, "__promqlsmith_dst_label__", "$1", "status_code", "(.*)")
                or
                  -{__name__="test_series_b",status_code="404"}
              )
            )
        )
        err1: execution: found duplicate series for the match group {series="2"} on the right hand-side of the operation: [{series="2"}, {__name__="test_series_a", series="2"}];many-to-many matching not allowed: matching labels must be unique on one side
        err2: execution: found duplicate series for the match group {series="2"} on the right hand-side of the operation: [{__name__="test_series_a", series="2"}, {series="2"}];many-to-many matching not allowed: matching labels must be unique on one side
    query_fuzz_test.go:1972:
        Error:      finished query fuzzing tests
        Test:       TestPrometheusCompatibilityQueryFuzz
        Messages:   1 test cases failed
--- FAIL: TestPrometheusCompatibilityQueryFuzz (18.97s)

Same shape on other occurrences — only the two metric strings inside [ , ] swap places.

Empirical flake rate (last 18 days, ~250 CI runs)

Test Confirmed failures (this 18-day window)
TestPrometheusCompatibilityQueryFuzz 3
TestExperimentalPromQLFuncsWithPrometheus 1 confirmed (a second hit was reported by a prior investigation but I could not re-verify it in the logs available to this scan)
Combined ~4 PR-CI hits

All confirmed failures are the same "[X, Y]" vs "[Y, X]" inside the duplicate series for the match group … error. None observed on master in this window.

Sample prior failures

Root cause

  1. The comparator at integration/query_fuzz_test.go:1957 is if !cmp.Equal(tc.err1, tc.err2). For plain *errors.errorString produced by the HTTP client, cmp.Equal reduces to exact string equality on the error message.
  2. When a generated query exercises many-to-one (e.g. <= bool group by (...) (...)), the Prometheus engine reaches vendor/github.com/prometheus/prometheus/promql/engine.go:2830-2831:
    ev.errorf("found duplicate series for the match group %s on the %s hand-side of the operation: [%s, %s]"+
      ";many-to-many matching not allowed: matching labels must be unique on one side",
      matchedLabels.String(), oneSide, rs.Metric.String(), duplSample.Metric.String())
    Here duplSample is whichever entry already sits in the rightSigs map[string]Sample (engine.go:2809), and rs is the second-arrived sample being iterated from the rhs Vector slice.
  3. rhs is gathered from the RHS sub-expression's output Matrix. For aggregations the output matrix is built by iterating the internal seriess map (engine.go:1396-1399 for rangeEval, engine.go:1525-1527 for rangeEvalAgg's topk/bottomk/limitk branch). Map iteration in Go is randomized per-process. The two engines are two distinct processes (Cortex container vs. Prometheus container), so their map seeds differ → whichever series lands "first" in rightSigs differs → the order of [%s, %s] in the message differs.
  4. The same root cause applies to both tests: both call runQueryFuzzTestCases and Cortex vs. Prometheus paths both delegate to the upstream Prometheus engine for this branch. The only difference is TestExperimentalPromQLFuncsWithPrometheus enables promql-experimental-functions; that flag has no bearing on the binop error path.

Uncertainty

The order was traced back to one map iteration (seriess → output Matrix) inside the RHS sub-expression's evaluator. There may be additional contributing map iterations upstream of that (e.g. inside select / chunkSeries merge), but the seriess map alone is sufficient to produce the observed flake. The downstream message order in engine.go:2830 itself is deterministic given a fixed input slice order — the randomness is entirely upstream, in how the RHS Matrix is assembled.

Proposed fix

Compare error types/codes rather than message bytes. In runQueryFuzzTestCases (integration/query_fuzz_test.go:1956-1960), replace the exact cmp.Equal(tc.err1, tc.err2) with a tolerant comparator:

if tc.err1 != nil || tc.err2 != nil {
    if !sameErrorClass(tc.err1, tc.err2) {
        t.Logf(...) ; failures++
    }
}

where sameErrorClass either:

  • (a) checks both are non-nil and their HTTP API error type/status codes match (the Cortex/Prom HTTP clients already return typed errors with a Type field like execution), or
  • (b) parses the message and canonicalizes the […, …] segment by sorting the comma-separated label strings before comparing.

Option (a) is preferred — it's a few lines, doesn't depend on Prometheus's exact wording, and survives any future error-message rewording. Option (b) is acceptable as a regex-based normalization scoped to this one error string if (a) proves to expose other categories of mismatch.

Why not other approaches

  • Skip queries that produce vector-matching errors: Would require pre-classifying the generator's output, which isValidQuery (integration/query_fuzz_test.go:1983) currently does only via string-contains. We'd need to either re-run the query and inspect the error before deciding to count it (chicken-and-egg) or block large query families (e.g. all bool with set ops), which would drop useful coverage. Not worth it when the comparator fix is local.
  • Sort the label sets in the upstream Prometheus error message: Would require a Prometheus PR plus a vendor bump. Higher coordination cost, longer feedback loop, and would not help us until adopted. We could carry the change in vendor/ but modifying vendored code is against the project's convention (CLAUDE.md).
  • Set GODEBUG=randmapiter=0: That toggle doesn't exist; map iteration randomization is unconditional since Go 1.0. Even if it did, the two engines run in separate containers so they'd still diverge across runs.
  • Re-seed both engines deterministically and run them in-process: Out of scope; the test deliberately runs Cortex in Docker against an external Prometheus binary.

Acceptance criteria

  • No error mismatch failures of the duplicate series for the match group … [X, Y] shape in TestPrometheusCompatibilityQueryFuzz or TestExperimentalPromQLFuncsWithPrometheus across a representative local-run sample.
  • When both engines genuinely return different error types (e.g. one nil + one non-nil; or different Prometheus error categories like execution vs bad_data), the test still fails — i.e. the comparator relaxation does not mask real divergences. Add a unit test for the comparator covering both same-class-different-message and different-class-different-message inputs.
  • No change required in vendor/github.com/prometheus/prometheus/promql/engine.go.
  • CHANGELOG entry omitted (test-only change, not user-facing).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions