Skip to content

[BUG] Ingester: trackInflightQueryRequest leaks inflight counter on resource-based rejection — eventually rejects all queries until restart #7538

@sandy2008

Description

@sandy2008

AI Tool Usage Notice
This report was drafted with help from Claude Code and independently cross-checked by 3 Claude agents and 3 Codex agents. All code references, line numbers, and the failing test were verified against master @ 29b6167d6.

Describe the bug

Ingester.trackInflightQueryRequest() (pkg/ingester/ingester.go:2510-2530) increments the atomic inflightQueryRequests counter and then calls resourceBasedLimiter.AcceptNewRequest(). When the limiter rejects (CPU/heap utilization ≥ configured threshold), the function returns (nil, limiter.ErrResourceLimitReached) without decrementing. All four callers — QueryExemplars (L1879), LabelValues (L2006), LabelNames (L2122), and queryStreamChunks (L2651) — propagate the error and never decrement either.

Every resource-rejected query permanently leaks one unit of the inflight counter. Once cumulative rejections reach MaxInflightQueryRequests, the precheck at L2513 short-circuits all subsequent queries with errTooManyInflightQueryRequests, including ones that would not trip the resource limiter. The counter is a process-lifetime atomic.Int64 with no reset path; only restarting the ingester clears it.

Trigger condition (both required):

  • -ingester.instance-limits.max-inflight-query-requests > 0 (default 0 = unlimited)
  • At least one of -ingester.query-protection.rejection.threshold.{cpu,heap}-utilization > 0 (defaults 0 = disabled)

Both knobs target the same defense-in-depth scenario, so operators who enable resource-based query protection are exactly the population at risk.

The push path (Ingester.Push, L1291-1294) uses the safe Inc(); defer Dec() idiom and is unaffected. The query helper diverges because it returns a closure, which gives AcceptNewRequest() a window to fail between Inc and the closure handoff.

Relevant code (current master, pkg/ingester/ingester.go:2510-2530)

func (i *Ingester) trackInflightQueryRequest() (func(), error) {
    gl := i.getInstanceLimits()
    if gl != nil && gl.MaxInflightQueryRequests > 0 {
        if i.inflightQueryRequests.Load() >= gl.MaxInflightQueryRequests {
            return nil, errTooManyInflightQueryRequests
        }
    }

    i.maxInflightQueryRequests.Track(i.inflightQueryRequests.Inc()) // (1) counter incremented

    if i.resourceBasedLimiter != nil {
        if err := i.resourceBasedLimiter.AcceptNewRequest(); err != nil {
            level.Warn(i.logger).Log("msg", "failed to accept request", "err", err)
            return nil, limiter.ErrResourceLimitReached // (2) returns without Dec()
        }
    }

    return func() {
        i.inflightQueryRequests.Dec()
    }, nil
}

To Reproduce

  1. Run Cortex (master @ 29b6167d6) with the ingester configured:
    ingester:
      instance_limits:
        max_inflight_query_requests: 100
      query_protection:
        rejection:
          threshold:
            heap_utilization: 0.5
  2. Drive heap utilization above the threshold so AcceptNewRequest() rejects.
  3. Issue ≥ 100 queries (/query, /labels, /series, /query_exemplars) — each fails with failed to accept request.
  4. Drop heap utilization back below the threshold.
  5. Observe: subsequent queries continue to fail, now with errTooManyInflightQueryRequests, even though nothing is actually in flight. cortex_ingester_inflight_query_requests stays pegged at 100. Only restart recovers.

Self-contained Go unit test that reproduces the leak:

func TestTrackInflightQueryRequestCounterLeak(t *testing.T) {
    cfg := defaultIngesterTestConfig(t)
    cfg.DefaultLimits.MaxInflightQueryRequests = 2
    i, err := prepareIngesterWithBlocksStorage(t, cfg, prometheus.NewRegistry())
    require.NoError(t, err)
    require.NoError(t, services.StartAndAwaitRunning(context.Background(), i))
    defer services.StopAndAwaitTerminated(context.Background(), i) //nolint:errcheck
    test.Poll(t, time.Second, ring.ACTIVE, func() any { return i.lifecycler.GetState() })

    // Resource limiter that always rejects.
    limits := map[resource.Type]float64{resource.CPU: 0.5, resource.Heap: 0.5}
    rejecting, err := limiter.NewResourceBasedLimiter(&mockResourceMonitor{cpu: 0.9, heap: 0.9}, limits, nil, "ingester")
    require.NoError(t, err)
    i.resourceBasedLimiter = rejecting

    ctx := user.InjectOrgID(context.Background(), "test")
    wreq, _ := mockWriteRequest(t, labels.FromStrings("__name__", "leak_test"), 1, 100000)
    _, err = i.Push(ctx, wreq)
    require.NoError(t, err)

    for n := 0; n < 10; n++ {
        s := &mockQueryStreamServer{ctx: ctx}
        err = i.QueryStream(&client.QueryRequest{}, s)
        require.ErrorIs(t, err, limiter.ErrResourceLimitReached, "iter %d", n)
        require.Equalf(t, int64(0), i.inflightQueryRequests.Load(), "iter %d: counter leaked", n)
    }

    // After lifting the limiter, queries must still succeed.
    i.resourceBasedLimiter = nil
    s := &mockQueryStreamServer{ctx: ctx}
    require.NoError(t, i.QueryStream(&client.QueryRequest{}, s))
}

On master this fails at n == 2: the third call returns errTooManyInflightQueryRequests (not ErrResourceLimitReached), and the final non-limited query fails the same way.

Expected behavior

trackInflightQueryRequest() leaves inflightQueryRequests unchanged on every error-return path. Rejected requests must not consume slots in MaxInflightQueryRequests.

Suggested fix direction

Option A — minimal, preserves Track() peak accounting:

 	i.maxInflightQueryRequests.Track(i.inflightQueryRequests.Inc())

 	if i.resourceBasedLimiter != nil {
 		if err := i.resourceBasedLimiter.AcceptNewRequest(); err != nil {
+			i.inflightQueryRequests.Dec()
 			level.Warn(i.logger).Log("msg", "failed to accept request", "err", err)
 			return nil, limiter.ErrResourceLimitReached
 		}
 	}

Option B — reorder so Inc only happens after the limiter accepts (structurally leak-proof):

if i.resourceBasedLimiter != nil {
    if err := i.resourceBasedLimiter.AcceptNewRequest(); err != nil {
        level.Warn(i.logger).Log("msg", "failed to accept request", "err", err)
        return nil, limiter.ErrResourceLimitReached
    }
}
i.maxInflightQueryRequests.Track(i.inflightQueryRequests.Inc())
return func() { i.inflightQueryRequests.Dec() }, nil

B eliminates the bug by construction (every Inc is paired with the returned closure's Dec); A is a smaller diff but reintroduces the same risk if another fallible step is added between the Inc and the successful return. The existing rejection test at pkg/ingester/ingester_test.go:3527-3568 should also be extended to assert i.inflightQueryRequests.Load() == 0 after rejection.

Environment

  • Version: master @ 29b6167d6
  • Affected components: Ingester query path (Query, QueryStream, QueryExemplars, LabelValues, LabelNames)
  • Discovery: code audit

Additional Context

  • Push path is unaffected (Ingester.Push at L1291 uses Inc(); defer Dec()).
  • Test coverage gap: Test_Ingester_Query_ResourceThresholdBreached (pkg/ingester/ingester_test.go:3527-3568, uses mockResourceMonitor{cpu: 0.4, heap: 0.6}) covers the rejection error but never inspects the counter. The three other inflight tests (L3315, L3474, L7065) manually Add(1) and don't exercise the resource-limiter path.
  • No background goroutine resets inflightQueryRequests; recovery requires ingester restart.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions