Skip to content

fix: prefer KUBECONFIG over in-cluster config for k8s client#1012

Open
sradco wants to merge 1 commit into
openshift:main-alerts-management-apifrom
sradco:fix-e2e-kubeconfig-fallback
Open

fix: prefer KUBECONFIG over in-cluster config for k8s client#1012
sradco wants to merge 1 commit into
openshift:main-alerts-management-apifrom
sradco:fix-e2e-kubeconfig-fallback

Conversation

@sradco

@sradco sradco commented Jun 24, 2026

Copy link
Copy Markdown

Summary

  • Fix e2e-management-api CI test failures by addressing four issues discovered during CI debugging:
    1. Wrong cluster: Server used rest.InClusterConfig() which connected to the build cluster instead of the provisioned test cluster. Introduce loadKubeConfig() that prefers KUBECONFIG when set.
    2. Dead informers: The k8s client was created with a 30-second timeout context. When createHTTPServer returned, defer initCancel() cancelled the context and stopped all informers — freezing the relabeled-rules cache permanently. Pass the server-lifetime context to NewClient instead.
    3. Missing auth token: The e2e framework only read config.BearerToken, which is empty when the CI kubeconfig uses client-certificate auth. Add resolveToken() that falls back to creating a ServiceAccount token via the TokenRequest API.
    4. Spec-equivalence conflicts: All test rules used vector(1), triggering 409 conflicts across tests. Give each rule a unique PromQL expression.

Root Cause

The CI step monitoring-plugin-tests-management-api runs on the build cluster (build01) with KUBECONFIG pointing to the provisioned test cluster. Multiple issues prevented the e2e tests from working:

  • The server used rest.InClusterConfig() connecting to the wrong cluster
  • Informers were killed after init due to a cancelled timeout context
  • No bearer token was available for HTTP requests (client-cert kubeconfig)
  • Shared PromQL expressions caused duplicate-spec rejections

Test plan

  • go build ./... passes
  • go test ./pkg/... ./internal/... passes
  • go vet ./... passes
  • CI e2e-management-api test passes

@sradco

sradco commented Jun 24, 2026

Copy link
Copy Markdown
Author

/retest

@simonpasquier

Copy link
Copy Markdown
Contributor

/test images

@PeterYurkovich

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026
@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: PeterYurkovich, sradco

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026
@PeterYurkovich

Copy link
Copy Markdown
Contributor

/retest

1 similar comment
@zhuje

zhuje commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

/retest

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 1, 2026
@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

Comment thread test/e2e/framework/framework.go Outdated
if bearerToken == "" && config.BearerTokenFile != "" {
tokenBytes, err := os.ReadFile(config.BearerTokenFile)
if err != nil {
return nil, fmt.Errorf("failed to read bearer token file %q: %w", config.BearerTokenFile, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err should already include the path

Suggested change
return nil, fmt.Errorf("failed to read bearer token file %q: %w", config.BearerTokenFile, err)
return nil, fmt.Errorf("failed to read bearer token file: %w", err)

@simonpasquier

Copy link
Copy Markdown
Contributor

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Walkthrough

Server kubeconfig loading was refactored into a new loadKubeConfig() helper preferring the KUBECONFIG environment variable with fallback to in-cluster config, wired into createHTTPServer with wrapped error handling. Separately, the e2e test framework's New() now falls back to reading BearerTokenFile when BearerToken is empty.

Changes

Kubeconfig Loading

Layer / File(s) Summary
loadKubeConfig helper and wiring
pkg/server.go
Adds clientcmd import, introduces loadKubeConfig() preferring KUBECONFIG via clientcmd.BuildConfigFromFlags with fallback to rest.InClusterConfig(), and updates createHTTPServer to use it, wrapping failures as cannot load kubernetes config: %w.

E2E Framework Bearer Token

Layer / File(s) Summary
BearerTokenFile fallback
test/e2e/framework/framework.go
New() resolves bearerToken from config.BearerToken, falling back to reading and trimming config.BearerTokenFile contents, wrapping read errors with the file path, and setting Framework.BearerToken accordingly.

Estimated code review effort: 2 (Simple) | ~10 minutes

Sequence Diagram(s)

sequenceDiagram
  participant createHTTPServer
  participant loadKubeConfig
  participant clientcmd
  participant rest

  createHTTPServer->>loadKubeConfig: request kube config
  loadKubeConfig->>clientcmd: BuildConfigFromFlags (if KUBECONFIG set)
  clientcmd-->>loadKubeConfig: config or error
  loadKubeConfig->>rest: InClusterConfig (fallback)
  rest-->>loadKubeConfig: config or error
  loadKubeConfig-->>createHTTPServer: config or wrapped error
Loading
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: preferring KUBECONFIG over in-cluster config for the Kubernetes client.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed No Ginkgo spec titles were added or changed; the touched files are server/framework code, and framework.go contains only helper methods.
Test Structure And Quality ✅ Passed No Ginkgo It blocks were changed; the PR only updates server config loading and an e2e helper, so the listed test-structure checks are not applicable.
Microshift Test Compatibility ✅ Passed The touched files only adjust kubeconfig/bearer-token loading and contain no new Ginkgo specs or MicroShift-unsafe OpenShift API usage.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Touched files are server/helper code only; no new Ginkgo It/Describe/Context/When tests or SNO assumptions were added.
Topology-Aware Scheduling Compatibility ✅ Passed Only kubeconfig and bearer-token handling changed; no manifests, replicas, affinity, node selectors, or PDB logic were introduced.
Ote Binary Stdout Contract ✅ Passed Touched code only adds kubeconfig/token-file loading; no fmt.Print, klog, or stdout writes in process-level setup were introduced.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e specs were added; the changed files are server code and test framework only, with no IPv4-only or external connectivity test logic.
No-Weak-Crypto ✅ Passed No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret-comparison code appears in the touched files; changes are kubeconfig/token handling only.
Container-Privileges ✅ Passed PR only changes Go source files; no container/K8s manifests were modified, so no privileged/host* settings were introduced.
No-Sensitive-Data-In-Logs ✅ Passed No new sensitive data is logged: the added server log is generic, and bearer-token/KUBECONFIG handling reads values without emitting them.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/server.go (1)

399-414: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

Consider using ClientConfigLoadingRules for standard KUBECONFIG semantics.

clientcmd.BuildConfigFromFlags only opens KUBECONFIG as a single file path. Standard kubeconfig tooling (kubectl, CI runners) allows KUBECONFIG to contain multiple OS-path-separated files that get merged, plus honors current-context. If CI sets a multi-path KUBECONFIG, this will fail to load where kubectl would succeed.

♻️ Suggested refactor using deferred loading rules
 func loadKubeConfig() (*rest.Config, error) {
-	if kubeconfig := os.Getenv("KUBECONFIG"); kubeconfig != "" {
-		cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
-		if err != nil {
-			return nil, fmt.Errorf("cannot build config from KUBECONFIG: %w", err)
-		}
-		return cfg, nil
-	}
+	if kubeconfig := os.Getenv("KUBECONFIG"); kubeconfig != "" {
+		loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()
+		cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
+			loadingRules, &clientcmd.ConfigOverrides{}).ClientConfig()
+		if err != nil {
+			return nil, fmt.Errorf("cannot build config from KUBECONFIG: %w", err)
+		}
+		return cfg, nil
+	}
 	cfg, err := rest.InClusterConfig()
 	if err != nil {
 		return nil, fmt.Errorf("cannot get in-cluster config: %w", err)
 	}
 	return cfg, nil
 }

Please verify whether the e2e CI environment ever sets a multi-path KUBECONFIG; if not, this is low priority.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server.go` around lines 399 - 414, The kubeconfig loading in
loadKubeConfig currently uses clientcmd.BuildConfigFromFlags with KUBECONFIG,
which treats it as a single file and can miss standard merged kubeconfig
behavior. Update loadKubeConfig to use clientcmd loading rules (for example,
deferred loading with ClientConfigLoadingRules and ConfigOverrides) so multiple
OS-path-separated kubeconfig entries and current-context are honored, while
keeping the in-cluster rest.InClusterConfig fallback unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/server.go`:
- Around line 399-414: The kubeconfig loading in loadKubeConfig currently uses
clientcmd.BuildConfigFromFlags with KUBECONFIG, which treats it as a single file
and can miss standard merged kubeconfig behavior. Update loadKubeConfig to use
clientcmd loading rules (for example, deferred loading with
ClientConfigLoadingRules and ConfigOverrides) so multiple OS-path-separated
kubeconfig entries and current-context are honored, while keeping the in-cluster
rest.InClusterConfig fallback unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 99cb3663-2a8f-4186-ae67-72301f843620

📥 Commits

Reviewing files that changed from the base of the PR and between 78b2f35 and d1469bb.

📒 Files selected for processing (2)
  • pkg/server.go
  • test/e2e/framework/framework.go

@sradco sradco force-pushed the fix-e2e-kubeconfig-fallback branch from 75b2454 to fbe8ddc Compare July 2, 2026 09:47
and fix e2e test framework for CI

The server previously used rest.InClusterConfig()
exclusively, which connects to the build cluster
in CI rather than the provisioned test cluster
where PrometheusRule CRDs exist. This caused the
management API routes to fail to register,
resulting in 404 errors during e2e tests.

Introduce loadKubeConfig() that prefers KUBECONFIG
when set (CI/local dev) and falls back to
in-cluster config (production).

Additionally fix the e2e test framework and
server for CI:

- Pass the long-lived server context to the k8s
  client so informers keep running beyond init.
  The previous timeout context killed informers
  when createHTTPServer returned, freezing the
  relabeled-rules cache.

- Create a ServiceAccount token via the
  TokenRequest API when the kubeconfig uses
  client-certificate auth (no bearer token
  available). This is the case in CI where
  kubeadmin certs are used.

- Use unique PromQL expressions per test rule so
  the spec-equivalence check does not reject
  rules across tests as duplicates.

Signed-off-by: Shirly Radco <sradco@redhat.com>
Co-authored-by: AI Assistant <noreply@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@sradco sradco force-pushed the fix-e2e-kubeconfig-fallback branch from fbe8ddc to 52a3798 Compare July 2, 2026 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants