This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
The Multiarch Tuning Operator enhances operational experience within multi-architecture clusters and single-architecture clusters migrating to multi-architecture compute configurations. It provides architecture-aware scheduling of workloads by automatically adding node affinity requirements based on container image architectures.
This is a Kubernetes operator built with Kubebuilder and operator-sdk, primarily targeting OpenShift clusters.
# Local build (CGO required - needs gpgme-devel/libgpgme-dev)
make build
# Single-architecture image build
make docker-build IMG=<registry>/multiarch-tuning-operator:tag
# Multi-architecture image build (requires qemu-user-static)
make docker-buildx IMG=<registry>/multiarch-tuning-operator:tag
# Prevent buildx instance deletion (create empty file)
touch .persistent-buildx# Run all checks and tests (includes manifests, generate, fmt, vet, goimports, gosec, lint, and unit tests)
make test
# Unit tests only
make unit
# E2E tests (requires deployed operator)
KUBECONFIG=/path/to/kubeconfig NAMESPACE=openshift-multiarch-tuning-operator make e2e
# Individual checks
make lint # golangci-lint
make gosec # SAST security scanning
make vet # go vet
make goimports # goimports check
make fmt # gofmtRunning tests locally vs containerized:
- By default, tests run in a containerized environment using BUILD_IMAGE
- To run locally:
NO_DOCKER=1 make test - Or add
NO_DOCKER=1to a.envfile (see dotenv.example)
Running a single test:
# Set GINKGO_ARGS to target specific tests
GINKGO_ARGS="-v --focus='your test pattern'" make unitKUBECONFIG=/path/to/cluster/kubeconfig NAMESPACE=openshift-multiarch-tuning-operator make e2e
make manifests
make vendor
make verify-diff
### Deployment Commands
```shell
# Install CRDs
make install
# Deploy operator to cluster
make deploy IMG=<registry>/multiarch-tuning-operator:tag
# Create ClusterPodPlacementConfig to enable pod placement operand
kubectl create -f - <<EOF
apiVersion: multiarch.openshift.io/v1beta1
kind: ClusterPodPlacementConfig
metadata:
name: cluster
spec:
logVerbosityLevel: Normal
namespaceSelector:
matchExpressions:
- key: multiarch.openshift.io/exclude-pod-placement
operator: DoesNotExist
EOF
# Undeploy
make undeploy
# Uninstall CRDs
make uninstall
# Generate manifests (CRDs, RBAC, etc.) after editing API definitions
make manifests
# Generate DeepCopy implementations
make generate# Generate bundle manifests
make bundle VERSION=<version>
# Verify bundle generation is deterministic
make bundle-verify
# Build and push bundle image
make bundle-build BUNDLE_IMG=<registry>/multiarch-tuning-operator-bundle:<version>
make bundle-push BUNDLE_IMG=<registry>/multiarch-tuning-operator-bundle:<version>
# Build and push catalog image
make catalog-build CATALOG_IMG=<registry>/multiarch-tuning-operator-catalog:<version>
make catalog-push CATALOG_IMG=<registry>/multiarch-tuning-operator-catalog:<version>The operator runs in different modes controlled by flags (see bindFlags() function in cmd/main.go):
- Operator mode (
--enable-operator): Manages ClusterPodPlacementConfig CR and deploys operands - Pod Placement Controllers (
--enable-ppc-controllers): Reconciles pods with scheduling gates - Pod Placement Webhook (
--enable-ppc-webhook): Mutating webhook that adds scheduling gates to pods - ENoExecEvent Controllers (
--enable-enoexec-event-controllers): Monitors exec format errors via eBPF
Only one mode can be active at a time. Each mode has its own leader election ID.
Operator Controller (internal/controller/operator/):
- Reconciles ClusterPodPlacementConfig singleton CR (name must be "cluster")
- Deploys/manages pod placement operands (controllers, webhook, RBAC, etc.)
- Handles ordered deletion to ensure pods are ungated before operand removal
- Creates ServiceMonitor for metrics scraping
Pod Placement Operand (intrenal/controller/podplacement/):
- PodReconciler: Watches pods with scheduling gate, inspects container images, adds architecture-based nodeAffinity
- PodSchedulingGateMutatingWebHook: Adds
multiarch.openshift.io/scheduling-gateto new pods - GlobalPullSecretSyncer: Syncs pull secrets for image inspection
ENoExecEvent System (/internal/controller/enoexecevent/):
- Daemon (cmd/enoexec-daemon/): eBPF-based monitoring of exec format errors on nodes
- Handler Controller: Processes ENoExecEvent CRs created by the daemon
Pod Placement Flow:
- Webhook adds scheduling gate to pod, preventing scheduling
- PodReconciler watches gated pods
- Inspects container images to determine supported architectures
- If image inspection fails and
fallbackArchitectureis configured, uses the fallback architecture - Adds nodeAffinity requirement for kubernetes.io/arch
- Removes scheduling gate, allowing scheduler to place pod
Image Inspection (pkg/image/):
- Supports multi-arch manifests (OCI, Docker v2.2)
- Handles pull secrets and registry certificates
- Uses caching to optimize repeated inspections
- Metrics tracking for inspection operations
The operator binary runs in four mutually exclusive modes (controlled by flags in main.go):
-
Operator Mode (
--enable-operator): Manages the ClusterPodPlacementConfig CR lifecycle. Deploys and manages the pod placement operand components (controller and webhook deployments). -
Pod Placement Controller Mode (
--enable-ppc-controllers): Reconciles pods with the scheduling gate, inspects container images to determine supported architectures, sets nodeAffinity requirements, and removes the scheduling gate. -
Pod Placement Webhook Mode (
--enable-ppc-webhook): Mutating webhook that adds the scheduling gate to new pods (except in excluded namespaces). -
ENoExecEvent Controllers Mode (
--enable-enoexec-event-controllers): Monitors and handles exec format errors detected by the eBPF-based daemon running on cluster nodes.
Operator Controller (internal/controller/operator/clusterpodplacementconfig_controller.go):
- Reconciles ClusterPodPlacementConfig singleton resource (name must be "cluster")
- Manages deployment lifecycle of pod placement controller and webhook
- Updates status conditions (Available, Progressing, Degraded, Deprovisioning)
Pod Placement Controller (internal/controller/podplacement/pod_reconciler.go):
- Watches pods in Pending status with the
multiarch.openshift.io/scheduling-gatescheduling gate - High concurrency:
MaxConcurrentReconciles = NumCPU * 4(I/O bound image inspection) - Reconciliation flow:
- Check if pod has scheduling gate
- Verify pod should be processed (namespace selector, existing nodeAffinity)
- Retrieve image pull secrets
- Inspect container images to determine supported architectures
- Set required and preferred nodeAffinity for
kubernetes.io/arch - Remove scheduling gate to allow scheduling
- Max retries mechanism for image inspection failures
- Cache optimization in pod field selector: only watches
status.phase=Pending
Pod Model (internal/controller/podplacement/pod_model.go):
- Core logic for pod processing
- Image architecture inspection (supports registry authentication)
- NodeAffinity computation (required and preferred scheduling)
- Scheduling gate management
- Event publishing for audit trail
Mutating Webhook (internal/controller/podplacement/scheduling_gate_mutating_webhook.go):
- Adds
multiarch.openshift.io/scheduling-gateto new pods - Respects namespace selector from ClusterPodPlacementConfig
- Always excludes:
openshift-*,kube-*,hypershift-*namespaces - Uses worker pool for event publishing (ants library, 16 workers)
Image Inspector (pkg/image/inspector.go):
- Retrieves container image manifests from registries
- Determines supported CPU architectures
- Handles authentication via pull secrets and global pull secret
- Implements caching to reduce registry queries
CPPC Informer (pkg/informers/clusterpodplacementconfig/):
- Syncer that maintains in-memory singleton of ClusterPodPlacementConfig
- Enabled in operand modes (
--enable-cppc-informer) - Allows runtime log level changes and configuration access
- v1alpha1: Original API version with conversion webhook
- v1beta1: Current stable API (hub version for conversions)
- Conversion webhooks in api/v1beta1/clusterpodplacementconfig_webhook.go
ClusterPodPlacementConfig (api/v1beta1/clusterpodplacementconfig_types.go):
- Singleton resource (only name "cluster" allowed)
- Spec fields:
logVerbosity: Normal, Debug, Trace, TraceAllnamespaceSelector: Label selector for processed namespacesplugins: Optional plugins configuration (e.g., NodeAffinityScoring for preferred scheduling)fallbackArchitecture: Architecture to use when image inspector cannot determine image architecture (arm64, amd64, ppc64le, s390x, or empty string)
- Status conditions: Available, Progressing, Degraded, Deprovisioning, PodPlacementControllerNotRolledOut, PodPlacementWebhookNotRolledOut, MutatingWebhookConfigurationNotAvailable
System namespaces are excluded from pod placement by default (cannot be overridden):
openshift-*kube-*hypershift-*
Additional namespaces can be excluded via namespaceSelector with label multiarch.openshift.io/exclude-pod-placement.
NodeAffinityScoring Plugin (api/common/plugins/nodeaffinityscoring_plugin.go):
- Adds preferred (soft) nodeAffinity to influence scheduler scoring
- Weights architectures based on cluster node distribution
- Enables workload placement on preferred architectures while maintaining required constraints
api/
├── common/ # Shared constants and types
│ └── plugins/ # Plugin system (NodeAffinityScoring)
├── v1alpha1/ # Alpha API version with conversion
└── v1beta1/ # Beta API version (storage version)
internal/controller/
├── operator/ # Operator mode: ClusterPodPlacementConfig lifecycle
└── podplacement/ # Operand modes: pod reconciler and webhook
└── metrics/ # Prometheus metrics
pkg/
├── e2e/ # E2E test utilities
├── image/ # Container image inspection and authentication
├── informers/ # CPPC singleton informer
├── testing/ # Test helpers
└── utils/ # Shared utilities
config/ # Kustomize configurations
hack/ # Build and test scripts
Unit Tests:
- Located alongside source files (*_test.go)
- Use Ginkgo/Gomega framework
- envtest provides fake API server
- Test helpers in pkg/testing/builder/ (fluent builders for K8s objects)
- Test framework utilities in pkg/testing/framework/
E2E Tests:
- Located in pkg/e2e/
- Require deployed operator in cluster
- Separate test suites for operator and pod placement
Create a .env file in the repository root to customize build settings:
NO_DOCKER=1: Run builds/tests locally instead of in containerFORCE_DOCKER=1: Force Docker instead of PodmanBUILD_IMAGE: Override builder imageRUNTIME_IMAGE: Override runtime base image
Image inspection requires the containers/image library which has CGO dependencies on gpgme. This means:
- Local builds require gpgme-devel (RHEL/Fedora) or libgpgme-dev (Debian)
- CGO_ENABLED=1 is required
- Multi-arch builds use platform-specific builder images with these dependencies
This project uses Go vendoring (GOFLAGS=-mod=vendor). After modifying dependencies:
make vendor- All tests run in containers by default using
BUILD_IMAGE - Set
NO_DOCKER=1in.envfile or environment to run locally - Test artifacts output to
ARTIFACT_DIR(default:./_output) - Coverage reports generated in
test-unit-coverage.out
Create a .env file in the repository root for local settings:
NO_DOCKER=1 # Run tests and builds locally
FORCE_DOCKER=1 # Force Docker instead of Podman
BUILD_IMAGE=<custom-image> # Override builder image
RUNTIME_IMAGE=<custom-image> # Override runtime base image
- ClusterPodPlacementConfig must be named "cluster" (singleton enforced by webhook)
- Namespaces
openshift-*,kube-*, andhypershift-*are always excluded from pod placement - CGO is required for building (uses gpgme for registry authentication)
- Only one execution mode flag can be set at a time in main.go
- The operator uses vendored dependencies (
GOFLAGS=-mod=vendor)
All components expose Prometheus metrics at :8080/metrics:
Pod Placement Controller:
mto_ppo_ctrl_time_to_process_pod_seconds: Time to process any podmto_ppo_ctrl_time_to_process_gated_pod_seconds: Time to process gated pods (includes inspection)mto_ppo_ctrl_time_to_inspect_image_seconds: Image inspection timemto_ppo_ctrl_processed_pods_total: Total gated pods processedmto_ppo_ctrl_failed_image_inspection_total: Failed image inspections
Mutating Webhook:
mto_ppo_wh_pods_processed_total: Total pods processedmto_ppo_wh_pods_gated_total: Total pods gatedmto_ppo_wh_response_time_seconds: Webhook response time
Shared:
mto_ppo_pods_gated: Current number of gated pods
See docs/metrics.md for example queries and monitoring setup.
- OpenShift Enhancement Proposal
- KEP-3521: Pod Scheduling Readiness
- KEP-3838: Pod Mutable Scheduling Directives
- Kubebuilder Documentation
The repository includes .tekton/ pipeline definitions for Konflux CI/CD. Bundle Dockerfiles have konflux-specific variants (bundle.konflux.Dockerfile).