This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Kubernetes operator for RustFS, written in Rust using the kube-rs library. The operator manages a `Tenant` custom resource (CRD) that provisions and manages RustFS storage clusters in Kubernetes.
Current Status: v0.1.0 (pre-release) - Early development, not yet production-ready
Test Coverage: 47 library unit tests, all passing ✅ (run `cargo test --all` for the current count)
```bash
cargo build              # Debug build
cargo build --release    # Release build

cargo test               # Run all tests
cargo test -- --ignored  # Run ignored tests (includes TLS tests)

# Generate CRD YAML to stdout
cargo run -- crd

# Generate CRD YAML to file
cargo run -- crd -f tenant-crd.yaml

# Run the Kubernetes controller (requires cluster access)
cargo run -- server

# Run the operator HTTP console API (default port 9090; used by console-web)
cargo run -- console

# Build the Docker image
docker build -t operator .
```

Note: The Dockerfile uses a multi-stage build (`rust:bookworm`, `cargo-chef`); the final image defaults to `debian:bookworm-slim`.
Shell scripts are under `scripts/` and grouped by purpose. Run them from the project root (scripts will `cd` to the project root automatically):

- Deploy: `scripts/deploy/deploy-rustfs.sh`, `scripts/deploy/deploy-rustfs-4node.sh`
- Cleanup: `scripts/cleanup/cleanup-rustfs.sh`, `scripts/cleanup/cleanup-rustfs-4node.sh`
- Check: `scripts/check/check-rustfs.sh`
- Test script syntax: `scripts/test/script-test.sh`
- Kind 4-node config: `deploy/kind/kind-rustfs-cluster.yaml`

See `scripts/README.md` for details.
All pools within a single Tenant form ONE unified RustFS erasure-coded cluster:
- Unified Cluster: Multiple pools do NOT create separate clusters; they create one combined cluster
- Uniform Data Distribution: Erasure coding stripes data across ALL volumes in ALL pools equally
- No Storage Class Awareness: RustFS does not intelligently place data based on storage performance
- Performance Limitation: The entire cluster performs at the speed of the SLOWEST storage class
- External Tiering: RustFS tiering uses lifecycle policies to external cloud storage (S3, Azure, GCS), NOT pool-based tiers
Valid Multi-Pool Use Cases:
- ✅ Cluster capacity expansion and gradual hardware migration
- ✅ Geographic distribution for compliance and disaster recovery
- ✅ Spot vs on-demand instance optimization (compute cost savings, not storage)
- ✅ Same storage class with different disk sizes
- ✅ Resource differentiation (CPU/memory) per pool
- ✅ Topology-aware distribution across failure domains
Invalid Multi-Pool Use Cases:
- ❌ Storage class mixing for performance tiering (NVMe for hot, HDD for cold)
- ❌ Automatic intelligent data placement based on access patterns
For separate RustFS clusters, create separate Tenants, NOT multiple pools.
See docs/architecture-decisions.md for detailed ADRs.
The operator follows the standard Kubernetes controller pattern:
- Entry Point: `src/main.rs` - CLI subcommands: `crd`, `server` (controller), `console` (management API)
- Controller: `src/lib.rs:run()` - Sets up the controller that watches `Tenant` resources and owned resources (ConfigMaps, Secrets, ServiceAccounts, Pods, StatefulSets)
- Reconciliation Logic: `src/reconcile.rs:reconcile_rustfs()` - Main reconciliation function that creates/updates Kubernetes resources for a Tenant
- Error Handling: `src/reconcile.rs:error_policy()` - Intelligent retry intervals based on error type:
  - Credential validation errors (user-fixable): 60-second requeue (reduces spam)
  - Transient API errors: 5-second requeue (fast recovery)
  - Other validation errors: 15-second requeue
- Tenant CRD: `src/types/v1alpha1/tenant.rs` - Defines the `Tenant` custom resource with spec and status
  - API Group: `rustfs.com/v1alpha1`
  - Primary spec fields: `pools`, `image`, `env`, `scheduler`, `configuration`, `image_pull_policy`, `pod_management_policy`
  - Each Tenant manages one or more Pools that form a unified cluster
- Pool Spec: `src/types/v1alpha1/pool.rs` - Defines a pool with `name`, `servers`, `persistence`, and `scheduling`
  - Validation Rules:
    - Pool name must not be empty
    - 2-server pools: must have at least 4 total volumes (`servers * volumesPerServer >= 4`)
    - 3-server pools: must have at least 6 total volumes
    - General: `servers * volumesPerServer >= 4`
  - SchedulingConfig: Per-pool scheduling (nodeSelector, affinity, tolerations, resources, topologySpreadConstraints, priorityClassName)
    - Uses `#[serde(flatten)]` to maintain a flat YAML structure while grouping scheduling fields in code
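The volume-count rules above can be sketched as a standalone function (hypothetical names; the real checks live in `src/types/v1alpha1/pool.rs`):

```rust
// Sketch of the pool validation rules: non-empty name, and a minimum
// total volume count of 4 in general but 6 for 3-server pools.
fn validate_pool(name: &str, servers: u32, volumes_per_server: u32) -> Result<(), String> {
    if name.is_empty() {
        return Err("pool name must not be empty".to_string());
    }
    let min_total = if servers == 3 { 6 } else { 4 };
    let total = servers * volumes_per_server;
    if total < min_total {
        return Err(format!(
            "pool `{name}` needs at least {min_total} total volumes, got {total}"
        ));
    }
    Ok(())
}

fn main() {
    assert!(validate_pool("pool-0", 2, 2).is_ok());  // 4 total volumes: ok
    assert!(validate_pool("pool-0", 3, 1).is_err()); // 3 total volumes: too few
    assert!(validate_pool("", 4, 2).is_err());       // empty name rejected
}
```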
- Persistence Config: `src/types/v1alpha1/persistence.rs`
  - `volumes_per_server`: Number of volumes per server (must be > 0)
  - `volume_claim_template`: Optional PVC spec (defaults to ReadWriteOnce, 10Gi)
  - `path`: Optional custom volume mount path (default: `/data/rustfs{N}`)
  - `labels`, `annotations`: Optional metadata for PVCs
Service Ports (verified against RustFS source code):
- IO Service (S3 API): Port `9000` (not 90)
- Console UI: Port `9001` (not 9090)

Volume Paths (matches RustFS Helm chart and docker-compose):
- Mount path pattern: `/data/rustfs{0...N}` (not `/data/{N}`)
- Uses 3-dot ellipsis notation for RustFS expansion
Required Environment Variables (automatically set by operator):
- `RUSTFS_VOLUMES` - Combined volumes from all pools (space-separated)
- `RUSTFS_ADDRESS` - Server binding address (`0.0.0.0:9000`)
- `RUSTFS_CONSOLE_ADDRESS` - Console binding address (`0.0.0.0:9001`)
- `RUSTFS_CONSOLE_ENABLE` - Enable console UI (`true`)
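Assuming zero-based volume indices and an inclusive range, the ellipsis pattern above could be produced like this (hypothetical helper, not the operator's actual code):

```rust
// Builds the 3-dot ellipsis mount-path pattern that RustFS expands into
// /data/rustfs0 .. /data/rustfs{N-1} for N volumes per server.
fn volume_path_pattern(volumes_per_server: u32) -> String {
    assert!(volumes_per_server > 0, "volumes_per_server must be > 0");
    format!("/data/rustfs{{0...{}}}", volumes_per_server - 1)
}

fn main() {
    // A server with 4 volumes mounts /data/rustfs0 through /data/rustfs3.
    assert_eq!(volume_path_pattern(4), "/data/rustfs{0...3}");
}
```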
Credentials (optional - from Secrets or environment variables):
- Recommended: Use a Secret referenced via `spec.credsSecret.name` (see `examples/secret-credentials-tenant.yaml`)
- Alternative: Provide via environment variables in `spec.env` (e.g., `RUSTFS_ACCESS_KEY`, `RUSTFS_SECRET_KEY`)
- If neither is provided: RustFS uses built-in defaults (`rustfsadmin`/`rustfsadmin`) - acceptable for development, change for production
- Secret must contain: `accesskey` and `secretkey` keys (both required, valid UTF-8, minimum 8 characters)
- Priority: Secret credentials > Environment variables > RustFS defaults
- Validation (performed only when a Secret is configured):
  - Secret exists in the same namespace
  - Has both required keys
  - Keys are valid UTF-8
  - Keys are at least 8 characters long
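The validation steps above can be sketched against a plain map standing in for the Secret's decoded `data` (hypothetical function; the real logic is `Context::validate_credential_secret()` in `src/context.rs`):

```rust
use std::collections::BTreeMap;

// Checks presence, UTF-8 validity, and minimum length of both credential
// keys. Values are only validated, never extracted or logged; actual
// injection into pods happens via secretKeyRef.
fn validate_credentials(data: &BTreeMap<String, Vec<u8>>) -> Result<(), String> {
    for key in ["accesskey", "secretkey"] {
        let bytes = data
            .get(key)
            .ok_or_else(|| format!("Secret is missing required key `{key}`"))?;
        let value = std::str::from_utf8(bytes)
            .map_err(|_| format!("key `{key}` is not valid UTF-8"))?;
        if value.chars().count() < 8 {
            return Err(format!("key `{key}` must be at least 8 characters long"));
        }
    }
    Ok(())
}

fn main() {
    let mut data = BTreeMap::new();
    data.insert("accesskey".to_string(), b"rustfsadmin".to_vec());
    data.insert("secretkey".to_string(), b"short".to_vec());
    assert!(validate_credentials(&data).is_err()); // secretkey too short
}
```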
- Context: `src/context.rs` - Wraps the Kubernetes client and provides helper methods for CRUD operations
  - `apply()` - Server-side apply for declarative resource management
  - `get()`, `create()`, `delete()`, `list()` - Standard CRUD operations
  - `update_status()` - Updates Tenant status with retry logic for conflicts
  - `record()` - Publishes Kubernetes events for reconciliation actions
  - `validate_credential_secret()` - Validates credential Secret structure (when configured)
    - ✅ Validates Secret exists and has required keys (`accesskey`, `secretkey`)
    - ✅ Validates keys contain valid UTF-8 data
    - ✅ Validates minimum 8 characters for both keys
    - Does NOT extract credential values (for security)
    - Actual credential injection handled by Kubernetes via `secretKeyRef`
    - Returns comprehensive error messages for debugging
The `Tenant` type in `src/types/v1alpha1/tenant.rs` has factory methods for creating Kubernetes resources:
- RBAC: `new_role()`, `new_service_account()`, `new_role_binding()`
- Services: `new_io_service()`, `new_console_service()`, `new_headless_service()`
- Workloads: `new_statefulset()` - Creates one StatefulSet per pool
- Helper Methods: Extracted to `src/types/v1alpha1/tenant/helper.rs` for better organization
- All created resources include proper owner references for garbage collection
- Status Types: `src/types/v1alpha1/status/` - Status structures including state, pool status, and certificate status
- The status is updated via the Kubernetes status subresource (`Context::update_status`, with a single retry on conflict)
- Implemented (successful reconcile path): Aggregates per-pool StatefulSet status, sets `Ready`/`Progressing`/`Degraded` conditions, the overall `current_state`, and pool entries - see `reconcile_rustfs()` in `src/reconcile.rs`
- Remaining (Issue #42 follow-up): When reconcile returns early with `Err` (e.g. credential/KMS validation, immutable field violations), status is not always updated to reflect that failure; consider setting conditions or state before returning
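The single-conflict-retry behavior mentioned above can be sketched as follows (assumed shape only; the real `Context::update_status` re-reads the Tenant and patches the status subresource through the API server):

```rust
// Stand-in for a Kubernetes 409 Conflict error.
struct Conflict;

// Attempts the status patch once; on a conflict, retries exactly once and
// surfaces a second conflict to the caller.
fn update_status_with_retry(
    mut patch_status: impl FnMut() -> Result<(), Conflict>,
) -> Result<(), String> {
    match patch_status() {
        Ok(()) => Ok(()),
        Err(Conflict) => {
            patch_status().map_err(|_| "status update conflicted twice".to_string())
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = update_status_with_retry(|| {
        calls += 1;
        if calls == 1 { Err(Conflict) } else { Ok(()) }
    });
    assert!(result.is_ok());
    assert_eq!(calls, 2); // first attempt conflicted, the single retry succeeded
}
```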
- TLS Utilities: `src/utils/tls.rs` - X.509 certificate and private key validation
  - Supports RSA, ECDSA (P-256), and Ed25519 key types
  - Supports PKCS#1, PKCS#8, and SEC1 formats
  - Validates that private keys match certificate public keys
- Test Module: `src/tests.rs` - Centralized test helpers
  - `create_test_tenant()` - Helper function for consistent test tenant creation
  - Used across test suites for better maintainability
- Uses `kube` and `k8s-openapi` from crates.io, with versions pinned in `Cargo.toml`
- Kubernetes version target: v1.30
- Error handling uses the `snafu` crate for structured error types
- All files include Apache 2.0 license headers
- Uses `strum::Display` for enum-to-string conversions (`ImagePullPolicy`, `PodManagementPolicy`, `PoolState`, `State`)
- Uses Rust edition 2024
- Build script (`build.rs`) generates build metadata using the `shadow-rs` crate
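`strum::Display` derives a `Display` impl that renders each variant's name; this sketch hand-rolls the equivalent for an `ImagePullPolicy`-like enum so it runs without the third-party crate:

```rust
use std::fmt;

// Mirrors Kubernetes' image pull policies; the real type lives in the
// operator's v1alpha1 types and uses #[derive(strum::Display)].
enum ImagePullPolicy {
    Always,
    IfNotPresent,
    Never,
}

impl fmt::Display for ImagePullPolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            ImagePullPolicy::Always => "Always",
            ImagePullPolicy::IfNotPresent => "IfNotPresent",
            ImagePullPolicy::Never => "Never",
        })
    }
}

fn main() {
    assert_eq!(ImagePullPolicy::IfNotPresent.to_string(), "IfNotPresent");
}
```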
- Secret-based credential management - ✅ COMPLETED (2025-11-15, Issue #41)
- Tenant status conditions on successful reconcile - Ready/Progressing/Degraded, pool-level status (see `reconcile.rs`)
- Status on reconciliation failures - Early error returns may not patch Tenant status (Issue #42 follow-up)
- StatefulSet advanced rollout - Safe updates and validation exist; rollback, richer strategies, and Issue #43 polish remain
- Integration tests - Only unit tests in-repo today
- Status subresource update retry beyond the single conflict retry (`context.rs`)
- TLS certificate rotation automation
- Configuration validation enhancements (storage class existence, node selector validity)
- `CHANGELOG.md` - All notable changes, following the Keep a Changelog format
- `ROADMAP.md` - Development roadmap organized by focus areas (Core Stability, Advanced Features, Enterprise Features, Production Ready)
- `docs/architecture-decisions.md` - ADRs documenting key architectural decisions
- `docs/multi-pool-use-cases.md` - Comprehensive guide for multi-pool scenarios
- `docs/DEVELOPMENT-NOTES.md` - Historical analysis and design notes (not the primary dev guide; see `docs/DEVELOPMENT.md` and `CONTRIBUTING.md`)
Located in the `examples/` directory (moved from `deploy/rustfs-operator/examples/`):

Production Examples:
- `production-ha-tenant.yaml` - Production HA with topology spread constraints
- `cluster-expansion-tenant.yaml` - Capacity expansion and hardware migration
- `geographic-pools-tenant.yaml` - Multi-region deployment

Development Examples:
- `simple-tenant.yaml` - Simple single-pool tenant with documentation
- `minimal-dev-tenant.yaml` - Minimal development configuration
- `multi-pool-tenant.yaml` - Basic multi-pool example

Advanced Scenarios:
- `spot-instance-tenant.yaml` - Cost optimization using spot instances
- `hardware-pools-tenant.yaml` - Heterogeneous disk sizes (same storage class)
- `custom-rbac-tenant.yaml` - Custom RBAC configuration
All examples include:
- Inline documentation explaining configuration choices
- Architectural warnings about RustFS unified cluster behavior
- kubectl verification commands
See `examples/README.md` for a comprehensive usage guide.
- Status on failed reconcile paths and stronger status retry
- StatefulSet rollout polish (rollback, strategies) and observability (metrics)
- Integration test suite
- Tenant lifecycle management with finalizers
- Pool lifecycle management (add/remove/scale)
- TLS/certificate automation (cert-manager integration)
- Monitoring and alerting (Prometheus, Grafana)
- Multi-tenancy enhancements
- Security hardening (Pod Security Standards)
- Compliance and audit logging
- Advanced networking and storage enhancements
- 95%+ test coverage
- Complete API documentation
- Ecosystem integration (OperatorHub, Helm, OLM)
- Community and support channels
All RustFS-specific constants and behaviors should be verified against:
- RustFS source code (`~/git/rustfs`)
- RustFS Helm chart (`helm/rustfs/`)
- RustFS docker-compose examples
- RustFS MNMD deployment guide
- RustFS configuration constants
Do not invent or assume RustFS features - always verify against official sources.