CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Kubernetes operator for RustFS, written in Rust using the kube-rs library. The operator manages a custom resource Tenant (CRD) that provisions and manages RustFS storage clusters in Kubernetes.

Current Status: v0.1.0 (pre-release) - Early development, not yet production-ready
Test Coverage: 47 library unit tests, all passing ✅ (run cargo test --all for current count)

Build and Development Commands

Building

cargo build              # Debug build
cargo build --release    # Release build

Testing

cargo test               # Run all tests
cargo test -- --ignored  # Run ignored tests (includes TLS tests)

Running the Operator

# Generate CRD YAML to stdout
cargo run -- crd

# Generate CRD YAML to file
cargo run -- crd -f tenant-crd.yaml

# Run the Kubernetes controller (requires cluster access)
cargo run -- server

# Run the operator HTTP console API (default port 9090; used by console-web)
cargo run -- console

Docker

# Build the Docker image
docker build -t operator .

Note: The Dockerfile uses a multi-stage build (rust:bookworm, cargo-chef); the final image defaults to debian:bookworm-slim.

Scripts (deploy / cleanup / check)

Shell scripts are under scripts/ and grouped by purpose. Run them from the project root (each script also changes to the project root automatically):

Deploy: scripts/deploy/deploy-rustfs.sh, scripts/deploy/deploy-rustfs-4node.sh
Cleanup: scripts/cleanup/cleanup-rustfs.sh, scripts/cleanup/cleanup-rustfs-4node.sh
Check: scripts/check/check-rustfs.sh
Test script syntax: scripts/test/script-test.sh
Kind 4-node config: deploy/kind/kind-rustfs-cluster.yaml
See scripts/README.md for details.

Architecture Overview

Critical Architectural Understanding

⚠️ IMPORTANT: RustFS Unified Cluster Architecture

All pools within a single Tenant form ONE unified RustFS erasure-coded cluster:

Unified Cluster: Multiple pools do NOT create separate clusters; they create one combined cluster
Uniform Data Distribution: Erasure coding stripes data across ALL volumes in ALL pools equally
No Storage Class Awareness: RustFS does not intelligently place data based on storage performance
Performance Limitation: The entire cluster performs at the speed of the SLOWEST storage class
External Tiering: RustFS tiering uses lifecycle policies to external cloud storage (S3, Azure, GCS), NOT pool-based tiers

Valid Multi-Pool Use Cases:

✅ Cluster capacity expansion and gradual hardware migration
✅ Geographic distribution for compliance and disaster recovery
✅ Spot vs on-demand instance optimization (compute cost savings, not storage)
✅ Same storage class with different disk sizes
✅ Resource differentiation (CPU/memory) per pool
✅ Topology-aware distribution across failure domains

Invalid Multi-Pool Use Cases:

❌ Storage class mixing for performance tiering (NVMe for hot, HDD for cold)
❌ Automatic intelligent data placement based on access patterns

For separate RustFS clusters, create separate Tenants, NOT multiple pools.

See docs/architecture-decisions.md for detailed ADRs.
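As an illustration of the storage-class rule above, the following std-only Rust sketch flags a Tenant whose pools request different storage classes. The function name mixed_storage_class_warning is hypothetical and not part of the operator's API; the operator may not perform such a check today. Treating "no storage class" (cluster default) as its own distinct class is also an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical check: returns a warning when pools in one Tenant request
/// different storage classes. All pools still form ONE erasure-coded cluster,
/// so mixing classes creates no tiers; the cluster is limited by the slowest class.
/// `storage_classes` holds one entry per pool; `None` means the cluster default.
pub fn mixed_storage_class_warning(storage_classes: &[Option<&str>]) -> Option<String> {
    // BTreeSet deduplicates and keeps names sorted, giving a stable message.
    // ASSUMPTION: the cluster default is treated as its own distinct class.
    let classes: BTreeSet<&str> = storage_classes
        .iter()
        .copied()
        .map(|c| c.unwrap_or("<cluster default>"))
        .collect();
    if classes.len() <= 1 {
        return None;
    }
    let names: Vec<&str> = classes.into_iter().collect();
    Some(format!(
        "pools request different storage classes ({}); they form one unified cluster \
         that runs at the speed of the slowest class",
        names.join(", ")
    ))
}
```

For separate performance tiers, the guidance above still applies: create separate Tenants rather than pools with different storage classes.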

Reconciliation Loop

The operator follows the standard Kubernetes controller pattern:

Entry Point: src/main.rs - CLI subcommands: crd, server (controller), console (management API)
Controller: src/lib.rs:run() - Sets up the controller that watches Tenant resources and owned resources (ConfigMaps, Secrets, ServiceAccounts, Pods, StatefulSets)
Reconciliation Logic: src/reconcile.rs:reconcile_rustfs() - Main reconciliation function that creates/updates Kubernetes resources for a Tenant
Error Handling: src/reconcile.rs:error_policy() - Chooses the requeue interval based on the error type:
- Credential validation errors (user-fixable): 60-second requeue (avoids repeating the same error while the user fixes it)
- Transient API errors: 5-second requeue (fast recovery)
- Other validation errors: 15-second requeue
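The requeue policy above can be sketched as a plain mapping from error category to delay. ErrorKind and requeue_after are hypothetical names, not the operator's actual types. In kube-rs, error_policy returns an Action, so a Duration like this would typically be wrapped with Action::requeue(...).

```rust
use std::time::Duration;

/// Hypothetical error categories mirroring the three buckets described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    /// User-fixable problems such as a malformed credential Secret.
    CredentialValidation,
    /// Transient Kubernetes API failures (e.g. timeouts or server errors).
    TransientApi,
    /// Any other validation failure.
    OtherValidation,
}

/// Maps an error category to the delay before the next reconcile attempt.
pub fn requeue_after(kind: ErrorKind) -> Duration {
    match kind {
        // Wait longer: retrying will fail the same way until the user acts.
        ErrorKind::CredentialValidation => Duration::from_secs(60),
        // Retry quickly: the API server is likely to recover on its own.
        ErrorKind::TransientApi => Duration::from_secs(5),
        ErrorKind::OtherValidation => Duration::from_secs(15),
    }
}
```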

Custom Resource Definition (CRD)

Tenant CRD: src/types/v1alpha1/tenant.rs - Defines the Tenant custom resource with spec and status
- API Group: rustfs.com/v1alpha1
- Primary spec fields: pools, image, env, scheduler, configuration, image_pull_policy, pod_management_policy
- Each Tenant manages one or more Pools that form a unified cluster
Pool Spec: src/types/v1alpha1/pool.rs - Defines a pool with name, servers, persistence, and scheduling
- Validation Rules:
  - Pool name must not be empty
  - 2-server pools: must have at least 4 total volumes (servers * volumesPerServer >= 4)
  - 3-server pools: must have at least 6 total volumes
  - General: servers * volumesPerServer >= 4
- SchedulingConfig: Per-pool scheduling (nodeSelector, affinity, tolerations, resources, topologySpreadConstraints, priorityClassName)
- Uses #[serde(flatten)] to maintain flat YAML structure while grouping scheduling fields in code
Persistence Config: src/types/v1alpha1/persistence.rs
- volumes_per_server: Number of volumes per server (must be > 0)
- volume_claim_template: Optional PVC spec (defaults to ReadWriteOnce, 10Gi)
- path: Optional custom volume mount path (default: /data/rustfs{N})
- labels, annotations: Optional metadata for PVCs
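A minimal, std-only sketch of the pool validation rules above. It assumes a simplified, hypothetical PoolSpec with only name, servers, and volumes_per_server; the real Pool type in src/types/v1alpha1/pool.rs has more fields and likely reports errors through snafu types rather than String.

```rust
/// Hypothetical, simplified pool shape used only for this sketch.
pub struct PoolSpec {
    pub name: String,
    pub servers: u32,
    pub volumes_per_server: u32,
}

/// Checks the validation rules listed above and returns the first violation.
pub fn validate_pool(pool: &PoolSpec) -> Result<(), String> {
    if pool.name.is_empty() {
        return Err("pool name must not be empty".to_string());
    }
    if pool.volumes_per_server == 0 {
        return Err(format!("pool '{}': volumesPerServer must be > 0", pool.name));
    }
    // Widen to u64 so the multiplication cannot overflow.
    let total = u64::from(pool.servers) * u64::from(pool.volumes_per_server);
    // 3-server pools need at least 6 volumes; all other pools need at least 4.
    let min_total = if pool.servers == 3 { 6 } else { 4 };
    if total < min_total {
        return Err(format!(
            "pool '{}': servers * volumesPerServer = {} but at least {} volumes are required",
            pool.name, total, min_total
        ));
    }
    Ok(())
}
```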

RustFS-Specific Constants and Standards

Service Ports (verified against RustFS source code):

IO Service (S3 API): Port 9000 (not 90)
Console UI: Port 9001 (not 9090)

Volume Paths (matches RustFS Helm chart and docker-compose):

Mount path pattern: /data/rustfs{0...N} (not /data/{N})
Uses 3-dot ellipsis notation for RustFS expansion
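To show what the 3-dot notation means, this sketch expands a single inclusive {A...B} range into concrete paths. It is illustrative only: the operator passes the unexpanded pattern to RustFS, which performs the expansion itself. Real RustFS arguments may contain more than one range (for example one for hosts and one for drives), which this hypothetical helper does not handle.

```rust
/// Expands the first `{A...B}` range (inclusive) in `pattern`.
/// Returns the pattern unchanged if it has no well-formed range;
/// an inverted range (A > B) yields no paths.
pub fn expand_ellipsis(pattern: &str) -> Vec<String> {
    let (Some(open), Some(close)) = (pattern.find('{'), pattern.find('}')) else {
        return vec![pattern.to_string()];
    };
    if close < open {
        return vec![pattern.to_string()];
    }
    // Text between the braces, e.g. "0...3".
    let inner = &pattern[open + 1..close];
    let Some((lo, hi)) = inner.split_once("...") else {
        return vec![pattern.to_string()];
    };
    let (Ok(lo), Ok(hi)) = (lo.parse::<u32>(), hi.parse::<u32>()) else {
        return vec![pattern.to_string()];
    };
    let (prefix, suffix) = (&pattern[..open], &pattern[close + 1..]);
    (lo..=hi).map(|i| format!("{prefix}{i}{suffix}")).collect()
}
```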

Required Environment Variables (automatically set by operator):

RUSTFS_VOLUMES - Combined volumes from all pools (space-separated)
RUSTFS_ADDRESS - Server binding address (0.0.0.0:9000)
RUSTFS_CONSOLE_ADDRESS - Console binding address (0.0.0.0:9001)
RUSTFS_CONSOLE_ENABLE - Enable console UI (true)
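A sketch of how a space-separated RUSTFS_VOLUMES value could be assembled from several pools, one ellipsis entry per pool. The pod DNS pattern ({tenant}-{pool}-{ordinal}.{tenant}-hl.{namespace}.svc.cluster.local), the http scheme, and the PoolLayout type are assumptions for illustration only; the operator's helper code defines the real pod and Service names.

```rust
/// Hypothetical per-pool layout; the real Pool type has more fields.
pub struct PoolLayout<'a> {
    pub name: &'a str,
    pub servers: u32,
    pub volumes_per_server: u32,
}

/// RustFS S3 API port, matching RUSTFS_ADDRESS above.
const RUSTFS_PORT: u16 = 9000;

/// Builds RUSTFS_VOLUMES: one entry per pool, joined by single spaces.
/// ASSUMPTION: pods are reachable over plain http at
/// `{tenant}-{pool}-{ordinal}.{tenant}-hl.{namespace}.svc.cluster.local`.
pub fn rustfs_volumes(tenant: &str, namespace: &str, pools: &[PoolLayout]) -> String {
    pools
        .iter()
        .map(|p| {
            // Ellipsis ranges are inclusive, so the last index is count - 1.
            let last_server = p.servers.saturating_sub(1);
            let last_volume = p.volumes_per_server.saturating_sub(1);
            // `{{` and `}}` emit literal braces around the range.
            let host = format!(
                "{tenant}-{}-{{0...{last_server}}}.{tenant}-hl.{namespace}.svc.cluster.local",
                p.name
            );
            format!("http://{host}:{}/data/rustfs{{0...{last_volume}}}", RUSTFS_PORT)
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because every pool's entry lands in the same variable, RustFS sees all pools as one cluster, which is the unified-cluster behavior described in the Architecture Overview.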

Credentials (optional - from Secrets or environment variables):

Recommended: Use a Secret referenced via spec.credsSecret.name (see examples/secret-credentials-tenant.yaml)
Alternative: Provide via environment variables in spec.env (e.g., RUSTFS_ACCESS_KEY, RUSTFS_SECRET_KEY)
If neither is provided: RustFS uses its built-in defaults (rustfsadmin / rustfsadmin), which are acceptable for development but should be changed for production
Secret must contain: accesskey and secretkey keys (both required, valid UTF-8, minimum 8 characters)
Priority: Secret credentials > Environment variables > RustFS defaults
Validation (performed only when a Secret is configured):
- Secret exists in same namespace
- Has both required keys
- Keys are valid UTF-8
- Keys are at least 8 characters long
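The priority order above can be expressed as a small resolver. CredentialSource and resolve_credential_source are hypothetical names. The sketch only decides which source applies and never reads credential values, consistent with injection via secretKeyRef. Treating the environment source as active when either variable is present is an assumption.

```rust
/// Source that supplies RustFS credentials, highest priority first.
#[derive(Debug, PartialEq, Eq)]
pub enum CredentialSource {
    /// spec.credsSecret.name is set; Kubernetes injects values via secretKeyRef.
    Secret(String),
    /// RUSTFS_ACCESS_KEY and/or RUSTFS_SECRET_KEY appear in spec.env.
    Env,
    /// Nothing configured; RustFS falls back to its built-in defaults.
    RustfsDefaults,
}

/// Picks the source using Secret > environment variables > RustFS defaults.
/// Only names are inspected; credential values are never read.
pub fn resolve_credential_source(
    creds_secret_name: Option<&str>,
    env_var_names: &[&str],
) -> CredentialSource {
    if let Some(name) = creds_secret_name {
        return CredentialSource::Secret(name.to_string());
    }
    // ASSUMPTION: either variable being present counts as env-provided credentials.
    let from_env = env_var_names
        .iter()
        .any(|n| *n == "RUSTFS_ACCESS_KEY" || *n == "RUSTFS_SECRET_KEY");
    if from_env {
        CredentialSource::Env
    } else {
        CredentialSource::RustfsDefaults
    }
}
```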

Context and API Wrapper

Context: src/context.rs - Wraps the Kubernetes client and provides helper methods for CRUD operations
- apply() - Server-side apply for declarative resource management
- get(), create(), delete(), list() - Standard CRUD operations
- update_status() - Updates Tenant status, retrying once on a conflict
- record() - Publishes Kubernetes events for reconciliation actions
- validate_credential_secret() - Validates credential Secret structure (when configured)
  - ✅ Validates Secret exists and has required keys (accesskey, secretkey)
  - ✅ Validates keys contain valid UTF-8 data
  - ✅ Validates minimum 8 characters for both keys
  - Does NOT extract credential values (for security)
  - Actual credential injection handled by Kubernetes via secretKeyRef
  - Returns comprehensive error messages for debugging
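A std-only sketch of the Secret checks listed above, operating on Secret data as a map from key to raw bytes (in k8s-openapi, Secret.data holds ByteString values that wrap a Vec<u8>). validate_credential_data is a hypothetical name. It collects every problem so one error message can report them all, and it never returns the credential values themselves.

```rust
use std::collections::BTreeMap;

/// Keys the credential Secret must contain.
const REQUIRED_KEYS: [&str; 2] = ["accesskey", "secretkey"];
/// Minimum length, in characters, for each credential.
const MIN_LEN: usize = 8;

/// Validates Secret data (key -> raw bytes) without exposing the values.
/// Returns every problem found, not just the first one.
pub fn validate_credential_data(data: &BTreeMap<String, Vec<u8>>) -> Result<(), Vec<String>> {
    let mut problems = Vec::new();
    for key in REQUIRED_KEYS {
        match data.get(key) {
            None => problems.push(format!("missing required key '{}'", key)),
            Some(bytes) => match std::str::from_utf8(bytes) {
                Err(_) => problems.push(format!("key '{}' is not valid UTF-8", key)),
                // Count characters, not bytes, for the length rule.
                Ok(value) if value.chars().count() < MIN_LEN => problems.push(format!(
                    "key '{}' must be at least {} characters",
                    key, MIN_LEN
                )),
                Ok(_) => {}
            },
        }
    }
    if problems.is_empty() {
        Ok(())
    } else {
        Err(problems)
    }
}
```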

Resource Creation

The Tenant type in src/types/v1alpha1/tenant.rs has factory methods for creating Kubernetes resources:

RBAC: new_role(), new_service_account(), new_role_binding()
Services: new_io_service(), new_console_service(), new_headless_service()
Workloads: new_statefulset() - Creates one StatefulSet per pool
Helper Methods: Extracted to src/types/v1alpha1/tenant/helper.rs for better organization
All created resources include proper owner references for garbage collection

Status Management

Status Types: src/types/v1alpha1/status/ - Status structures including state, pool status, and certificate status
The status is updated via the Kubernetes status subresource (Context::update_status, with a single retry on conflict)
Implemented (successful reconcile path): Aggregates per-pool StatefulSet status, sets Ready / Progressing / Degraded conditions, overall current_state, and pool entries—see reconcile_rustfs() in src/reconcile.rs
Remaining (Issue #42 follow-up): When reconcile returns early with Err (e.g. credential/KMS validation, immutable field violations), status is not always updated to reflect that failure; consider setting conditions or state before returning
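An illustrative sketch of folding per-pool StatefulSet counts into one Tenant condition. The PoolObservation type and the precedence used here (any Degraded pool makes the Tenant Degraded, then any Progressing pool, otherwise Ready) are assumptions, not a description of reconcile_rustfs().

```rust
/// Hypothetical per-pool counts, as read from a StatefulSet's status.
pub struct PoolObservation {
    pub desired_replicas: i32,
    pub updated_replicas: i32,
    pub ready_replicas: i32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenantCondition {
    Ready,
    Progressing,
    Degraded,
}

/// Classifies one pool: still rolling out, rolled out but not ready, or ready.
fn classify(pool: &PoolObservation) -> TenantCondition {
    if pool.updated_replicas < pool.desired_replicas {
        TenantCondition::Progressing
    } else if pool.ready_replicas < pool.desired_replicas {
        TenantCondition::Degraded
    } else {
        TenantCondition::Ready
    }
}

/// Combines pool states. ASSUMPTION: Degraded outranks Progressing, which outranks Ready.
pub fn aggregate(pools: &[PoolObservation]) -> TenantCondition {
    let states: Vec<TenantCondition> = pools.iter().map(classify).collect();
    if states.contains(&TenantCondition::Degraded) {
        TenantCondition::Degraded
    } else if states.contains(&TenantCondition::Progressing) {
        TenantCondition::Progressing
    } else {
        TenantCondition::Ready
    }
}
```

A real implementation would likely consider more signals (for example observedGeneration) to avoid reporting a freshly created pool as Degraded before its pods start.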

Utilities

TLS Utilities: src/utils/tls.rs - X.509 certificate and private key validation
- Supports RSA, ECDSA (P-256), and Ed25519 key types
- Supports PKCS#1, PKCS#8, and SEC1 formats
- Validates that private keys match certificate public keys

Test Infrastructure

Test Module: src/tests.rs - Centralized test helpers
- create_test_tenant() - Helper function for consistent test tenant creation
- Used across test suites for better maintainability

Code Structure Notes

Uses kube and k8s-openapi from crates.io (see Cargo.toml for versions)
Kubernetes version target: v1.30
Error handling uses the snafu crate for structured error types
All files include Apache 2.0 license headers
Uses strum::Display for enum-to-string conversions (ImagePullPolicy, PodManagementPolicy, PoolState, State)

Important Dependencies

kube / k8s-openapi: Versions pinned in Cargo.toml (crates.io)
Uses Rust edition 2024
Build script (build.rs) generates build metadata using the shadow-rs crate

Known Issues and TODOs

High Priority

~~Secret-based credential management~~ ✅ COMPLETED (2025-11-15, Issue #41)
~~Tenant status conditions on successful reconcile~~ ✅ COMPLETED: Ready / Progressing / Degraded conditions and pool-level status (see reconcile.rs)
Status on reconciliation failures — Early error returns may not patch Tenant status (Issue #42 follow-up)
StatefulSet advanced rollout — Safe updates and validation exist; rollback, richer strategies, and Issue #43 polish remain
Integration tests — Only unit tests in-repo today

Medium Priority

Status subresource update retry beyond single conflict retry (context.rs)
TLS certificate rotation automation
Configuration validation enhancements (storage class existence, node selector validity)

Documentation Structure

CHANGELOG.md - All notable changes following Keep a Changelog format
ROADMAP.md - Development roadmap organized by focus areas (Core Stability, Advanced Features, Enterprise Features, Production Ready)
docs/architecture-decisions.md - ADRs documenting key architectural decisions
docs/multi-pool-use-cases.md - Comprehensive guide for multi-pool scenarios
docs/DEVELOPMENT-NOTES.md - Historical analysis and design notes (not the primary dev guide; see docs/DEVELOPMENT.md and CONTRIBUTING.md)

Examples

Located in examples/ directory (moved from deploy/rustfs-operator/examples/):

Production Examples:

production-ha-tenant.yaml - Production HA with topology spread constraints
cluster-expansion-tenant.yaml - Capacity expansion and hardware migration
geographic-pools-tenant.yaml - Multi-region deployment

Development Examples:

simple-tenant.yaml - Simple single-pool tenant with documentation
minimal-dev-tenant.yaml - Minimal development configuration
multi-pool-tenant.yaml - Basic multi-pool example

Advanced Scenarios:

spot-instance-tenant.yaml - Cost optimization using spot instances
hardware-pools-tenant.yaml - Heterogeneous disk sizes (same storage class)
custom-rbac-tenant.yaml - Custom RBAC configuration

All examples include:

Inline documentation explaining configuration choices
Architectural warnings about RustFS unified cluster behavior
kubectl verification commands

See examples/README.md for comprehensive usage guide.

Development Priorities (from ROADMAP.md)

Core Stability (Highest Priority)

Status on failed reconcile paths and stronger status retry
StatefulSet rollout polish (rollback, strategies) and observability (metrics)
Integration test suite

Advanced Features

Tenant lifecycle management with finalizers
Pool lifecycle management (add/remove/scale)
TLS/certificate automation (cert-manager integration)
Monitoring and alerting (Prometheus, Grafana)

Enterprise Features

Multi-tenancy enhancements
Security hardening (Pod Security Standards)
Compliance and audit logging
Advanced networking and storage enhancements

Production Ready (Long-term Goals)

95%+ test coverage
Complete API documentation
Ecosystem integration (OperatorHub, Helm, OLM)
Community and support channels

Verification Standards

All RustFS-specific constants and behaviors should be verified against:

RustFS source code (~/git/rustfs)
RustFS Helm chart (helm/rustfs/)
RustFS docker-compose examples
RustFS MNMD deployment guide
RustFS configuration constants

Do not invent or assume RustFS features - always verify against official sources.
