248 changes: 243 additions & 5 deletions docs/en/global/install.mdx
@@ -564,13 +564,14 @@ Use the DCS resource fields from [Creating Clusters on Huawei DCS](../create-clu
- Add `Cluster.metadata.labels.is-global: "true"` and `Cluster.metadata.labels.cluster-type: DCS`.
- Add `Cluster.metadata.annotations["cpaas.io/registry-address"]` with `${NODE_REGISTRY_ADDRESS}`.
- Set `KubeadmControlPlane.spec.kubeadmConfigSpec.format: ignition` for Alauda OS.
- Keep the release manifest's non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries.
- Keep the `KubeadmControlPlane.spec.kubeadmConfigSpec.users` entry with a non-empty `sshAuthorizedKeys` list (the `boot` user). The DCS `ignition` format rejects an empty SSH key list, so this field is required even for a `global` cluster you do not plan to access over SSH. See [Resolving Placeholder Values](../create-cluster/huawei-dcs.mdx#resolving-placeholders) for what to supply when no interactive key is needed.
- Keep the non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries. The file contents (the PodSecurity admission config, the kubelet patch, and the audit policy), together with the full `clusterConfiguration`, `preKubeadmCommands`, `postKubeadmCommands`, and the init and join node-registration patches, are identical to a workload cluster. Copy them from the [Complete KubeadmControlPlane Configuration](../create-cluster/huawei-dcs.mdx#complete-kubeadmcontrolplane-configuration) appendix, or reference the `dcs-kubernetes-<major.minor>-files` Secret documented there. The wiring fragment below shows only the `global`-specific fields layered on top of that base.
- For a normal non-DR deployment, do not set `DCSCluster.spec.encryptionProviderConfigRef` and do not add `/etc/kubernetes/encryption-provider.conf` to `KubeadmControlPlane.spec.kubeadmConfigSpec.files`.
- Keep `/var/cpaas` as platform state. If you need the disk to survive rolling replacement, declare it in `DCSIpHostnamePool.spec.pool[].persistentDisk`; do not rely on `DCSMachineTemplate` template disks as preserved state.
- Use concrete `datastoreName` values for DCS local storage unless you have verified that the selected datastore cluster can place volumes on hosts that can run the target VM.

<Directive type="warning" title="Fragment Scope">
The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these `global`-specific changes into the manifest that you prepare from the DCS create-cluster references, then apply the complete manifest file.
The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these `global`-specific changes into the manifest that you prepare from the DCS create-cluster references, then apply the complete manifest file. If you would rather start from a complete file, adapt the [Worked Example: Complete `global` Manifest for Huawei DCS](#dcs-worked-example) at the end of this page instead of assembling it from fragments.
</Directive>

The following fragment shows the `global`-specific Cluster API wiring. Fill the provider resource fields by using the DCS create-cluster references above.
@@ -625,7 +626,7 @@ spec:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: global-master-template
      name: global-cp-template
  kubeadmConfigSpec:
    format: ignition
    clusterConfiguration:
@@ -726,7 +727,7 @@ spec:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: global-master-machine-template
      name: global-cp-machine-template
  kubeadmConfigSpec:
    format: cloud-init
    clusterConfiguration:
@@ -823,7 +824,7 @@ spec:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: HCSMachineTemplate
      name: global-master-machine-template
      name: global-cp-machine-template
  kubeadmConfigSpec:
    format: cloud-init
    clusterConfiguration:
@@ -1211,6 +1212,7 @@ kubectl --kubeconfig <global-kubeconfig> get clustermodule global
|---|---|---|
| Machines stay in `Pending` or do not appear | `kubectl describe machine -n cpaas-system <machine>` | The provider-specific failure reason on the machine `Bootstrap` and `Infrastructure` conditions. IaaS quota, network, and credential issues surface here. |
| `KubeadmControlPlane` does not reach `Ready` | `kubectl get nodes` with the new cluster kubeconfig and `kubectl describe kubeadmcontrolplane -n cpaas-system` | etcd health on the first control plane node and join progress for the remaining nodes. |
| `KubeadmControlPlane` does not become `Ready` (often with `EtcdClusterHealthy=Unknown`) and the installer hangs even though the control-plane VMs are up; Pods in the bootstrap (KIND) cluster cannot reach the control-plane subnet | NAT rules on the bootstrap (KIND) host | Stopping the host firewall after KIND has started can flush the SNAT masquerade rule for the KIND bridge, so the CAPI controller Pods running in KIND can no longer route to the new control-plane subnet. Re-add the rule: `iptables -t nat -A POSTROUTING -s 172.18.0.0/16 ! -d 172.18.0.0/16 -j MASQUERADE` (`172.18.0.0/16` is the default KIND subnet). |
| Pods in `kube-system` stay `Pending` or fail to pull images | `kubectl --kubeconfig <global-kubeconfig> describe pod -n kube-system <pod>` | Image pull errors usually mean the node-facing registry address is not reachable from the new cluster's subnet. |
| Installer progress API shows a stalled stage | `/var/cpaas/data/installer.log` | The most recent phase line and the most recent error message. Retried errors repeat on a short interval; persistent errors do not advance. |
| `ClusterModule/global` does not reach a healthy phase | `kubectl --kubeconfig <global-kubeconfig> describe clustermodule global` | The `Status.conditions` describe which module is blocking the cluster from completing. |
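
The NAT fix in the table above can be made idempotent, so that repeating it does not stack duplicate rules. The following is a minimal sketch: the function name `ensure_kind_masquerade` is illustrative, and the default subnet is the `172.18.0.0/16` value quoted in the table, which you should confirm on your bootstrap host before use.

```bash
# ensure_kind_masquerade [subnet]
# Adds the MASQUERADE rule only when an identical rule is not already present.
# Assumption: the subnet defaults to the KIND subnet quoted in the table above.
ensure_kind_masquerade() {
  subnet="${1:-172.18.0.0/16}"
  # -C checks for an existing rule and exits non-zero when none matches.
  if iptables -t nat -C POSTROUTING -s "$subnet" ! -d "$subnet" -j MASQUERADE 2>/dev/null; then
    echo "masquerade rule for $subnet already present"
  else
    iptables -t nat -A POSTROUTING -s "$subnet" ! -d "$subnet" -j MASQUERADE
  fi
}
```

Run `ensure_kind_masquerade` as root on the bootstrap host, passing a different subnet as the first argument if the KIND network on that host does not use the default range.
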
@@ -1450,9 +1452,245 @@ The installation is successful when all of the following conditions are true:
- Critical Pods in `cpaas-system` are `Running` or `Completed`.
- `ClusterModule/global` reports the base module as healthy.
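
The Pod condition above can be checked from the command line. This is a sketch under stated assumptions: the function name `check_cpaas_pods` is illustrative, and it treats a Pod phase of `Succeeded` as the `Completed` status that `kubectl get pods` displays. It checks Pod phase only, not container readiness.

```bash
# check_cpaas_pods <global-kubeconfig>
# Lists Pods in cpaas-system whose phase is neither Running nor Succeeded.
# Returns 0 when none are found, 1 when some are, 2 when kubectl itself fails.
check_cpaas_pods() {
  bad=$(kubectl --kubeconfig "$1" get pods -n cpaas-system \
    --field-selector='status.phase!=Running,status.phase!=Succeeded' \
    --no-headers) || return 2
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
  echo "all cpaas-system Pods are Running or Completed"
}
```

Combine it with `kubectl --kubeconfig <global-kubeconfig> get clustermodule global` to confirm the `ClusterModule/global` condition as well.
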

## Decommission the Bootstrap Host \{#decommission-bootstrap-host}

After the installer reports success, the `global` cluster runs its own Cluster API providers and no longer depends on the temporary `minialauda` KIND cluster on the bootstrap host. Once verification passes, decommission the bootstrap host by removing only the local `minialauda` KIND cluster and its container network on that host.

<Directive type="info" title="Migrate the DCS credential Secret to the global cluster first">
The installer migrates the DCS credential Secret to the `global` cluster automatically only when it is named `ait-credential-secret` (the name used in the [worked example](#dcs-worked-example)). If your `DCSCluster.spec.credentialSecretRef` points to a Secret with a different name, that Secret is not carried over. Copy it into the `global` cluster's `cpaas-system` namespace before you remove the bootstrap host. Without it, the `global` cluster's DCS provider has no DCS API credentials and cannot reconcile; for example, scaling the cluster out later fails. Verify with `kubectl --kubeconfig <global-kubeconfig> get secret <name> -n cpaas-system`.
</Directive>
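
If the credential Secret uses a different name, the manual copy can be sketched as follows. This is not the installer's own procedure. The function names are illustrative, the second argument is a hypothetical kubeconfig for the bootstrap KIND cluster, and the `sed` filter assumes the Secret has no `ownerReferences` and no data key named `uid`, `resourceVersion`, `creationTimestamp`, or `selfLink`.

```bash
# strip_server_fields: drop server-populated metadata so the object can be
# created on another cluster. Reads YAML on stdin, writes YAML on stdout.
# Note: it removes any matching line at two-space indentation.
strip_server_fields() {
  sed -E '/^  (resourceVersion|uid|creationTimestamp|selfLink):/d'
}

# copy_secret <name> <bootstrap-kubeconfig> <global-kubeconfig>
# Reads the Secret from cpaas-system on the bootstrap cluster and applies it
# to the global cluster. The namespace is kept as exported.
copy_secret() {
  kubectl --kubeconfig "$2" get secret "$1" -n cpaas-system -o yaml \
    | strip_server_fields \
    | kubectl --kubeconfig "$3" apply -f -
}
```

After `copy_secret <name> <bootstrap-kubeconfig> <global-kubeconfig>` returns, confirm the result with the `kubectl ... get secret` command from the note above.
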

<Directive type="warning" title="Do not delete the Cluster API objects to clean up">
Do not run `kubectl delete cluster global`, and do not delete the `Cluster`, `KubeadmControlPlane`, or provider infrastructure objects as a cleanup step. After installation these objects own the live `global` control plane machines, so deleting them cascades into deleting the control plane VMs and destroys the cluster you just installed. Decommissioning is limited to removing the local KIND cluster on the bootstrap host; leave the Cluster API objects in place.
</Directive>
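
The removal itself might look like the sketch below. It assumes that the bootstrap cluster was created with the upstream `kind` CLI and is registered under the name `minialauda`; the source does not confirm either point, so check how your installer package manages the bootstrap cluster first. The function name is illustrative.

```bash
# remove_bootstrap_kind [name]
# Deletes the local KIND cluster only if it is listed on this host.
# Assumption: the cluster is managed by the upstream kind CLI as "minialauda".
remove_bootstrap_kind() {
  name="${1:-minialauda}"
  if kind get clusters 2>/dev/null | grep -qx "$name"; then
    kind delete cluster --name "$name"
  else
    echo "no KIND cluster named $name on this host; nothing to remove"
  fi
}
```

This touches only the local KIND cluster and leaves the Cluster API objects in the `global` cluster alone. Upstream `kind` normally leaves its container network (named `kind` by default) in place after a cluster is deleted; if that applies on your host, `docker network rm kind` removes it once no containers are attached.
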

## Next Steps

- Install additional provider plugins on the new `global` cluster: see [Installation](../install/index.mdx).
- Configure infrastructure resources for workload clusters: see [Infrastructure Resources](../infrastructure/index.mdx).
- Create your first workload cluster: see [Creating Clusters](../create-cluster/index.mdx).
- Plan disaster recovery: see [Global Cluster Disaster Recovery](./disaster_recovery.mdx).

## Worked Example: Complete `global` Manifest for Huawei DCS \{#dcs-worked-example}

This is a complete, single-file manifest for a three-replica control-plane `global` cluster on Huawei DCS. It is the same set of resources described in Step 4, already assembled so you do not have to merge fragments across pages. It uses documentation-only example values: replace every `<placeholder>`, and reuse the `${...}` variables you exported in Step 1. Apply it in Step 5.

This example targets a non-DR cluster. The three large kubeadm files (the PodSecurity admission config, the kubelet patch, and the audit policy) are identical to a workload cluster, so they are pulled from the `dcs-kubernetes-<major.minor>-files` Secret shipped by the DCS provider plugin. If that Secret is not present on the bootstrap cluster, inline the file contents from the [Complete KubeadmControlPlane Configuration](../create-cluster/huawei-dcs.mdx#complete-kubeadmcontrolplane-configuration) appendix instead.

```yaml
---
# 1. DCS API credential. See Infrastructure Resources for the field sources.
apiVersion: v1
kind: Secret
metadata:
  name: ait-credential-secret # this exact name lets the installer migrate the credential to the global cluster (see Decommission)
  namespace: cpaas-system
  labels:
    cpaas.io/cluster-name: "global"
type: Opaque
stringData:
  authUser: "<dcs-api-user>"
  authKey: "<dcs-api-password>"
  endpoint: "https://<dcs-api-host>:7443"
  # userType: "interconnect" # optional; "interconnect" (default) or "domain"
---
# 2. Control-plane IP / hostname pool (one entry per control-plane replica).
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSIpHostnamePool
metadata:
  name: global-cp-pool
  namespace: cpaas-system
  labels:
    cpaas.io/cluster-name: "global"
spec:
  pool:
    - {ip: "192.0.2.11", mask: "24", gateway: "192.0.2.1", dns: "192.0.2.2", hostname: "global-cp-1", machineName: "global-cp-1"}
    - {ip: "192.0.2.12", mask: "24", gateway: "192.0.2.1", dns: "192.0.2.2", hostname: "global-cp-2", machineName: "global-cp-2"}
    - {ip: "192.0.2.13", mask: "24", gateway: "192.0.2.1", dns: "192.0.2.2", hostname: "global-cp-3", machineName: "global-cp-3"}
---
# 3. Control-plane VM spec.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSMachineTemplate
metadata:
  name: global-cp-template
  namespace: cpaas-system
  labels:
    cpaas.io/cluster-name: "global"
spec:
  template:
    spec:
      vmTemplateName: <vm-template-name>
      # Places the cloned VMs in a DCS compute cluster.
      resource:
        type: cluster
        name: <dcs-cluster-name>
      vmConfig:
        dvSwitchName: <dvswitch-name>
        portGroupName: <port-group-name>
        dcsMachineCpuSpec: {quantity: 16}
        dcsMachineMemorySpec: {quantity: 32768} # MB
        dcsMachineDiskSpec:
          - {quantity: 0, datastoreName: <datastore-name>, systemVolume: true}
          - {quantity: 10, datastoreName: <datastore-name>, path: /var/lib/etcd, format: xfs}
          - {quantity: 100, datastoreName: <datastore-name>, path: /var/lib/kubelet, format: xfs}
          - {quantity: 100, datastoreName: <datastore-name>, path: /var/lib/containerd, format: xfs}
          # This example keeps /var/cpaas as a template disk for simplicity. For
          # production, declare it in DCSIpHostnamePool.spec.pool[].persistentDisk
          # so it survives node replacement (see Infrastructure Resources).
          - {quantity: 100, datastoreName: <datastore-name>, path: /var/cpaas, format: xfs}
      ipHostPoolRef:
        name: global-cp-pool
---
# 4. Control plane.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: global-kcp
  namespace: cpaas-system
  labels:
    cpaas.io/cluster-name: "global"
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  replicas: 3
  version: ${K8S_VERSION}
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate: {maxSurge: 0}
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: global-cp-template
  kubeadmConfigSpec:
    format: ignition
    users:
      - name: boot
        sshAuthorizedKeys:
          - "ssh-ed25519 AAAA...replace-with-your-public-key... global-boot"
    files:
      - contentFrom: {secret: {name: dcs-kubernetes-<major.minor>-files, key: psa-config.yaml}}
        path: /etc/kubernetes/admission/psa-config.yaml
        owner: "root:root"
        permissions: "0644"
      - contentFrom: {secret: {name: dcs-kubernetes-<major.minor>-files, key: control-plane-kubelet-patch.json}}
        path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
      - contentFrom: {secret: {name: dcs-kubernetes-<major.minor>-files, key: audit-policy.yaml}}
        path: /etc/kubernetes/audit/policy.yaml
        owner: "root:root"
        permissions: "0644"
    preKubeadmCommands:
      - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
      - mkdir -p /run/cluster-api && restorecon -Rv /run/cluster-api
      - if [ -f /etc/disk-setup.sh ]; then bash /etc/disk-setup.sh; fi
    postKubeadmCommands:
      - chmod 600 /var/lib/kubelet/config.yaml
    clusterConfiguration:
      imageRepository: cloud.alauda.io/alauda
      dns: {imageTag: <dns-image-tag>}
      etcd:
        local:
          imageTag: <etcd-image-tag>
          serverCertSANs: ["${CONTROL_PLANE_VIP}", "etcd.kube-system"]
      apiServer:
        extraArgs:
          audit-log-format: json
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "200"
          audit-log-mode: batch
          audit-log-path: /etc/kubernetes/audit/audit.log
          audit-policy-file: /etc/kubernetes/audit/policy.yaml
          admission-control-config-file: /etc/kubernetes/admission/psa-config.yaml
          profiling: "false"
          tls-min-version: VersionTLS12
          kubelet-certificate-authority: /etc/kubernetes/pki/ca.crt
        extraVolumes:
          - {name: vol-dir-0, hostPath: /etc/kubernetes, mountPath: /etc/kubernetes, pathType: Directory}
      controllerManager:
        extraArgs: {bind-address: "::", profiling: "false", tls-min-version: VersionTLS12, flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"}
      scheduler:
        extraArgs: {bind-address: "::", tls-min-version: VersionTLS12, profiling: "false"}
    initConfiguration:
      patches: {directory: /etc/kubernetes/patches}
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master" # kube-ovn-recognized role value; do not rename
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
    joinConfiguration:
      patches: {directory: /etc/kubernetes/patches}
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master" # kube-ovn-recognized role value; do not rename
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
---
# 5. DCS infrastructure cluster.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSCluster
metadata:
  name: "global"
  namespace: cpaas-system
  labels:
    cpaas.io/cluster-name: "global"
  annotations:
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
spec:
  controlPlaneLoadBalancer: {host: "${CONTROL_PLANE_VIP}", port: 6443, type: external}
  controlPlaneEndpoint: {host: "${CONTROL_PLANE_VIP}", port: 6443}
  credentialSecretRef: {name: ait-credential-secret}
  networkType: kube-ovn
  site: <dcs-site-id>
---
# 6. Top-level CAPI Cluster: global wiring, labels, and annotations.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: global
  namespace: cpaas-system
  labels:
    cpaas.io/cluster-name: "global"
    cluster-type: DCS
    is-global: "true"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: DCSCluster
    cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}"
    cpaas.io/kube-ovn-join-cidr: 100.5.0.0/16
    cpaas.io/kube-ovn-version: <kube-ovn-chart-version>
    cpaas.io/os-family: <os-family> # OS family of the Alauda OS image, for example slemicro
spec:
  clusterNetwork:
    pods: {cidrBlocks: ["${CLUSTER_CIDR}"]}
    services: {cidrBlocks: ["${SERVICE_CIDR}"]}
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: global-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DCSCluster
    name: global
```

### Values to Replace \{#dcs-worked-example-values}

| Placeholder / variable | Where it comes from |
|---|---|
| `${K8S_VERSION}`, `${CONTROL_PLANE_VIP}`, `${NODE_REGISTRY_ADDRESS}`, `${CLUSTER_CIDR}`, `${SERVICE_CIDR}` | Exported in Step 1. |
| `<dcs-api-user>` / `<dcs-api-password>` / `<dcs-api-host>` / `<dcs-site-id>` | DCS platform credentials and site. See [Cloud Credentials](../infrastructure/huawei-dcs.mdx#cloud-credentials). |
| `<vm-template-name>`, `<dns-image-tag>`, `<etcd-image-tag>` | The `cpaas.io/dcs-vm-template` ConfigMap. See [Resolving Placeholder Values](../create-cluster/huawei-dcs.mdx#resolving-placeholders). |
| `<dcs-cluster-name>`, `<dvswitch-name>`, `<port-group-name>`, `<datastore-name>` | DCS platform objects. Confirm with the DCS administrator. |
| `192.0.2.x`, `<os-family>`, `<kube-ovn-chart-version>` | Node IPs/gateway/DNS for your network; the OS family of the image; and the kube-ovn chart version from the [OS Support Matrix](../overview/os-support-matrix.mdx). |
| `ssh-ed25519 AAAA...` | A real OpenSSH public key. The `ignition` format rejects an empty list; supply any valid key even if you do not plan to SSH in. |
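
Before applying the manifest, it can help to expand the Step 1 variables and scan for values that were never replaced. The following is a minimal sketch: the function names are illustrative, `render_manifest` relies on `envsubst` from GNU gettext being installed, and the scan only recognizes the placeholder shapes used in this example (`<lower-case-name>`, `${UPPER_CASE}`, and the sample SSH key text). It does not detect the `192.0.2.x` documentation addresses.

```bash
# render_manifest <file>
# Expands only the variables exported in Step 1; any other "$" text in the
# manifest is left untouched. Writes the result to stdout.
render_manifest() {
  envsubst '${K8S_VERSION} ${CONTROL_PLANE_VIP} ${NODE_REGISTRY_ADDRESS} ${CLUSTER_CIDR} ${SERVICE_CIDR}' < "$1"
}

# find_unresolved <file>
# Prints, with line numbers, every line that still carries a <placeholder>,
# an unexpanded ${VAR}, or the sample SSH key text.
# Exit status follows grep: 0 when something is found, 1 when the file is clean.
find_unresolved() {
  grep -nE '<[a-z][a-z0-9.-]*>|\$\{[A-Z_0-9]+\}|replace-with-your-public-key' "$1"
}
```

For example, `render_manifest global-dcs.yaml > rendered.yaml` followed by `find_unresolved rendered.yaml` lists what is left to fill in; `global-dcs.yaml` is a hypothetical file name for your copy of the worked example.
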

<Directive type="info" title="Secret encryption and disaster recovery">
This example does not enable etcd secret encryption-at-rest. To enable it, or to deploy a DR pair, add the `/etc/kubernetes/encryption-provider.conf` file and the `apiServer.extraArgs.encryption-provider-config` argument as described in [Optional Disaster Recovery Deployment](#optional-disaster-recovery-deployment).
</Directive>