diff --git a/docs/en/global/install.mdx b/docs/en/global/install.mdx index 3c824a0f..68712957 100644 --- a/docs/en/global/install.mdx +++ b/docs/en/global/install.mdx @@ -564,13 +564,14 @@ Use the DCS resource fields from [Creating Clusters on Huawei DCS](../create-clu - Add `Cluster.metadata.labels.is-global: "true"` and `Cluster.metadata.labels.cluster-type: DCS`. - Add `Cluster.metadata.annotations["cpaas.io/registry-address"]` with `${NODE_REGISTRY_ADDRESS}`. - Set `KubeadmControlPlane.spec.kubeadmConfigSpec.format: ignition` for Alauda OS. -- Keep the release manifest's non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries. +- Keep the `KubeadmControlPlane.spec.kubeadmConfigSpec.users` entry with a non-empty `sshAuthorizedKeys` list (the `boot` user). The DCS `ignition` format rejects an empty SSH key list, so this field is required even for a `global` cluster you do not plan to access over SSH. See [Resolving Placeholder Values](../create-cluster/huawei-dcs.mdx#resolving-placeholders) for what to supply when no interactive key is needed. +- Keep the non-encryption kubeadm files, kubelet patches, audit policy, and installer RBAC entries. The file contents (the PodSecurity admission config, the kubelet patch, and the audit policy), together with the full `clusterConfiguration`, `preKubeadmCommands`, `postKubeadmCommands`, and the init and join node-registration patches, are identical to a workload cluster. Copy them from the [Complete KubeadmControlPlane Configuration](../create-cluster/huawei-dcs.mdx#complete-kubeadmcontrolplane-configuration) appendix, or reference the `dcs-kubernetes--files` Secret documented there. The wiring fragment below shows only the `global`-specific fields layered on top of that base. - For a normal non-DR deployment, do not set `DCSCluster.spec.encryptionProviderConfigRef` and do not add `/etc/kubernetes/encryption-provider.conf` to `KubeadmControlPlane.spec.kubeadmConfigSpec.files`. 
- Keep `/var/cpaas` as platform state. If you need the disk to survive rolling replacement, declare it in `DCSIpHostnamePool.spec.pool[].persistentDisk`; do not rely on `DCSMachineTemplate` template disks as preserved state. - Use concrete `datastoreName` values for DCS local storage unless you have verified that the selected datastore cluster can place volumes on hosts that can run the target VM. - The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these `global`-specific changes into the manifest that you prepare from the DCS create-cluster references, then apply the complete manifest file. + The following YAML is a differential fragment, not a complete manifest that you can apply directly. Merge these `global`-specific changes into the manifest that you prepare from the DCS create-cluster references, then apply the complete manifest file. If you would rather start from a complete file, adapt the [Worked Example: Complete `global` Manifest for Huawei DCS](#dcs-worked-example) at the end of this page instead of assembling it from fragments. The following fragment shows the `global`-specific Cluster API wiring. Fill the provider resource fields by using the DCS create-cluster references above. 
@@ -625,7 +626,7 @@ spec: infrastructureRef: apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 kind: DCSMachineTemplate - name: global-master-template + name: global-cp-template kubeadmConfigSpec: format: ignition clusterConfiguration: @@ -726,7 +727,7 @@ spec: infrastructureRef: apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 kind: VSphereMachineTemplate - name: global-master-machine-template + name: global-cp-machine-template kubeadmConfigSpec: format: cloud-init clusterConfiguration: @@ -823,7 +824,7 @@ spec: infrastructureRef: apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 kind: HCSMachineTemplate - name: global-master-machine-template + name: global-cp-machine-template kubeadmConfigSpec: format: cloud-init clusterConfiguration: @@ -1211,6 +1212,7 @@ kubectl --kubeconfig get clustermodule global |---|---|---| | Machines stay in `Pending` or do not appear | `kubectl describe machine -n cpaas-system ` | The provider-specific failure reason on the machine `Bootstrap` and `Infrastructure` conditions. IaaS quota, network, and credential issues surface here. | | `KubeadmControlPlane` does not reach `Ready` | `kubectl get nodes` with the new cluster kubeconfig and `kubectl describe kubeadmcontrolplane -n cpaas-system` | etcd health on the first control plane node and join progress for the remaining nodes. | +| `KubeadmControlPlane` never becomes `Ready` (often with `EtcdClusterHealthy=Unknown`) and the installer hangs even though the control-plane VMs are up; the bootstrap (KIND) Pods cannot reach the control-plane subnet | NAT rules on the bootstrap (KIND) host | Stopping the host firewall after KIND has started can flush the KIND bridge's SNAT masquerade rule, so the CAPI controller Pods running in KIND cannot route to the new control-plane subnet. Re-add the rule: `iptables -t nat -A POSTROUTING -s 172.18.0.0/16 ! -d 172.18.0.0/16 -j MASQUERADE` (`172.18.0.0/16` is the default KIND subnet).
| | Pods in `kube-system` stay `Pending` or fail to pull images | `kubectl --kubeconfig describe pod -n kube-system ` | Image pull errors usually mean the node-facing registry address is not reachable from the new cluster's subnet. | | Installer progress API shows a stalled stage | `/var/cpaas/data/installer.log` | The most recent phase line and the most recent error message. Retried errors repeat on a short interval; persistent errors do not advance. | | `ClusterModule/global` does not reach a healthy phase | `kubectl --kubeconfig describe clustermodule global` | The `Status.conditions` describe which module is blocking the cluster from completing. | @@ -1450,9 +1452,245 @@ The installation is successful when all of the following conditions are true: - Critical Pods in `cpaas-system` are `Running` or `Completed`. - `ClusterModule/global` reports the base module as healthy. +## Decommission the Bootstrap Host \{#decommission-bootstrap-host} + +After the installer reports success, the `global` cluster runs its own Cluster API providers and no longer depends on the temporary `minialauda` KIND cluster on the bootstrap host. Once verification passes, decommission the bootstrap host by removing only the local `minialauda` KIND cluster and its container network on that host. + + + The installer migrates the DCS credential Secret to the `global` cluster automatically only when it is named `ait-credential-secret` (the name used in the [worked example](#dcs-worked-example)). If your `DCSCluster.spec.credentialSecretRef` points to a Secret with a different name, it is not carried over. Copy that Secret into the `global` cluster's `cpaas-system` namespace before you remove the bootstrap host. Without it, the `global` cluster's DCS provider has no DCS API credentials and cannot reconcile; for example, scaling the cluster out later fails. Verify with `kubectl --kubeconfig get secret -n cpaas-system`.
+ + + + Do not run `kubectl delete cluster global`, and do not delete the `Cluster`, `KubeadmControlPlane`, or provider infrastructure objects as a cleanup step. After installation, these objects own the live `global` control plane machines, so deleting them cascades into deleting the control plane VMs and destroys the cluster you just installed. Decommissioning is limited to removing the local KIND cluster on the bootstrap host; leave the Cluster API objects in place. + + ## Next Steps - Install additional provider plugins on the new `global` cluster: see [Installation](../install/index.mdx). - Configure infrastructure resources for workload clusters: see [Infrastructure Resources](../infrastructure/index.mdx). - Create your first workload cluster: see [Creating Clusters](../create-cluster/index.mdx). - Plan disaster recovery: see [Global Cluster Disaster Recovery](./disaster_recovery.mdx). + +## Worked Example: Complete `global` Manifest for Huawei DCS \{#dcs-worked-example} + +This is a complete, single-file manifest for a three-replica control-plane `global` cluster on Huawei DCS. It is the same set of resources described in Step 4, already assembled so you do not have to merge fragments across pages. It uses documentation-only example values: replace every placeholder listed in [Values to Replace](#dcs-worked-example-values), and reuse the `${...}` variables you exported in Step 1. Apply it in Step 5. + +This example targets a non-DR cluster. The three large kubeadm files (the PodSecurity admission config, the kubelet patch, and the audit policy) are identical to a workload cluster, so they are pulled from the `dcs-kubernetes--files` Secret shipped by the DCS provider plugin. If that Secret is not present on the bootstrap cluster, inline the file contents from the [Complete KubeadmControlPlane Configuration](../create-cluster/huawei-dcs.mdx#complete-kubeadmcontrolplane-configuration) appendix instead. + +```yaml +--- +# 1. DCS API credential. See Infrastructure Resources for the field sources. 
+apiVersion: v1 +kind: Secret +metadata: + name: ait-credential-secret # this exact name lets the installer migrate the credential to the global cluster (see Decommission) + namespace: cpaas-system + labels: + cpaas.io/cluster-name: "global" +type: Opaque +stringData: + authUser: "" + authKey: "" + endpoint: "https://:7443" + # userType: "interconnect" # optional; "interconnect" (default) or "domain" +--- +# 2. Control-plane IP / hostname pool (one entry per control-plane replica). +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: DCSIpHostnamePool +metadata: + name: global-cp-pool + namespace: cpaas-system + labels: + cpaas.io/cluster-name: "global" +spec: + pool: + - {ip: "192.0.2.11", mask: "24", gateway: "192.0.2.1", dns: "192.0.2.2", hostname: "global-cp-1", machineName: "global-cp-1"} + - {ip: "192.0.2.12", mask: "24", gateway: "192.0.2.1", dns: "192.0.2.2", hostname: "global-cp-2", machineName: "global-cp-2"} + - {ip: "192.0.2.13", mask: "24", gateway: "192.0.2.1", dns: "192.0.2.2", hostname: "global-cp-3", machineName: "global-cp-3"} +--- +# 3. Control-plane VM spec. +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: DCSMachineTemplate +metadata: + name: global-cp-template + namespace: cpaas-system + labels: + cpaas.io/cluster-name: "global" +spec: + template: + spec: + vmTemplateName: + # Places the cloned VMs in a DCS compute cluster. + resource: + type: cluster + name: + vmConfig: + dvSwitchName: + portGroupName: + dcsMachineCpuSpec: {quantity: 16} + dcsMachineMemorySpec: {quantity: 32768} # MB + dcsMachineDiskSpec: + - {quantity: 0, datastoreName: , systemVolume: true} + - {quantity: 10, datastoreName: , path: /var/lib/etcd, format: xfs} + - {quantity: 100, datastoreName: , path: /var/lib/kubelet, format: xfs} + - {quantity: 100, datastoreName: , path: /var/lib/containerd, format: xfs} + # This example keeps /var/cpaas as a template disk for simplicity. 
For + # production, declare it in DCSIpHostnamePool.spec.pool[].persistentDisk + # so it survives node replacement (see Infrastructure Resources). + - {quantity: 100, datastoreName: , path: /var/cpaas, format: xfs} + ipHostPoolRef: + name: global-cp-pool +--- +# 4. Control plane. +apiVersion: controlplane.cluster.x-k8s.io/v1beta1 +kind: KubeadmControlPlane +metadata: + name: global-kcp + namespace: cpaas-system + labels: + cpaas.io/cluster-name: "global" + annotations: + controlplane.cluster.x-k8s.io/skip-kube-proxy: "" +spec: + replicas: 3 + version: ${K8S_VERSION} + rolloutStrategy: + type: RollingUpdate + rollingUpdate: {maxSurge: 0} + machineTemplate: + nodeDrainTimeout: 1m + nodeDeletionTimeout: 5m + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: DCSMachineTemplate + name: global-cp-template + kubeadmConfigSpec: + format: ignition + users: + - name: boot + sshAuthorizedKeys: + - "ssh-ed25519 AAAA...replace-with-your-public-key... global-boot" + files: + - contentFrom: {secret: {name: dcs-kubernetes--files, key: psa-config.yaml}} + path: /etc/kubernetes/admission/psa-config.yaml + owner: "root:root" + permissions: "0644" + - contentFrom: {secret: {name: dcs-kubernetes--files, key: control-plane-kubelet-patch.json}} + path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json + owner: "root:root" + permissions: "0644" + - contentFrom: {secret: {name: dcs-kubernetes--files, key: audit-policy.yaml}} + path: /etc/kubernetes/audit/policy.yaml + owner: "root:root" + permissions: "0644" + preKubeadmCommands: + - while ! 
ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started" + - mkdir -p /run/cluster-api && restorecon -Rv /run/cluster-api + - if [ -f /etc/disk-setup.sh ]; then bash /etc/disk-setup.sh; fi + postKubeadmCommands: + - chmod 600 /var/lib/kubelet/config.yaml + clusterConfiguration: + imageRepository: cloud.alauda.io/alauda + dns: {imageTag: } + etcd: + local: + imageTag: + serverCertSANs: ["${CONTROL_PLANE_VIP}", "etcd.kube-system"] + apiServer: + extraArgs: + audit-log-format: json + audit-log-maxage: "30" + audit-log-maxbackup: "10" + audit-log-maxsize: "200" + audit-log-mode: batch + audit-log-path: /etc/kubernetes/audit/audit.log + audit-policy-file: /etc/kubernetes/audit/policy.yaml + admission-control-config-file: /etc/kubernetes/admission/psa-config.yaml + profiling: "false" + tls-min-version: VersionTLS12 + kubelet-certificate-authority: /etc/kubernetes/pki/ca.crt + extraVolumes: + - {name: vol-dir-0, hostPath: /etc/kubernetes, mountPath: /etc/kubernetes, pathType: Directory} + controllerManager: + extraArgs: {bind-address: "::", profiling: "false", tls-min-version: VersionTLS12, flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"} + scheduler: + extraArgs: {bind-address: "::", tls-min-version: VersionTLS12, profiling: "false"} + initConfiguration: + patches: {directory: /etc/kubernetes/patches} + nodeRegistration: + kubeletExtraArgs: + node-labels: "kube-ovn/role=master" # kube-ovn-recognized role value; do not rename + provider-id: PROVIDER_ID + volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/" + protect-kernel-defaults: "true" + joinConfiguration: + patches: {directory: /etc/kubernetes/patches} + nodeRegistration: + kubeletExtraArgs: + node-labels: "kube-ovn/role=master" # kube-ovn-recognized role value; do not rename + provider-id: PROVIDER_ID + volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/" + protect-kernel-defaults: "true" +--- +# 5. 
DCS infrastructure cluster. +apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 +kind: DCSCluster +metadata: + name: "global" + namespace: cpaas-system + labels: + cpaas.io/cluster-name: "global" + annotations: + cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}" +spec: + controlPlaneLoadBalancer: {host: "${CONTROL_PLANE_VIP}", port: 6443, type: external} + controlPlaneEndpoint: {host: "${CONTROL_PLANE_VIP}", port: 6443} + credentialSecretRef: {name: ait-credential-secret} + networkType: kube-ovn + site: +--- +# 6. Top-level CAPI Cluster: global wiring, labels, and annotations. +apiVersion: cluster.x-k8s.io/v1beta1 +kind: Cluster +metadata: + name: global + namespace: cpaas-system + labels: + cpaas.io/cluster-name: "global" + cluster-type: DCS + is-global: "true" + annotations: + capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1 + capi.cpaas.io/resource-kind: DCSCluster + cpaas.io/registry-address: "${NODE_REGISTRY_ADDRESS}" + cpaas.io/kube-ovn-join-cidr: 100.5.0.0/16 + cpaas.io/kube-ovn-version: + cpaas.io/os-family: # OS family of the Alauda OS image, for example slemicro +spec: + clusterNetwork: + pods: {cidrBlocks: ["${CLUSTER_CIDR}"]} + services: {cidrBlocks: ["${SERVICE_CIDR}"]} + controlPlaneRef: + apiVersion: controlplane.cluster.x-k8s.io/v1beta1 + kind: KubeadmControlPlane + name: global-kcp + infrastructureRef: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: DCSCluster + name: global +``` + +### Values to Replace \{#dcs-worked-example-values} + +| Placeholder / variable | Where it comes from | +|---|---| +| `${K8S_VERSION}`, `${CONTROL_PLANE_VIP}`, `${NODE_REGISTRY_ADDRESS}`, `${CLUSTER_CIDR}`, `${SERVICE_CIDR}` | Exported in Step 1. | +| `` / `` / `` / `` | DCS platform credentials and site. See [Cloud Credentials](../infrastructure/huawei-dcs.mdx#cloud-credentials). | +| ``, ``, `` | The `cpaas.io/dcs-vm-template` ConfigMap. 
See [Resolving Placeholder Values](../create-cluster/huawei-dcs.mdx#resolving-placeholders). | +| ``, ``, ``, `` | DCS platform objects. Confirm with the DCS administrator. | +| `192.0.2.x`, ``, `` | Node IPs/gateway/DNS for your network; the OS family of the image; and the kube-ovn chart version from the [OS Support Matrix](../overview/os-support-matrix.mdx). | +| `ssh-ed25519 AAAA...` | A real OpenSSH public key. The `ignition` format rejects an empty list; supply any valid key even if you do not plan to SSH in. | + + + This example does not enable etcd secret encryption-at-rest. To enable it, or to deploy a DR pair, add the `/etc/kubernetes/encryption-provider.conf` file and the `apiServer.extraArgs.encryption-provider-config` argument as described in [Optional Disaster Recovery Deployment](#optional-disaster-recovery-deployment). +