docs: clarify global immutable-OS install (kubeadm config source, boot SSH key, bootstrap teardown) #113

Open
chinameok wants to merge 5 commits into master from docs/global-install-no-ui-clarity

chinameok (Collaborator) commented Jun 16, 2026

Why

Make docs/en/global/install.mdx sufficient for another person to reproduce a fully no-UI global install on Huawei DCS (CLI/API only). Gaps were found by actually doing that install and diffing against the docs.

Changes (docs/en/global/install.mdx)

Clarity / correctness

  1. Define the source of the kubeadm config in Step 4 (the Complete KubeadmControlPlane Configuration appendix in the DCS create-cluster guide, or the dcs-kubernetes-<ver>-files Secret) instead of the undefined "release manifest".
  2. Restate the ignition-required boot user and its non-empty sshAuthorizedKeys list in the global DCS requirements.
  3. Decommission section: bootstrap teardown, plus a warning that kubectl delete cluster global cascades into deleting the live control-plane VMs.
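
For anyone who wants to check item 2 before applying a manifest, here is a minimal Python sketch. It assumes the KubeadmControlPlane has already been loaded into a plain dict (for example from YAML) and that the boot user lives under spec.kubeadmConfigSpec.users, as the review walkthrough describes. The function name is hypothetical and not part of the installer or the docs.

```python
def boot_user_has_ssh_key(kcp: dict) -> bool:
    """Return True if the manifest defines user 'boot' with at least one non-blank SSH key.

    Assumption: `kcp` is a KubeadmControlPlane manifest parsed into a dict.
    """
    users = kcp.get("spec", {}).get("kubeadmConfigSpec", {}).get("users") or []
    for user in users:
        if user.get("name") == "boot":
            keys = user.get("sshAuthorizedKeys") or []
            # An empty list, or a list of blank strings, counts as missing.
            return any(isinstance(key, str) and key.strip() for key in keys)
    return False


if __name__ == "__main__":
    # Placeholder key material; replace with a real public key.
    example = {
        "kind": "KubeadmControlPlane",
        "spec": {
            "kubeadmConfigSpec": {
                "users": [
                    {"name": "boot", "sshAuthorizedKeys": ["ssh-ed25519 <public-key> ops@example"]}
                ]
            }
        },
    }
    print(boot_user_has_ssh_key(example))
```

This only mirrors the documented requirement on the client side. The authoritative validation is still the one ignition performs.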

Complete worked example

  4. New "Worked Example: Complete global Manifest for Huawei DCS": one copy-pasteable file (Secret, DCSIpHostnamePool, DCSMachineTemplate, KubeadmControlPlane, DCSCluster, Cluster) with the global-specific annotations that were missing (is-global, cluster-type, os-family, kube-ovn-version, kube-ovn-join-cidr, registry-address) and a "Values to Replace" table. Sanitized to RFC 5737 IPs / placeholders.

Operational gaps recovered from a deploy runbook

  5. DCS credential Secret migration, confirmed against cpaas-installer code (installer_dcs.go dcsImportDCSCredentialSecret): the installer auto-migrates the credential Secret to the global cluster only when it is named ait-credential-secret (Secrets are excluded from the etcdctl resource migration). The worked example now uses that name; a Decommission note tells anyone using a different name to copy it manually, otherwise the global DCS provider has no credentials and can't reconcile (for example, scale-out fails).
  6. Bootstrap NAT stall, added as a Common Stalls row: stopping the host firewall after KIND starts can flush the KIND bridge SNAT masquerade → CAPI controllers in KIND can't reach the new control-plane subnet → KCP stuck at EtcdClusterHealthy=Unknown, installer hangs. Fix: re-add the 172.18.0.0/16 masquerade rule.
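
To make the NAT stall in item 6 easier to diagnose, here is a rough Python sketch that checks whether a masquerade rule for the KIND subnet is still present. It assumes you pass in the text printed by `iptables -t nat -S POSTROUTING` on the bootstrap host and that the rule looks like a typical Docker-style `-s <subnet> ... -j MASQUERADE` entry. The function names are hypothetical, and the exact command to re-add the rule is the one given in the Common Stalls row, not reproduced here.

```python
import shlex

# Subnet taken from the PR description; adjust if your KIND network differs.
KIND_SUBNET = "172.18.0.0/16"


def _value_after(tokens, flag):
    """Return the token following `flag`, or None if the flag is absent."""
    if flag in tokens:
        index = tokens.index(flag)
        if index + 1 < len(tokens):
            return tokens[index + 1]
    return None


def has_kind_masquerade(rules_text: str, subnet: str = KIND_SUBNET) -> bool:
    """Return True if any POSTROUTING rule masquerades traffic sourced from `subnet`."""
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if tokens[:2] != ["-A", "POSTROUTING"]:
            continue  # skip policy lines and other chains
        if _value_after(tokens, "-s") == subnet and _value_after(tokens, "-j") == "MASQUERADE":
            return True
    return False


if __name__ == "__main__":
    sample = (
        "-P POSTROUTING ACCEPT\n"
        "-A POSTROUTING -s 172.18.0.0/16 ! -o br-example -j MASQUERADE\n"
    )
    print(has_kind_masquerade(sample))
```

If the check returns False while the KubeadmControlPlane is stuck at EtcdClusterHealthy=Unknown, the flushed masquerade rule is a likely cause.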

Inclusive terminology

  7. master → cp in example identifiers across the page (global-master-* renamed to global-cp-*); kept the functional kube-ovn/role=master label (commented do not rename).

Still deliberately out of scope

  • OS-template ↔ provider version pairing and the version-gated os-family semantics (KubeOS must set kubeos or the node won't boot) — owned separately by the docs owner; the worked example carries the os-family field but not the version-gated rule.
  • Full DCS REST API operational recipes, qcow2 template upload, env-specific values — agent-runbook material, not customer docs.

mask format was already standardized on master by #110.

Validation

Each push validated with yarn install + yarn lint (0 errors) + yarn build in a scratch clone (the in-repo /workspaces volume is too small for node_modules).

Summary by CodeRabbit

Documentation

  • Updated Huawei DCS “global” installation guidance to require a non-empty boot user SSH authorized-keys list, and reiterated the validation behavior for ignition
  • Strengthened “non-encryption” kubeadm/kubelet/audit/installer RBAC guidance with stricter consistency/byte-equivalence requirements and corrected global wiring layering
  • Refreshed Huawei DCS worked wiring/template names and complete global manifest examples
  • Expanded troubleshooting for control-plane readiness hangs with bootstrap connectivity issues, including NAT/iptables MASQUERADE guidance
  • Added post-install “Decommission the Bootstrap Host” steps, with warnings not to delete Cluster API/provider objects and notes on migrating non-default credential secrets to the global cluster first

…t SSH key, bootstrap teardown)

Gaps found while reproducing a no-UI `global` install on Huawei DCS:

- Step 4 told you to "keep the release manifest's" kubeadm files without
  defining what/where that manifest is. Point to the concrete source: the
  Complete KubeadmControlPlane Configuration appendix in the DCS
  create-cluster guide (or the dcs-kubernetes-<major.minor>-files Secret).
- The ignition-required `boot` user / non-empty sshAuthorizedKeys was stated
  in the create-cluster guide but not restated in the global DCS requirements,
  so a manifest assembled from the thin fragment can omit it and fail.
- Added a Decommission step plus a warning that `kubectl delete cluster global`
  cascades into deleting the live control-plane VMs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
coderabbitai Bot (Contributor) commented Jun 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f5227cc-8c27-4298-a2c9-3b7cd899c8e2

📥 Commits

Reviewing files that changed from the base of the PR and between fce3b9d and 845a5fa.

📒 Files selected for processing (1)
  • docs/en/global/install.mdx

Walkthrough

Documentation-only updates to docs/en/global/install.mdx: adds a mandatory non-empty sshAuthorizedKeys requirement for the boot user in the DCS manifest's kubeadmConfigSpec.users, expands RBAC content alignment instructions, updates provider template names across manifest wiring fragments, documents a new bootstrap KIND connectivity troubleshooting pattern, introduces a new "Decommission the Bootstrap Host" section warning against deleting Cluster API objects, and updates the worked example accordingly.

Changes

Global Install Documentation

| Layer / File(s) | Summary |
| --- | --- |
| DCS manifest SSH key and RBAC content guidance (docs/en/global/install.mdx) | Adds a mandatory non-empty sshAuthorizedKeys list for the boot user in kubeadmConfigSpec.users (DCS ignition constraint) and expands the prior single-sentence note into explicit instructions requiring kubeadm/kubelet/audit/installer RBAC content to match the workload cluster appendix or referenced Secret. |
| Provider template name updates in manifest wiring (docs/en/global/install.mdx) | Updates the machineTemplate.infrastructureRef.name values in KubeadmControlPlane wiring fragments across DCS, VMware vSphere, and Huawei Cloud Stack providers to use standardized global template names (global-cp-template for DCS, global-cp-machine-template for vSphere and HCS). |
| Troubleshooting bootstrap KIND connectivity and etcd (docs/en/global/install.mdx) | Adds a new troubleshooting row documenting that when KubeadmControlPlane stays not Ready (often EtcdClusterHealthy=Unknown) and the installer hangs, the cause may be that the bootstrap KIND cluster cannot route to the control-plane subnet; provides guidance on restoring the KIND host NAT/iptables SNAT masquerade rules with the expected MASQUERADE command. |
| Bootstrap host decommission section and worked example (docs/en/global/install.mdx) | Adds a new post-verification "Decommission the Bootstrap Host" section specifying that cleanup is limited to removing only the local minialauda KIND cluster and its container network, with an explicit warning that deleting Cluster API objects will cascade into destroying the live global control plane. Also adds a directive to migrate the DCS credential Secret to the global cluster when the Secret name differs from ait-credential-secret. Updates the worked Huawei DCS complete manifest example to include kubeadmConfigSpec.users for the boot user with sshAuthorizedKeys, and clarifies that ignition rejects an empty SSH key list. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • alauda/immutable-infra-docs#62: Introduces the overall global install workflow using a temporary KIND bootstrap host in the same install.mdx file that this PR extends with SSH key constraints and decommission guidance.
  • alauda/immutable-infra-docs#67: Updates the same global install documentation around the minialauda KIND host lifecycle and kubeadm field wiring, directly preceding the decommission procedure and SSH key validation added here.

Poem

🐇 Hop, hop, the bootstrap's done,
Keys must not be empty — every one!
The KIND cluster goes, but CAPI stays,
No cascading deletes on install day.
Clean the nest, but guard the hive,
The global control plane's kept alive! 🌿

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main documentation changes: clarifying kubeadm configuration source, documenting the boot SSH key requirement, and adding bootstrap teardown guidance. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


chinameok and others added 2 commits June 16, 2026 11:42
Readers previously had to assemble the global DCS manifest from a differential
fragment plus the create-cluster appendix plus the infrastructure page. Add a
"Worked Example" section with a complete, copy-pasteable manifest (Secret,
DCSIpHostnamePool, DCSMachineTemplate, KubeadmControlPlane, DCSCluster, Cluster)
including the global-specific annotations (is-global, cluster-type,
os-family, kube-ovn-version, kube-ovn-join-cidr, registry-address) and a
"Values to Replace" table, linked from Step 4.

Derived from a real no-UI DCS global install; sanitized to RFC5737 example IPs
and placeholders. The three large kubeadm files use the dcs-kubernetes-<ver>-files
Secret with an inline-from-appendix fallback. Non-DR (no encryption-provider.conf).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The worked-example and decommission sections linked to #verification and
#step-1/4/5 anchors, but those headings carry no explicit {#id}, so doom lint
flags them as unmatched. Reference those sections as plain text instead, matching
the page's existing style. Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Jun 16, 2026

Deploying alauda-immutable-infra with Cloudflare Pages

Latest commit: 845a5fa
Status: ✅  Deploy successful!
Preview URL: https://ac5b6303.alauda-immutable-infra.pages.dev
Branch Preview URL: https://docs-global-install-no-ui-cl.alauda-immutable-infra.pages.dev

chinameok and others added 2 commits June 16, 2026 15:01
Rename the example resource names, hostnames, and machineNames from
global-master-* to global-cp-* across the page (worked example plus the
Step 4 fragments) to follow current Kubernetes inclusive terminology.

The kube-ovn/role=master node label is left unchanged because it is a
kube-ovn-recognized value; an inline comment marks it as do-not-rename.
Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rade stall

#4 (confirmed against cpaas-installer code): the installer auto-migrates the DCS
credential Secret to the global cluster ONLY when it is named ait-credential-secret
(installer_dcs.go dcsImportDCSCredentialSecret, hardcoded name; Secrets are excluded
from the etcdctl resource migration). Name the worked-example Secret
ait-credential-secret so it is carried over, and add a Decommission note: if the
credential Secret has a different name, copy it to the global cluster manually or
the DCS provider there cannot reconcile (e.g. scale-out fails).
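
As a small illustration of the rule above, the decision can be written as a one-line check. This is a sketch, not the installer's logic: only the name ait-credential-secret comes from the installer code referenced in this commit, and the function name and the example Secret name are hypothetical.

```python
# Name the installer migrates automatically, per installer_dcs.go as cited above.
AUTO_MIGRATED_SECRET_NAME = "ait-credential-secret"


def needs_manual_copy(credential_secret_name: str) -> bool:
    """Return True if the DCS credential Secret must be copied to the global cluster by hand."""
    return credential_secret_name != AUTO_MIGRATED_SECRET_NAME


if __name__ == "__main__":
    # "my-dcs-credentials" is a made-up name used only for illustration.
    for name in ("ait-credential-secret", "my-dcs-credentials"):
        action = "copy manually before decommission" if needs_manual_copy(name) else "migrated by the installer"
        print(f"{name}: {action}")
```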

#2: add a Common Stalls row for the silent installer hang where stopping the host
firewall after KIND starts flushes the KIND bridge SNAT masquerade, so the CAPI
controllers in KIND cannot reach the new control-plane subnet (KCP stuck
EtcdClusterHealthy=Unknown). Fix: re-add the 172.18.0.0/16 masquerade rule.

Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>