Fix AKS cluster creation failure: replace unavailable VM SKU with STANDARD_NC6S_V3 in eastus #4036
…in bootstrapping scripts
@copilot This is the error message from the run associated with this branch.
…rt general AKS VM size
The
@copilot Do not update the region; just switch to the smallest GPU SKU instead.
…TION=eastus2 overrides
Done in commit
Hi @kingernupur, could you help review and approve this PR? It fixes the provisioning error for the AKS cluster used in this repo.
The `eastus` region for this subscription only allows GPU/HPC VM sizes for AKS clusters. The previously used `STANDARD_D3_V2` and `STANDARD_D4S_V3` SKUs are not available there, so AKS cluster creation failed. This left the `inferencecompute` target unattached, and the subsequent `az ml online-endpoint create` then failed with `KubernetesComputeError: ComputeNotFound`.

Changes
- `infra/bootstrapping/sdk_helpers.sh`: Updated the default VM SKU in `ensure_aks_compute` and the hardcoded size in `ensure_k8s_compute` to `STANDARD_NC6S_V3`. Removed the `LOCATION=eastus2` override from `ensure_k8s_compute`.
- `infra/bootstrapping/bootstrap.sh`: Updated the VM size to `STANDARD_NC6S_V3` for both the general AKS clusters and the Arc cluster creation call. Removed the `LOCATION=eastus2` override from the Arc cluster call.

`STANDARD_NC6S_V3` (6 vCPUs, one NVIDIA Tesla V100) is the smallest GPU SKU confirmed to be in the `eastus` subscription's allowed VM sizes list.
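The SKU choice in this PR comes down to two checks: is a size in the subscription's allowed list, and which allowed GPU size is the smallest. Below is a minimal POSIX shell sketch of both, assuming the allowed sizes are available as a whitespace-separated string. The function names (`to_upper`, `is_sku_allowed`, `vcpus_for`, `smallest_gpu_sku`) and the `ALLOWED` list are hypothetical illustrations, not code from `sdk_helpers.sh` or data from the real subscription. "Smallest" is measured here by vCPU count, and only the NCv3 sizes are in the lookup table.

```shell
# Hypothetical helpers, not part of this PR or of sdk_helpers.sh.
# Sizes are upper-cased before comparing, assuming Azure VM size names
# are case-insensitive (Standard_NC6s_v3 == STANDARD_NC6S_V3).
to_upper() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# Succeed only if size $1 appears in the whitespace-separated list $2.
is_sku_allowed() {
  wanted=$(to_upper "$1")
  for size in $2; do
    if [ "$(to_upper "$size")" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# vCPU counts for the NCv3 sizes only (1, 2 and 4 V100 GPUs respectively);
# any other size prints nothing and is skipped by smallest_gpu_sku.
vcpus_for() {
  case "$(to_upper "$1")" in
    STANDARD_NC6S_V3) echo 6 ;;
    STANDARD_NC12S_V3) echo 12 ;;
    STANDARD_NC24S_V3) echo 24 ;;
  esac
}

# Print the known GPU size with the fewest vCPUs from the allowed list $1;
# return non-zero if the list contains no known GPU size.
smallest_gpu_sku() {
  best=""
  best_vcpus=""
  for size in $1; do
    n=$(vcpus_for "$size")
    [ -n "$n" ] || continue
    if [ -z "$best_vcpus" ] || [ "$n" -lt "$best_vcpus" ]; then
      best=$(to_upper "$size")
      best_vcpus=$n
    fi
  done
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}

# Illustrative allowed list only; the real one comes from the subscription.
ALLOWED="Standard_NC24s_v3 Standard_NC12s_v3 Standard_NC6s_v3 Standard_HB120rs_v3"

is_sku_allowed STANDARD_D4S_V3 "$ALLOWED" || echo "STANDARD_D4S_V3 is not allowed"
smallest_gpu_sku "$ALLOWED"
```

With this sample list, `STANDARD_D4S_V3` is rejected and `STANDARD_NC6S_V3` is selected. Ranking by vCPU count works within NCv3 because the GPU count scales with it; comparing across VM series would need a broader table.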