Node Removal After Changing Cloud Provider to vSphere in an Existing RKE2 Cluster
This document (000021870) is provided subject to the disclaimer at the end of this document.
Environment
Applicable for all versions of:
- RKE2
- K3s
Situation
After updating the cloud provider configuration in an existing RKE2 cluster from the default (no explicit cloud provider) to vsphere, some nodes became unavailable in the Rancher UI and appeared in the nodenotfound state.
The following log entries were recorded in the vSphere Cloud Provider Interface (CPI) logs:
E0514 06:31:06.975972 1 datacenter.go:128] Unable to find VM by UUID. VM UUID: rke2://<NODE_IDENTIFIER>
E0514 06:31:06.975992 1 search.go:181] Error while looking for vm=rke2://<NODE_IDENTIFIER>(byUUID) in vc=<VSPHERE_VCENTER> and datacenter=<VSPHERE_DATACENTER>: No VM found
I0514 06:31:06.976000 1 search.go:186] Did not find node rke2://<NODE_IDENTIFIER> in vc=<VSPHERE_VCENTER> and datacenter=<VSPHERE_DATACENTER>
I0514 06:31:06.983134 1 instances.go:177] instances.InstanceExistsByProviderID() NOT CACHED for node uid "rke2://<NODE_IDENTIFIER>"
I0514 06:31:06.983150 1 node_lifecycle_controller.go:164] deleting node since it is no longer present in cloud provider: <NODE_IDENTIFIER>
I0514 06:31:06.983268 1 event.go:389] "Event occurred" object="<NODE_IDENTIFIER>" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="DeletingNode" message="Deleting node <NODE_IDENTIFIER> because it does not exist in the cloud provider"
These messages indicate that the vSphere CPI attempted to locate virtual machines based on the providerID format, and nodes with the rke2:// prefix could not be matched to any virtual machines in the vSphere environment.
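To see which nodes the CPI has already decided to delete, the log can be scanned for the node lifecycle controller message shown in the excerpt. The following is a minimal sketch, assuming the CPI log has been saved as plain text lines; the function name deleted_nodes is illustrative and not part of any RKE2 or vSphere tooling.

```python
import re

# Message text taken from the node_lifecycle_controller.go line in the log excerpt.
_DELETE_RE = re.compile(
    r"deleting node since it is no longer present in cloud provider: (\S+)"
)


def deleted_nodes(log_lines):
    """Return the node names the CPI logged as deleted, in first-seen order."""
    seen = []
    for line in log_lines:
        match = _DELETE_RE.search(line)
        # Record each node once, even if the controller logs it repeatedly.
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

For example, passing the lines of a saved log file (open(path).read().splitlines()) returns the list of affected node names, which can then be compared with the nodes missing from the Rancher UI.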
Resolution
Changing the cloud provider on an existing RKE2 cluster is not supported. The cloud-provider parameter must be defined during the initial provisioning of the cluster and cannot be altered afterward without consequences.
When the cloud provider is changed to vsphere post-deployment:
- The vSphere CPI only recognizes and manages nodes with a providerID that begins with vsphere://.
- Nodes previously registered with providerID: rke2:// are not recognized by the vSphere CPI.
- The Kubernetes Node Lifecycle Controller, informed by the CPI, interprets these nodes as non-existent in the cloud infrastructure and proceeds to remove them from the cluster.
To restore cluster health, affected nodes must be replaced with newly provisioned nodes that are configured with the correct cloud provider from the outset. These new nodes will be registered with the correct providerID format and managed successfully by the vSphere CPI.
Cause
The vSphere CPI expects nodes to be registered with a providerID in the format vsphere://. Nodes originally registered with the RKE2 default behavior (i.e., without a cloud provider) have a providerID format of rke2://. These identifiers do not match any VM in vSphere. As a result, the CPI reports them as non-existent, and the Kubernetes Node Lifecycle Controller initiates their deletion from the cluster.
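The prefix mismatch described above can be checked against a cluster's node list. The following is a minimal diagnostic sketch, assuming the input is the parsed JSON from kubectl get nodes -o json, where each item carries metadata.name and spec.providerID. The function name nodes_without_prefix and the sample node names are hypothetical. It only reports nodes; it does not make changing the cloud provider supported.

```python
def nodes_without_prefix(node_list, expected_prefix="vsphere://"):
    """Return (name, providerID) pairs for nodes whose providerID lacks the prefix.

    node_list is the dict parsed from `kubectl get nodes -o json`.
    A node with no providerID set is reported with an empty string.
    """
    mismatched = []
    for item in node_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        provider_id = item.get("spec", {}).get("providerID", "")
        if not provider_id.startswith(expected_prefix):
            mismatched.append((name, provider_id))
    return mismatched


# Hypothetical sample data shaped like a Kubernetes NodeList.
sample = {
    "items": [
        {"metadata": {"name": "node-a"}, "spec": {"providerID": "rke2://node-a"}},
        {"metadata": {"name": "node-b"}, "spec": {"providerID": "vsphere://example-uuid"}},
    ]
}
at_risk = nodes_without_prefix(sample)  # only node-a lacks the vsphere:// prefix
```

Nodes reported by such a check are the ones the vSphere CPI would fail to match to a VM and that the Node Lifecycle Controller would then remove.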
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.