Windows worker node stuck in 'Waiting for Node Ref' state with no VM found error
Article Number: 000022315
Environment
- Rancher v2.6+
- Rancher-provisioned RKE2 cluster, with the vSphere Cloud Provider (vSphere CPI/CSI)
Situation
- A Windows worker node is added to a cluster in which the Linux control plane and worker nodes are already configured. The Windows node appears as added and active in the Cluster Nodes view but remains in a 'Waiting for Node Ref' state within Rancher Cluster Management. The node is otherwise functional in the cluster.
- Errors of the following format appear in the logs of the vsphere-cpi-cloud-controller-manager Pod in the affected cluster while the new Windows worker node is registering:
Unable to find VM by DNS Name. VM DNS Name: mynodename
Error while looking for vm=mynodename(byName) in vc=vck8s01 and datacenter=vck8sDatacenter: No VM found
Did not find node mynodename in vc=vck8s01 and datacenter=vck8sDatacenter
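When the log is long, it can help to pull out just the names that the vSphere CPI failed to look up. The following is a minimal sketch, assuming the log lines follow the format shown above and that the log text has already been saved or captured as a string. The function names `unmatched_vm_names` and `is_short_name` are hypothetical helpers for illustration and are not part of Rancher or the vSphere CPI.

```python
import re

# Matches the "Unable to find VM by DNS Name" line shown in the log excerpt above.
# Assumption: the name is a single whitespace-free token at the end of the message.
DNS_NAME_RE = re.compile(r"Unable to find VM by DNS Name\. VM DNS Name: (\S+)")


def unmatched_vm_names(log_text):
    """Return the distinct names the CPI could not find, in order of first appearance."""
    names = []
    for match in DNS_NAME_RE.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


def is_short_name(name):
    """Return True when the name has no domain part, i.e. it is not an FQDN."""
    return "." not in name.strip().rstrip(".")


if __name__ == "__main__":
    sample = (
        "Unable to find VM by DNS Name. VM DNS Name: mynodename\n"
        "Error while looking for vm=mynodename(byName) in vc=vck8s01 "
        "and datacenter=vck8sDatacenter: No VM found\n"
        "Did not find node mynodename in vc=vck8s01 and datacenter=vck8sDatacenter\n"
    )
    for name in unmatched_vm_names(sample):
        kind = "short name" if is_short_name(name) else "FQDN"
        print(f"{name}: {kind}")
```

A name reported as a short name here is a candidate for the cause described in the next section.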
Cause
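The Windows worker node registered with Kubernetes under its short hostname rather than its FQDN. The vSphere CPI looks up the VM by DNS name, and the short name did not match the FQDN reported for the VM in vCenter, so the lookup failed with a 'No VM found' error. This prevented the node from registering correctly with the cluster.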
Resolution
Confirm whether the node name in Kubernetes matches the FQDN of the VM in vCenter, including the domain, as reported by VMware Tools. If it does not, as a workaround, set the NodeName to the FQDN in the Rancher UI by navigating to Cluster Management > [relevant cluster] > Registration > Step 2 (Advanced). This allows the nodes to register with the cluster successfully and become active.
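The comparison in the first step can be sketched as a small check. This is an illustration only: the function name `node_name_matches_fqdn` and the domain `example.com` are hypothetical, and the case-insensitive, trailing-dot-tolerant comparison is an assumption based on general DNS conventions, not a statement of how the vSphere CPI compares names. The node name would come from the cluster (for example, the output of `kubectl get nodes`), and the FQDN from the DNS name shown for the VM in vCenter.

```python
def _normalize(name):
    # Assumption: treat names as case-insensitive and ignore a trailing dot,
    # following general DNS conventions.
    return name.strip().rstrip(".").lower()


def node_name_matches_fqdn(node_name, vm_fqdn):
    """Return True when the Kubernetes node name equals the VM's FQDN."""
    return _normalize(node_name) == _normalize(vm_fqdn)


def describe(node_name, vm_fqdn):
    """Return a one-line summary of the comparison for a single node."""
    if node_name_matches_fqdn(node_name, vm_fqdn):
        return f"{node_name}: matches the VM FQDN"
    return f"{node_name}: does not match the VM FQDN ({vm_fqdn}); set NodeName to the FQDN"


if __name__ == "__main__":
    # Hypothetical values: the short name from the log excerpt and an example domain.
    print(describe("mynodename", "mynodename.example.com"))
    print(describe("mynodename.example.com", "mynodename.example.com"))
```

A mismatch reported by this check corresponds to the case where the NodeName workaround above applies.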