Cluster stuck in “Paused” state causes node registration failures
Article Number: 000022212
Environment
Rancher 2.x, RKE2 cluster
Situation
In some scenarios, a cluster may enter a paused state due to failed or interrupted cluster operations. While the cluster is paused, newly added nodes are unable to complete registration.
During this time, the node installation script may repeatedly log the following error:
[ERROR] 000 received while downloading Rancher connection information. Sleeping for 5 seconds and trying again
As a result, nodes remain stuck during provisioning and the cluster does not progress.
Cause
This behavior occurs because Rancher intentionally pauses the Cluster API (CAPI) cluster during snapshot restore, certificate rotation, and encryption key rotation operations. Pausing the cluster prevents CAPI from reconciling resources while the cluster is in a potentially unsafe state.
If one of these operations fails or is interrupted, the cluster may remain paused and is not unpaused automatically. This is by design: leaving the cluster paused avoids further reconciliation actions that could impact cluster stability.
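To confirm that a cluster is actually paused before changing anything, the .spec.paused field can be listed alongside each cluster name. The following is a minimal sketch, assuming kubectl is pointed at the Rancher management (local) cluster and that the provisioned clusters live in the fleet-default namespace, as in the Resolution steps. The function name show_paused_state is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list CAPI clusters in a namespace together with
# the value of .spec.paused. The namespace defaults to fleet-default.
show_paused_state() {
  namespace="${1:-fleet-default}"
  kubectl get clusters.cluster.x-k8s.io -n "$namespace" \
    -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused
}

# Example: show_paused_state fleet-default
```

A cluster affected by this issue should show true in the PAUSED column. A cluster where the field is not set is not paused.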
Resolution
Unpause the CAPI cluster by setting .spec.paused to false on the clusters.cluster.x-k8s.io object that corresponds to the cluster.
Identify the CAPI cluster name
kubectl get clusters.cluster.x-k8s.io -n fleet-default
Edit the affected cluster
kubectl edit clusters.cluster.x-k8s.io <cluster-name> -n fleet-default
In the cluster spec, locate the field
spec:
  paused: true
Change it to
spec:
  paused: false
Save and exit the editor.
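As a non-interactive alternative to kubectl edit, the same change can be applied with kubectl patch, which may be easier to script. This is a sketch under the same assumptions as the steps above (the cluster object is in fleet-default and kubectl targets the Rancher management cluster). The function name unpause_cluster is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set .spec.paused to false on a CAPI cluster with a
# merge patch, then print the resulting value of the field.
# Usage: unpause_cluster <cluster-name> [namespace]
unpause_cluster() {
  cluster_name="$1"
  namespace="${2:-fleet-default}"

  # Apply the same change as the manual edit: spec.paused -> false.
  kubectl patch clusters.cluster.x-k8s.io "$cluster_name" -n "$namespace" \
    --type merge -p '{"spec":{"paused":false}}' || return 1

  # Read the field back to verify the change was stored.
  kubectl get clusters.cluster.x-k8s.io "$cluster_name" -n "$namespace" \
    -o jsonpath='{.spec.paused}'
}
```

Call the function with the cluster name identified in the first step. If the patch succeeds, the final command should print false.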