Cluster stuck in “Paused” State causing node registration failures

Article Number: 000022212

Environment

Rancher 2.x, RKE2 cluster

Situation

In some scenarios, a cluster may enter a paused state due to failed or interrupted cluster operations. While the cluster is paused, newly added nodes are unable to complete registration.

During this time, the node installation script may repeatedly log the following error:

[ERROR] 000 received while downloading Rancher connection information.
    Sleeping for 5 seconds and trying again

As a result, nodes remain stuck during provisioning and the cluster does not progress.

Cause

Rancher intentionally pauses the Cluster API (CAPI) cluster during snapshot restore, certificate rotation, and encryption key rotation operations. Pausing the cluster prevents CAPI from reconciling resources while the cluster is in a potentially unsafe state.

If one of these operations fails or is interrupted, the cluster can remain paused, and Rancher does not unpause it automatically. This is by design, to avoid further reconciliation actions that could impact cluster stability.

Resolution

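Unpause the CAPI cluster by setting .spec.paused to false on the clusters.cluster.x-k8s.io object that corresponds to the cluster. Run the following commands against the Rancher management (local) cluster, where the CAPI objects are stored.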

Identify the CAPI cluster name

kubectl get clusters.cluster.x-k8s.io -n fleet-default
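To see at a glance which cluster is paused, the listing can be extended with a column for .spec.paused. The following is a minimal sketch, not part of the original procedure: the function name list_capi_paused and the CAPI_NAMESPACE variable are hypothetical, and the namespace is assumed to be fleet-default as in the command above.

```shell
# Hypothetical helper: list CAPI clusters together with their .spec.paused value.
# Assumes kubectl is pointed at the Rancher management (local) cluster.
CAPI_NAMESPACE="${CAPI_NAMESPACE:-fleet-default}"

list_capi_paused() {
  # custom-columns prints one row per cluster; a cluster without the field
  # set shows <none> in the PAUSED column.
  kubectl get clusters.cluster.x-k8s.io -n "$CAPI_NAMESPACE" \
    -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused
}
```

Calling list_capi_paused after defining it prints the table; the cluster to edit is the one whose PAUSED column reads true.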

Edit the affected cluster

kubectl edit clusters.cluster.x-k8s.io <cluster-name> -n fleet-default

In the cluster spec, locate the field:

spec:
  paused: true

Change it to:

spec:
  paused: false

Save and exit the editor.
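Where an interactive editor is not convenient (for example, in a script), the same change can likely be made with a merge patch. This is a sketch under assumptions rather than the documented procedure: the function name unpause_capi_cluster and the CAPI_NAMESPACE variable are hypothetical, and the cluster is assumed to live in fleet-default as in the steps above.

```shell
# Hypothetical helper: set .spec.paused to false without opening an editor.
# Assumes kubectl is pointed at the Rancher management (local) cluster.
CAPI_NAMESPACE="${CAPI_NAMESPACE:-fleet-default}"

unpause_capi_cluster() {
  # $1 is the CAPI cluster name, as shown by
  # "kubectl get clusters.cluster.x-k8s.io -n fleet-default".
  kubectl patch clusters.cluster.x-k8s.io "$1" -n "$CAPI_NAMESPACE" \
    --type merge -p '{"spec":{"paused":false}}'
}
```

For example, unpause_capi_cluster <cluster-name> applies the patch to the affected cluster. A merge patch only touches the paused field and leaves the rest of the spec as it is.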