Cluster Stuck in “Paused” State After Disaster Recovery (DR) Process

Article Number: 000021399

Environment

Rancher Server 2.7.6 and above

Situation

In certain cases, a downstream cluster may enter a broken state that requires a Disaster Recovery (DR) process to restore it to an active state.
However, the DR process can occasionally stall and never complete. When this happens, the cluster enters a “paused” state.

This condition can be verified by inspecting the clusters.cluster.x-k8s.io object in the fleet-default namespace of the local (upstream) cluster:

kubectl get clusters.cluster.x-k8s.io <CLUSTER_NAME> -n fleet-default -o yaml

If the cluster is paused, the output contains the following field:

spec:
  paused: true
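
To check only the paused field rather than reading the full YAML, a JSONPath query can be used. The following is a minimal sketch; the helper name cluster_paused is hypothetical and not part of Rancher or kubectl, and it assumes kubectl is already pointed at the local (upstream) cluster.

```shell
# Hypothetical helper (not part of Rancher or kubectl).
# Prints "true" if the Cluster API object is paused, otherwise "false"
# (an unset spec.paused field is treated as "false").
# Usage: cluster_paused <CLUSTER_NAME>
cluster_paused() {
  paused="$(kubectl get clusters.cluster.x-k8s.io "$1" \
    -n fleet-default -o jsonpath='{.spec.paused}')" || return 1
  echo "${paused:-false}"
}
```

For example, cluster_paused my-cluster prints true for a cluster stuck in the state described above (my-cluster is a placeholder name).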

Cause

The issue typically occurs due to one of the following:

  • An unexpected incident (e.g., a network interruption or OS failure) that leaves the cluster in a broken state.
  • A complete outage rendering all Control Plane nodes unavailable.

Resolution

To recover the cluster from the paused state:

  1. Edit the clusters.cluster.x-k8s.io object in the fleet-default namespace on the local (upstream) cluster:

kubectl edit clusters.cluster.x-k8s.io <CLUSTER_NAME> -n fleet-default

  2. Locate the following field:

spec:
  paused: true

  3. Change the value of paused to false, then save and exit the editor:

spec:
  paused: false

These steps will instruct Rancher to unpause the cluster, allowing the restore process to continue.
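
Where an interactive editor is not convenient (for example, in a script), the same change can likely be made with kubectl patch. This is a sketch under the assumption that a JSON merge patch on spec.paused is equivalent to the manual edit; the helper name unpause_cluster is hypothetical.

```shell
# Hypothetical helper (not part of Rancher or kubectl).
# Sets spec.paused to false on the Cluster API object using a JSON merge patch.
# Usage: unpause_cluster <CLUSTER_NAME>
unpause_cluster() {
  kubectl patch clusters.cluster.x-k8s.io "$1" -n fleet-default \
    --type merge -p '{"spec":{"paused":false}}'
}
```

A merge patch is used here because custom resources such as clusters.cluster.x-k8s.io do not support kubectl's default strategic merge patch.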

Once the cluster resumes activity, it is recommended to re-run the DR process to ensure the cluster is fully recovered.
For detailed guidance, refer to the official Rancher Manager Backup and Restore documentation for your specific distribution.