Failed etcd snapshot restoration leaves the cluster stuck in a "paused" state
This document (000021399) is provided subject to the disclaimer at the end of this document.
Environment
Rancher Server 2.7.6 and above
Situation
In some cases, a downstream cluster can enter a broken state that requires a Disaster Recovery (DR) process to bring it back to an active state.
Occasionally the DR process does not finish properly and hangs indefinitely, which leaves the cluster in what is called a "paused" state.
This symptom can be seen by checking the clusters.cluster.x-k8s.io object in the fleet-default namespace from the local (upstream) cluster:
kubectl get clusters.cluster.x-k8s.io <CLUSTER_NAME> -n fleet-default -o yaml
In the YAML output, the .spec.paused field is set to true.
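As a quicker check than reading the full YAML, the field can be queried directly. The following is a minimal sketch, assuming kubectl points at the local (upstream) cluster. The helper name cluster_paused is hypothetical and not part of Rancher or kubectl.

```shell
# Hypothetical helper: print the .spec.paused value of a cluster object.
# Assumes the current kubeconfig context targets the local (upstream) cluster.
# Usage: cluster_paused CLUSTER_NAME
cluster_paused() {
  kubectl get clusters.cluster.x-k8s.io "$1" -n fleet-default \
    -o jsonpath='{.spec.paused}'
}
```

A paused cluster is expected to print true. If the field is not set on the object, the command may print nothing.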
Resolution
To unblock the cluster, perform the following steps:
- Edit the clusters.cluster.x-k8s.io object in the fleet-default namespace from the local (upstream) cluster:
kubectl edit clusters.cluster.x-k8s.io <CLUSTER_NAME> -n fleet-default -o yaml
- Set the .spec.paused field to false.
- Save the file and exit.
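The edit in the steps above can also be applied without opening an editor. This is a sketch under the assumption that a merge patch on .spec.paused has the same effect as the manual edit. The helper name unpause_cluster is hypothetical. Confirm the result afterwards with kubectl get.

```shell
# Hypothetical helper: set .spec.paused to false with a merge patch.
# Assumes the current kubeconfig context targets the local (upstream) cluster.
# Usage: unpause_cluster CLUSTER_NAME
unpause_cluster() {
  kubectl patch clusters.cluster.x-k8s.io "$1" -n fleet-default \
    --type merge -p '{"spec":{"paused":false}}'
}
```

A merge patch changes only the named field and leaves the rest of the object untouched, which avoids accidental edits elsewhere in the spec.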
These steps instruct Rancher to unpause the cluster so that the stuck restore process can continue.
The recommended approach is to perform the DR process again after the edit is made.
After that, refer to the Rancher Manager backup and restore documentation to continue the DR process for the distribution in use (RKE/RKE2/K3s).
Cause
- An unforeseen incident (for example, a network or OS failure) led the cluster into a broken state.
- An outage made all control plane nodes completely unavailable.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.