Cluster agent is not connected after accidentally deleting the cattle-system Namespace of a downstream cluster
Article Number: 000021478
Environment
Rancher 2.7.x, 2.8.x
Situation
After accidentally deleting the cattle-system Namespace of a downstream cluster, the cluster is no longer accessible in the Rancher UI because the cluster agent has been removed. To recover it, the cluster agent must be manually recreated and the cluster service account token updated.
Requirements
- Rancher Management
  - Kubectl CLI and kubeconfig file.
- Downstream cluster
  - SSH access to controlplane.
  - Kubectl CLI and kubeconfig file.
Resolution
RKE2 custom and node-driver clusters
Redeploy the Rancher agents
Steps
1.1 Connect to the affected downstream cluster.
- Back up the rancher.cattle.io validating and mutating webhooks
kubectl get mutatingwebhookconfigurations rancher.cattle.io -oyaml > backup-mutatingwebhookconfigurations.yaml
kubectl get validatingwebhookconfigurations rancher.cattle.io -oyaml > backup-validatingwebhookconfigurations.yaml
- Delete the rancher.cattle.io validating and mutating webhooks
kubectl delete mutatingwebhookconfigurations rancher.cattle.io
kubectl delete validatingwebhookconfigurations rancher.cattle.io
Note: these objects will be recreated once the cluster is connected again.
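The backup and delete commands above can be combined so that a webhook is only deleted once its backup has actually been written. The following is a minimal sketch, not part of the official procedure: the helper name backup_and_delete_webhook is hypothetical, and it assumes kubectl is already pointed at the affected downstream cluster.

```shell
# Hypothetical helper: back up one rancher.cattle.io webhook configuration,
# then delete it only if the backup file is non-empty.
# Assumes kubectl targets the affected downstream cluster.
backup_and_delete_webhook() {
  kind="$1"   # mutatingwebhookconfigurations or validatingwebhookconfigurations
  backup_file="backup-${kind}.yaml"

  # Save the current object; stop if it cannot be read.
  if ! kubectl get "$kind" rancher.cattle.io -o yaml > "$backup_file"; then
    echo "could not read ${kind}/rancher.cattle.io, nothing deleted" >&2
    return 1
  fi

  # Refuse to delete when the backup turned out empty.
  if [ ! -s "$backup_file" ]; then
    echo "backup ${backup_file} is empty, nothing deleted" >&2
    return 1
  fi

  kubectl delete "$kind" rancher.cattle.io
}
```

Calling the helper once with mutatingwebhookconfigurations and once with validatingwebhookconfigurations produces the same two backup files as the individual commands.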
1.2 Manually redeploy the agent
Follow the steps described in this section to redeploy the Rancher agents:
The cattle-system namespace and the cluster agent will then be recreated.
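To confirm that the redeploy worked before moving on, the namespace and the agent rollout can be checked from the downstream cluster. This is a sketch under assumptions: the helper name verify_cluster_agent is hypothetical, the Deployment is assumed to be named cattle-cluster-agent, and the 300s default timeout is an arbitrary choice.

```shell
# Hypothetical helper: confirm cattle-system exists again and wait for the
# cluster agent rollout to finish.
# Assumption: the agent Deployment is named cattle-cluster-agent.
verify_cluster_agent() {
  timeout="${1:-300s}"   # arbitrary default, adjust as needed

  # The namespace must be back before the rollout can be checked.
  kubectl get namespace cattle-system || return 1

  # Blocks until the Deployment is rolled out or the timeout expires.
  kubectl -n cattle-system rollout status deployment/cattle-cluster-agent --timeout="$timeout"
}
```

A non-zero exit status from verify_cluster_agent suggests the agent has not come up yet and the redeploy step should be reviewed.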
1.3 Force a cluster reconciliation
Apply a minor change in the cluster configuration, such as changing the snapshot retention for etcd.
- Click ☰ > Cluster Management.
- Go to the cluster you want to configure and click ⋮ > Edit Config.
- Cluster Configuration > etcd > Increase the number of Snapshots per node.
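Once the cluster shows as connected again, the rancher.cattle.io webhooks deleted in step 1.1 are expected to be recreated. The check below is a small sketch for confirming that from the downstream cluster; the helper name check_webhooks_recreated is hypothetical.

```shell
# Hypothetical helper: report whether the rancher.cattle.io webhook
# configurations exist again on the downstream cluster.
check_webhooks_recreated() {
  for kind in mutatingwebhookconfigurations validatingwebhookconfigurations; do
    if kubectl get "$kind" rancher.cattle.io > /dev/null 2>&1; then
      echo "${kind}/rancher.cattle.io present"
    else
      echo "${kind}/rancher.cattle.io missing"
    fi
  done
}
```

If either object is still reported missing after the cluster has reconnected, the backups taken in step 1.1 remain available for reference.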