Rancher Uninstall via Helm release fails due to post-delete-hook job failure
Article Number: 000022116
Environment
Rancher Versions:
- 2.12.x, up to and including 2.12.3
- 2.11.x
- 2.10.x
- 2.9.x, from 2.9.2 onward
Rancher installed by Helm Chart
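The affected range listed above can be checked mechanically. The following is a minimal sketch, assuming a sort implementation that supports -V (version sort, as in GNU coreutils). The function name is_affected is hypothetical, and the bounds 2.9.2 and 2.12.3 are taken from the version list above.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when a Rancher version falls in the affected
# range 2.9.2 through 2.12.3. Assumes `sort -V` is available.
is_affected() {
  ia_version="${1#v}"   # tolerate a leading "v", e.g. v2.11.5
  # The lower bound must sort first, and the upper bound must sort last.
  ia_lowest=$(printf '%s\n%s\n' "2.9.2" "$ia_version" | sort -V | head -n 1)
  ia_highest=$(printf '%s\n%s\n' "$ia_version" "2.12.3" | sort -V | tail -n 1)
  [ "$ia_lowest" = "2.9.2" ] && [ "$ia_highest" = "2.12.3" ]
}
```

For example, is_affected 2.11.5 succeeds, while is_affected 2.12.4 and is_affected 2.13.0 fail. The installed version is typically shown in the APP VERSION column of helm list -n cattle-system.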
Situation
A known bug can cause the uninstall of the Rancher Helm chart to fail. The helm uninstall command reports the following rancher-post-delete job error:
# helm uninstall rancher -n cattle-system
Error: uninstallation completed with 1 error(s): 1 error occurred:
* job rancher-post-delete failed: BackoffLimitExceeded
This error indicates that a resource was not removed during the uninstall. In this case, the logs from the rancher-post-delete job pod confirm that the Fleet app failed to uninstall. The job logs contain:
Uninstalling Rancher resources in the following namespaces: cattle-fleet-system cattle-system rancher-operator-system
--- Deleting the app [fleet] in the namespace [cattle-fleet-system]
Error: failed to delete release: fleet
--- Skip the app [fleet-crd] in the namespace [cattle-fleet-system]
Removing Rancher bootstrap secret in the following namespace: cattle-system
------ Summary ------
Failed to uninstall the following apps: fleet
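Job logs like the excerpt above can be retrieved with kubectl. This is a minimal sketch, assuming the rancher-post-delete job runs in the release namespace (cattle-system) and still exists after the failure. The function name show_post_delete_logs is hypothetical.

```shell
#!/bin/sh
# Sketch: list the pods created by the failed hook job, then print its logs.
# Assumption: the job lives in the release namespace (default cattle-system).
show_post_delete_logs() {
  spl_ns="${1:-cattle-system}"
  # Pods created by a Job carry the job-name label.
  kubectl get pods -n "$spl_ns" -l job-name=rancher-post-delete
  # With several retried pods, kubectl picks one of them for job/<name>.
  kubectl logs -n "$spl_ns" job/rancher-post-delete
}
```

Calling show_post_delete_logs with no argument uses cattle-system. Pass another namespace as the first argument if Rancher was installed elsewhere.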
The Fleet resource preventing the app from being uninstalled is a cronjob:
# kubectl get cronjobs -n cattle-fleet-system
NAMESPACE NAME SCHEDULE TIMEZONE SUSPEND ACTIVE LAST SCHEDULE AGE
cattle-fleet-system fleet-cleanup-gitrepo-jobs @daily <none> False 0 <none> 6h48m
Cause
Fleet added a cronjob that maintains gitrepo resources, and this cronjob must be cleaned up when Rancher is uninstalled. The rancher-post-delete job RBAC does not account for the new cronjob resource, so the job fails due to missing permissions. Repo code: https://github.com/rancher/rancher/blob/main/chart/templates/post-delete-hook-cluster-role.yaml#L15-L17
Resolution
Workaround
As a workaround, open two terminals, or two panes in a multiplexer (tmux/screen), each with helm/kubectl access to the Rancher cluster. Run the helm uninstall command in the first terminal/pane. While the uninstall is in progress, run the kubectl patch command in the second terminal/pane to add RBAC permission for the cronjobs resource, which allows the rancher-post-delete job to delete the Fleet cronjob.
1. Run the helm uninstall command first.
# Terminal/Pane 1
helm uninstall rancher -n cattle-system
2. Immediately after step 1, run the patch command.
# Terminal/Pane 2
kubectl patch clusterrole rancher-post-delete --type='json' -p='[{"op": "add", "path": "/rules/1/resources/-", "value": "cronjobs"}]'
3. The helm uninstall will succeed with no failures.
# helm uninstall rancher -n cattle-system
release "rancher" uninstalled
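Because the timing of step 2 matters, the patch in Terminal/Pane 2 can be wrapped in a polling loop. This is a minimal sketch, not a tested procedure. It assumes the rancher-post-delete ClusterRole only becomes available once the uninstall has started (which the "immediately after" timing in step 2 suggests) and that rule index 1 is the rule to extend, as in the patch command above. The function name patch_post_delete_role, the retry count, and the POLL_INTERVAL variable are hypothetical.

```shell
#!/bin/sh
# Sketch: wait for the ClusterRole to appear, then apply the step 2 patch.
# Assumption: /rules/1 is the rule to extend, as in the article's command.
patch_post_delete_role() {
  ppr_tries="${1:-60}"   # hypothetical retry budget
  ppr_i=0
  while [ "$ppr_i" -lt "$ppr_tries" ]; do
    if kubectl get clusterrole rancher-post-delete >/dev/null 2>&1; then
      # Same patch as step 2: append "cronjobs" to the rule's resources.
      kubectl patch clusterrole rancher-post-delete --type='json' \
        -p='[{"op": "add", "path": "/rules/1/resources/-", "value": "cronjobs"}]'
      return $?
    fi
    sleep "${POLL_INTERVAL:-1}"
    ppr_i=$((ppr_i + 1))
  done
  echo "clusterrole rancher-post-delete did not appear" >&2
  return 1
}
```

Start patch_post_delete_role in Terminal/Pane 2 just before running helm uninstall in Terminal/Pane 1. If the patch still lands after the job has exhausted its retries, the uninstall can fail as described in the Situation section.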
Resolution
A GitHub pull request has been submitted to fix this issue permanently in the Rancher 2.13.0 and 2.12.4 releases. Pull Request: https://github.com/rancher/rancher/pull/52277