
Rancher Uninstall via Helm release fails due to post-delete-hook job failure

Article Number: 000022116

Environment

Rancher Versions:

  • 2.12.x, up to and including 2.12.3
  • 2.11.x
  • 2.10.x
  • 2.9.x, from 2.9.2 onward

Rancher installed by Helm Chart

Situation

A known bug can cause the uninstall of the Rancher Helm Chart to fail. The following rancher-post-delete job error is reported:

# helm uninstall rancher -n cattle-system
Error: uninstallation completed with 1 error(s): 1 error occurred:
        * job rancher-post-delete failed: BackoffLimitExceeded 

This error indicates that a resource was not removed during the uninstall. In this case, the logs from the rancher-post-delete job pod confirm that the Fleet app failed to uninstall. The job logs contain:


Uninstalling Rancher resources in the following namespaces: cattle-fleet-system cattle-system rancher-operator-system
--- Deleting the app [fleet] in the namespace [cattle-fleet-system]
Error: failed to delete release: fleet
--- Skip the app [fleet-crd] in the namespace [cattle-fleet-system]
Removing Rancher bootstrap secret in the following namespace: cattle-system
------ Summary ------
Failed to uninstall the following apps: fleet
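One way to retrieve these logs is sketched below. The function name show_post_delete_logs is hypothetical, and the sketch assumes the rancher-post-delete job runs in the release namespace (cattle-system) and that its pod still exists when the command is run.

```shell
# Print the logs of the post-delete hook job.
# Assumption: the job lives in the Helm release namespace (cattle-system).
# Pass a different namespace as the first argument if that is not the case.
show_post_delete_logs() {
  ns="${1:-cattle-system}"
  kubectl logs -n "$ns" job/rancher-post-delete
}
```

Call show_post_delete_logs after the failed uninstall to print the job output.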

The Fleet resource preventing the app from being uninstalled is a cronjob:

# kubectl get cronjobs -n cattle-fleet-system
NAMESPACE             NAME                                 SCHEDULE    TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cattle-fleet-system   fleet-cleanup-gitrepo-jobs           @daily      <none>     False     0        <none>          6h48m

Cause

A new Fleet cronjob that maintains gitrepo resources must be cleaned up when Rancher is uninstalled. The RBAC rules for the rancher-post-delete job do not cover this cronjob resource, so the job fails due to missing permissions. Repo code: https://github.com/rancher/rancher/blob/main/chart/templates/post-delete-hook-cluster-role.yaml#L15-L17
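The missing permission can be checked with a rough sketch like the one below. The function name role_allows_cronjobs is hypothetical. The sketch assumes the rancher-post-delete ClusterRole is present on the cluster, which may only be the case while the hook runs or after a failed uninstall. It is a coarse check: it looks for "cronjobs" in any rule and does not verify the API group or verbs.

```shell
# Return 0 if any rule of the ClusterRole lists "cronjobs" as a resource,
# non-zero otherwise (including when the ClusterRole does not exist).
role_allows_cronjobs() {
  role="${1:-rancher-post-delete}"
  kubectl get clusterrole "$role" -o jsonpath='{.rules[*].resources}' \
    | grep -q 'cronjobs'
}
```

For example, role_allows_cronjobs && echo "cronjobs covered" || echo "cronjobs missing" reports which case applies.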

Resolution

Workaround

As a workaround, open two terminals, or two panes in a multiplexer (tmux/screen), each with helm and kubectl access to the Rancher cluster. In the first, run the helm uninstall command. In the second, run the kubectl patch command, which adds the RBAC permission for the cronjob resource so that the rancher-post-delete job can delete the Fleet cronjob during the uninstall.

1. Run the helm uninstall command first.

# Terminal/Pane 1

helm uninstall rancher -n cattle-system

2. Immediately after step 1, run the patch command.

# Terminal/Pane 2

kubectl patch clusterrole rancher-post-delete --type='json' -p='[{"op": "add", "path": "/rules/1/resources/-", "value": "cronjobs"}]'

3. The helm uninstall then completes without errors.

# helm uninstall rancher -n cattle-system
release "rancher" uninstalled
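Because step 2 depends on timing, the second terminal can instead poll for the ClusterRole and patch it as soon as it appears. The following is a minimal sketch, not part of the official workaround. The function name patch_post_delete_role and the 120-second default timeout are assumptions. The patch itself is the same command as in step 2. If a ClusterRole left over from an earlier failed attempt is still present, the sketch patches that one immediately, which may not be the object used by the new uninstall.

```shell
# Wait for the hook's ClusterRole to appear, then apply the patch from step 2.
# First argument: timeout in seconds (assumed default: 120).
patch_post_delete_role() {
  timeout_s="${1:-120}"
  waited=0
  # Poll once per second until the ClusterRole exists or the timeout expires.
  until kubectl get clusterrole rancher-post-delete >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out waiting for clusterrole rancher-post-delete" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Same JSON patch as in step 2: append "cronjobs" to the second rule.
  kubectl patch clusterrole rancher-post-delete --type='json' \
    -p='[{"op": "add", "path": "/rules/1/resources/-", "value": "cronjobs"}]'
}
```

Start patch_post_delete_role in Terminal/Pane 2 first, then run the helm uninstall command in Terminal/Pane 1.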

Resolution

A GitHub pull request has been submitted to fix this issue permanently in the Rancher 2.13.0 and 2.12.4 releases. Pull Request: https://github.com/rancher/rancher/pull/52277