rke2 helm-install job failing with INSTALLATION FAILED: cannot re-use a name that is still in use
This document (000021581) is provided subject to the disclaimer at the end of this document.
Environment
- Rancher v2.7+
- RKE2 v1.26+
Situation
During an upgrade of an RKE2 cluster, you may encounter failures in the helm-install Job(s) that upgrade internal components such as rke2-ingress-nginx or rke2-metrics-server.
Checking the logs of the related helm-install Job pod shows the following error message:
Error: INSTALLATION FAILED: cannot re-use a name that is still in use
This situation can occur as the result of a previously failed update to or removal of the component.
This KB describes how you can solve the above issue.
Resolution
The following commands (in order) should resolve the issue:
- `helm ls -A` to identify which RKE2-deployed helm chart is not in a `deployed` state
- `helm -n kube-system history rke2-ingress-nginx` to view the release history for the affected chart (in this example the rke2-ingress-nginx chart in the kube-system Namespace).
NOTE: for an affected chart, the most recent revision will be in a non-deployed status. In the example output below, revision 5 is stuck in an `uninstalling` status, with deletion in progress rather than deployed:
REVISION  UPDATED                   STATUS        CHART                        APP VERSION  DESCRIPTION
5         Thu Oct 3 15:32:44 2024   uninstalling  rke2-ingress-nginx-4.10.401  1.10.4       Deletion in progress (or silently failed)
- `kubectl get secrets -n kube-system | grep rke2-ingress-nginx`
NOTE: each revision X of the release has a corresponding secret named sh.helm.release.v1.rke2-ingress-nginx.vX
Following the example above, the secret name is: sh.helm.release.v1.rke2-ingress-nginx.v5
Delete that secret:
- `kubectl delete secrets -n kube-system sh.helm.release.v1.rke2-ingress-nginx.v5` to delete the affected helm release secret
- `kubectl delete pods -n kube-system helm-install-rke2-ingress-nginx-xxxxx` to delete the failed helm Job pod
The last command deletes the existing helm Job pod, which is in an error state (CrashLoopBackOff). Once the pod is deleted, a new Job pod is scheduled and, with the stale helm release secret removed, should run to completion.
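The steps above can be condensed into a short script. This is a sketch, not an official tool: the release name, namespace, and revision are assumed example values taken from the output above, and the final kubectl commands are printed rather than executed, so you can review them before running anything against the cluster.

```shell
#!/bin/sh
# Sketch: build the helm release secret name for a stuck revision and print
# the cleanup commands. Values below are assumed examples -- substitute the
# chart and revision reported by `helm ls -A` / `helm history`.
release_secret_name() {
  # helm v3 stores revision N of release R as Secret sh.helm.release.v1.R.vN
  printf 'sh.helm.release.v1.%s.v%s' "$1" "$2"
}

NAMESPACE=kube-system
RELEASE=rke2-ingress-nginx
STUCK_REVISION=5   # the revision shown as non-deployed in `helm history`

SECRET=$(release_secret_name "$RELEASE" "$STUCK_REVISION")
# Printed (not executed) so the commands can be reviewed first. The second
# command uses the `job-name` label Kubernetes adds to Job pods, instead of
# the exact pod name.
echo "kubectl delete secret -n $NAMESPACE $SECRET"
echo "kubectl delete pods -n $NAMESPACE -l job-name=helm-install-$RELEASE"
```

Selecting the Job pod by label avoids having to look up the random pod-name suffix by hand.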
Cause
Helm records each release of a component (e.g. rke2-ingress-nginx or rke2-metrics-server) in an RKE2 cluster with a revision number, in the component's namespace.
Whenever one of these components is upgraded, helm creates a new secret to record that a new release revision has been rolled out.
If a prior revision was left in a non-deployed state, helm refuses to reuse the release name and reports the error above; deleting the secret for the stuck revision unblocks the upgrade so the component can deploy successfully.
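For context, helm v3 keeps the full release record inside that secret: the release JSON is gzip-compressed and base64-encoded under the secret's `release` data key, and `kubectl get secret -o jsonpath` shows that value base64-encoded once more. The snippet below simulates that encoding round trip locally with a toy payload (the payload contents are invented for illustration), so nothing here touches a live cluster.

```shell
#!/bin/sh
# Toy payload standing in for a helm release record (invented for illustration).
PAYLOAD='{"name":"rke2-ingress-nginx","version":5,"status":"uninstalling"}'

# Encode the way a helm release secret ends up looking from the API:
# gzip, base64 (helm's storage format), then base64 again (kubernetes
# secret data as shown by jsonpath).
ENCODED=$(printf '%s' "$PAYLOAD" | gzip -c | base64 -w0 | base64 -w0)

# Decoding -- the same pipeline works against a real secret, e.g.:
#   kubectl get secret -n kube-system sh.helm.release.v1.rke2-ingress-nginx.v5 \
#     -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip
printf '%s' "$ENCODED" | base64 -d | base64 -d | gunzip
```

Inspecting the decoded record before deleting the secret is a quick way to confirm you are removing the revision that is actually stuck.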
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.