
rke2 helm-install job failing with INSTALLATION FAILED: cannot re-use a name that is still in use

This document (000021581) is provided subject to the disclaimer at the end of this document.

Environment

  • Rancher v2.7+
  • RKE2 v1.26+

Situation

During an upgrade of an RKE2 cluster, you may face issues related to helm-install job(s) upgrading internal components such as rke2-ingress-nginx or rke2-metrics-server.

The logs of the affected helm-install Job pod show the following error message:

Error: INSTALLATION FAILED: cannot re-use a name that is still in use

This situation can occur as the result of a previously failed update to or removal of the component.

This KB describes how you can solve the above issue.

Resolution

The following commands (in order) should resolve the issue:

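- `helm ls -A` to identify which RKE2-deployed Helm chart is not in a deployed state. By default `helm ls` only lists releases in a deployed or failed state; add `-a` (`--all`) to include releases in other states, such as uninstalling.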

- `helm -n kube-system history rke2-ingress-nginx` to view the release history for the affected chart (in this example the rke2-ingress-nginx chart in the kube-system Namespace).

NOTE: For affected charts, the most recent revision will be in a non-deployed status. In the example output below, revision 5 is stuck in the uninstalling status, with the description indicating that deletion is in progress.

REVISION    UPDATED                     STATUS          CHART                           APP VERSION    DESCRIPTION
5           Thu Oct  3 15:32:44 2024    uninstalling    rke2-ingress-nginx-4.10.401     1.10.4         Deletion in progress (or silently failed)
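
To check the status of the most recent revision without reading the table by eye, the history output can be piped through a small filter. The following is a minimal sketch, not an official tool: the function name `latest_revision_status` is hypothetical, and it assumes the table layout shown above, where the UPDATED column consists of five whitespace-separated tokens so that STATUS is the seventh field.

```shell
#!/bin/sh
# Sketch: read `helm history` table output on stdin and print the
# revision number and status of the last (most recent) revision.
# Assumption: UPDATED looks like "Thu Oct  3 15:32:44 2024" (five tokens),
# which makes STATUS the 7th whitespace-separated field.
latest_revision_status() {
  awk '$1 ~ /^[0-9]+$/ { rev = $1; status = $7 }
       END { if (rev != "") print rev, status }'
}

# Usage (run against the affected release):
#   helm -n kube-system history rke2-ingress-nginx | latest_revision_status
```

Any status other than `deployed` in the result suggests that the revision is the one blocking the helm-install job.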

- `kubectl get secrets -n kube-system | grep rke2-ingress-nginx` to list the Helm release secrets for the affected chart

NOTE: Each revision X of the release has its own secret, named sh.helm.release.v1.rke2-ingress-nginx.vX

Following the example above, the name should be: sh.helm.release.v1.rke2-ingress-nginx.v5
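
When a release has many revisions, picking the right secret from the `grep` output can be error-prone, since v10 sorts before v5 alphabetically. The sketch below selects the secret with the numerically highest revision. The function name `latest_release_secret` is hypothetical, and it assumes the secret names are supplied one per line, for example from `kubectl get secrets -o name`.

```shell
#!/bin/sh
# Sketch: read secret names on stdin (one per line) and print the Helm
# release secret with the highest revision number for the given release.
# Argument: release name, e.g. rke2-ingress-nginx
latest_release_secret() {
  release="$1"
  # Emit "<revision> <secret name>" for matching names, sort numerically
  # by revision, and keep only the name of the last entry.
  sed -n "s/^.*\(sh\.helm\.release\.v1\.${release}\.v\([0-9][0-9]*\)\)\$/\2 \1/p" |
    sort -n | tail -n 1 | cut -d' ' -f2
}

# Usage:
#   kubectl get secrets -n kube-system -o name | latest_release_secret rke2-ingress-nginx
```

Confirm that the revision in the printed name matches the stuck revision reported by `helm history` before deleting anything.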

- `kubectl delete secrets -n kube-system sh.helm.release.v1.rke2-ingress-nginx.v5` to delete the affected Helm release secret

- `kubectl delete pods -n kube-system helm-install-rke2-ingress-nginx-xxxxx` to delete the failed helm-install Job pod (xxxxx stands for the pod's generated name suffix)

The last command deletes the existing helm-install Job pod that is in an error state (CrashLoopBackOff). After the pod is deleted, a new Job pod is scheduled and, with the stuck Helm release secret removed, should run correctly.
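
The two delete commands can be wrapped in a small helper that prints them for review before anything is changed. This is a sketch under stated assumptions rather than a supported procedure: the function name `cleanup_stuck_release` and the `APPLY` variable are hypothetical, and it assumes the Job is named `helm-install-<release>` and that its pods carry the `job-name` label set by the Kubernetes Job controller, so the pod can be selected without knowing its generated suffix.

```shell
#!/bin/sh
# Sketch: print (default) or run (APPLY=1) the cleanup commands for a
# stuck Helm release revision.
# Arguments: namespace, release name, stuck revision number.
cleanup_stuck_release() {
  ns="$1"; release="$2"; rev="$3"
  secret="sh.helm.release.v1.${release}.v${rev}"
  # Assumption: Job name is helm-install-<release>; its pods are labelled
  # job-name=<job name> by the Job controller.
  job="helm-install-${release}"
  for cmd in \
    "kubectl delete secret -n ${ns} ${secret}" \
    "kubectl delete pods -n ${ns} -l job-name=${job}"
  do
    if [ "${APPLY:-0}" = "1" ]; then
      $cmd          # word-split on purpose: arguments contain no spaces
    else
      echo "$cmd"   # dry run: show the command only
    fi
  done
}

# Usage:
#   cleanup_stuck_release kube-system rke2-ingress-nginx 5            # dry run
#   APPLY=1 cleanup_stuck_release kube-system rke2-ingress-nginx 5    # execute
```

Defaulting to a dry run keeps the destructive step explicit, since deleting the wrong release secret removes Helm's record of a working revision.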

Cause

Helm records each release of an RKE2 component (e.g. rke2-ingress-nginx or rke2-metrics-server) as a numbered revision in the component's namespace.

Whenever one of these components is installed or upgraded, Helm creates a new secret that stores the state of that revision.

If the most recent revision is left in a non-deployed status, for example after a failed upgrade or removal, Helm still treats the release name as in use and the helm-install job fails with the error above. Deleting the secret for that stuck revision unblocks the job and allows the component upgrade to deploy successfully.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.