
Kube-proxy not upgrading correctly

This document (000021284) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Rancher 2.7.x

Situation

There is a bug affecting the upgrade of downstream RKE2 clusters from v1.25.x to v1.26.x: the kube-proxy containers are not restarted on the worker nodes, so they remain stuck on the pre-upgrade version.
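To identify which worker nodes are still running the old kube-proxy, a check like the following can be run against the downstream cluster (a minimal sketch, assuming kubectl access with a kubeconfig for that cluster; the column names are illustrative):

# List each kube-proxy static pod with its node and image tag;
# affected nodes still show the pre-upgrade image tag.
kubectl get pods -n kube-system -o custom-columns='NODE:.spec.nodeName,NAME:.metadata.name,IMAGE:.spec.containers[0].image' | grep kube-proxy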

Resolution

A manual workaround is available to force the kube-proxy to upgrade to the correct version.

The following steps upgrade kube-proxy on an affected worker node (a consolidated sketch follows this list):

  1. Log in to the worker node with the outdated kube-proxy via SSH.
  2. Move the kube-proxy.yaml manifest out of the static pod folder. Adjust the path if you use a non-default static pod folder.
mv /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml /var/lib/rancher/rke2/agent/kube-proxy.yaml_backup
  3. Restart the rke2-agent service.
systemctl restart rke2-agent
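
When several nodes are affected, the two commands above can be run as a short script on each node, followed by a check that the recreated container uses the upgraded image (a sketch, assuming the default RKE2 data directory and the crictl binary bundled with RKE2):

# Move the manifest out of the static pod folder and restart the agent
mv /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml /var/lib/rancher/rke2/agent/kube-proxy.yaml_backup
systemctl restart rke2-agent
# Verify the recreated kube-proxy container now runs the post-upgrade image
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
/var/lib/rancher/rke2/bin/crictl ps --name kube-proxy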

Cause

There is a race condition between the Kubernetes API server and the kubelet. The mirror pod on the API server comes back for the node before the kubelet creates the new container, so kube-proxy does not upgrade. Moving the static pod manifest out and restarting the rke2-agent service gives the kubelet more time to instruct containerd to create the pod; this delays the mirror pod long enough that the race condition does not occur.
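
To observe this sequence while applying the workaround, the kube-proxy mirror pod can be watched from the API side as the kubelet restarts (a sketch, assuming kubectl access to the downstream cluster):

# Watch the kube-proxy mirror pod disappear and get recreated
kubectl get pods -n kube-system -w | grep kube-proxy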

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.