kube-proxy not upgraded on some nodes during an RKE2 version upgrade in RKE2 versions below 1.27.12, 1.28.8 and 1.29.4
This document (000021284) is provided subject to the disclaimer at the end of this document.
Environment
An RKE2 cluster running a version below 1.27.12, 1.28.8 or 1.29.4
Situation
As a result of a bug affecting RKE2 clusters running versions below 1.27.12, 1.28.8 and 1.29.4, the kube-proxy static pods on some nodes are not correctly upgraded during a cluster upgrade and remain on the pre-upgrade version. Errors of the following format are observed in the kubelet logs:
"Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[file0 file1 file2 file3], unattached volumes=[file0 file1 file2 file3]: timed out waiting for the condition" pod="kube-system/kube-proxy-test-00865632-02"
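To check whether a given node's RKE2 version falls in the affected range, the version condition above can be expressed as a small shell function. This is a sketch: the `+rke2rN` suffix handling is an assumption based on typical RKE2 release naming, and minor releases other than 1.27, 1.28 and 1.29 are not classified here.

```shell
#!/bin/sh
# is_affected VERSION - return 0 if VERSION is in the affected range
# (below 1.27.12, 1.28.8 or 1.29.4), 1 otherwise.
is_affected() {
  # Strip a leading "v" and any "+rke2rN" build suffix, e.g.
  # "v1.28.5+rke2r1" -> "1.28.5".
  v=$(printf '%s' "$1" | sed 's/^v//; s/+.*//')
  major=$(printf '%s' "$v" | cut -d. -f1)
  minor=$(printf '%s' "$v" | cut -d. -f2)
  patch=$(printf '%s' "$v" | cut -d. -f3)
  [ "$major" -eq 1 ] || return 1
  case "$minor" in
    27) [ "$patch" -lt 12 ] ;;
    28) [ "$patch" -lt 8 ] ;;
    29) [ "$patch" -lt 4 ] ;;
    # Minors outside those named in this article are not classified.
    *)  return 1 ;;
  esac
}

is_affected "v1.28.5+rke2r1" && echo "affected" || echo "not affected"
# → affected
```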
Resolution
The issue is resolved by upgrading the affected RKE2 cluster to version 1.27.12, 1.28.8, 1.29.4 or above.
A workaround is also available for affected nodes:
- Open an SSH shell on the affected node
- Move the kube-proxy.yaml manifest out of the static pod folder. Adjust the path if a non-default static pod folder is configured:
mv /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml /var/lib/rancher/rke2/agent/kube-proxy.yaml_backup
- Restart the rke2-agent service (on server nodes, restart the rke2-server service instead):
systemctl restart rke2-agent
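The workaround steps above can be collected into a single shell function. This is a sketch, not part of the documented procedure: the restart command is parameterised only so the move step can be exercised in isolation, and the default paths assume a standard RKE2 install.

```shell
#!/bin/sh
set -eu

# apply_workaround POD_DIR BACKUP_PATH [RESTART_CMD]
# Moves the kube-proxy manifest out of the static pod folder and
# restarts the RKE2 service so the manifest is regenerated at the
# upgraded version.
apply_workaround() {
  pod_dir="$1"                                   # static pod folder
  backup_path="$2"                               # where to park the manifest
  restart_cmd="${3:-systemctl restart rke2-agent}"

  # Moving the manifest out of the static pod folder makes the kubelet
  # tear down the stale kube-proxy static pod.
  mv "$pod_dir/kube-proxy.yaml" "$backup_path"

  # Restarting the service regenerates the manifest.
  $restart_cmd
}

# Typical invocation on an affected node (run as root):
# apply_workaround /var/lib/rancher/rke2/agent/pod-manifests \
#                  /var/lib/rancher/rke2/agent/kube-proxy.yaml_backup
```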
Cause
This bug was tracked in https://github.com/rancher/rke2/issues/4864 and fixed in RKE2 1.27.12, 1.28.8, 1.29.4 and above.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.