
Kube-proxy not upgrading correctly

This document (000021284) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Rancher 2.7.x

Situation

There is a bug affecting the upgrade of downstream RKE2 clusters from v1.25.x to v1.26.x: the kube-proxy containers are not restarted on the worker nodes, so they remain stuck on the pre-upgrade version.
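To identify which worker nodes are still running the old kube-proxy, a check like the following can be run against the downstream cluster (a minimal sketch, assuming kubectl access with a kubeconfig for that cluster; the column names are illustrative):

# List each kube-proxy static pod with its node and image tag;
# affected nodes still show the pre-upgrade image tag.
kubectl get pods -n kube-system -o custom-columns='NODE:.spec.nodeName,NAME:.metadata.name,IMAGE:.spec.containers[0].image' | grep kube-proxy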

Resolution

A manual workaround is available to force the kube-proxy to upgrade to the correct version.

The following steps upgrade kube-proxy on an affected worker node (a consolidated sketch follows this list):

  1. Log in to the worker node with the outdated kube-proxy via SSH.
  2. Move the kube-proxy.yaml manifest out of the static pod folder. Adjust the path if you use a non-default static pod folder.
mv /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml /var/lib/rancher/rke2/agent/kube-proxy.yaml_backup
  3. Restart the rke2-agent service.
systemctl restart rke2-agent
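
When several nodes are affected, the two commands above can be run as a short script on each node, followed by a check that the recreated container uses the upgraded image (a sketch, assuming the default RKE2 data directory and the crictl binary bundled with RKE2):

# Move the manifest out of the static pod folder and restart the agent
mv /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml /var/lib/rancher/rke2/agent/kube-proxy.yaml_backup
systemctl restart rke2-agent
# Verify the recreated kube-proxy container now runs the post-upgrade image
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
/var/lib/rancher/rke2/bin/crictl ps --name kube-proxy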

Cause

There is a race condition between the Kubernetes API server and the kubelet. The mirror pod on the API server comes back for the node before the kubelet creates the new container, so kube-proxy does not upgrade. Moving the static pod manifest out and restarting the rke2-agent service gives the kubelet more time to instruct containerd to create the pod; this delays the mirror pod long enough that the race condition does not occur.
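
To observe this sequence while applying the workaround, the kube-proxy mirror pod can be watched from the API side as the kubelet restarts (a sketch, assuming kubectl access to the downstream cluster):

# Watch the kube-proxy mirror pod disappear and get recreated
kubectl get pods -n kube-system -w | grep kube-proxy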

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.