RKE2 cluster provisioning fails with kube-apiserver health check failures due to an inability to resolve localhost

Article Number: 000021462

Environment

  • Rancher v2.6+
  • A Rancher-provisioned RKE2 cluster

Situation

Cluster component Pods in the affected downstream RKE2 cluster show a high number of restarts:

NAMESPACE             NAME                                                READY   STATUS    RESTARTS
cattle-fleet-system   fleet-agent-cc8c97f97-bvx78                         1/1     Running   185
cattle-system         cattle-cluster-agent-b1460cbd-8ct5c                 1/1     Running   115
cattle-system         cattle-cluster-agent-b1460cbd-l2l8l                 1/1     Running   168
kube-system           kube-apiserver-cluster-suse-cp-f777105c-2qgvh       0/1     Running   314
kube-system           kube-controller-manager-cluster-suse-cp-5c-2qgvh    1/1     Running   491
kube-system           cloud-controller-manager-cluster-suse-cp-5c-2qgvh   1/1     Running   501
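
Output of this kind can be gathered with a standard listing, for example (the grep pattern below is illustrative; adjust it to the components of interest):

    kubectl get pods --all-namespaces | grep -E 'fleet-agent|cattle-cluster-agent|kube-apiserver|controller-manager'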

The kube-apiserver Pod flaps between a ready and not ready status:

NAMESPACE     NAME                                            READY   STATUS    RESTARTS
kube-system   kube-apiserver-cluster-suse-cp-f777105c-2qgvh   0/1     Running   314
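
To observe the flapping in real time, a watch over the kube-system namespace can help (Pod names will differ per cluster):

    kubectl get pods -n kube-system -w | grep kube-apiserver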

The kubelet logs record failing liveness probes against the kube-apiserver.

Cause

The /etc/hosts file on the node was empty and did not contain a localhost entry. The kube-apiserver liveness probe targets localhost, so name resolution of localhost failed and the probe could not succeed.
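
For reference, a minimal /etc/hosts that allows localhost to resolve maps it to the loopback addresses:

    127.0.0.1   localhost
    ::1         localhost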

Resolution

  1. Enable kubelet debug logging:

     a. Navigate to Cluster Management
     b. Click Edit Config for the affected downstream RKE2 cluster
     c. Click the Advanced tab in the Cluster Configuration form
     d. Under Additional Kubelet Args, click Add Global Argument
     e. In the new argument field, enter v=9
     f. Click Save
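
     The same argument can also be set in the cluster's provisioning YAML (Cluster Management > Edit as YAML). The snippet below is a sketch only; the placement of kubelet-arg under machineGlobalConfig is assumed from the UI's Add Global Argument option and should be verified against your Rancher version:

        # Sketch: equivalent kubelet debug-logging setting in the cluster YAML
        spec:
          rkeConfig:
            machineGlobalConfig:      # assumed location for a global kubelet argument
              kubelet-arg:
                - v=9
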
  2. Replicate the liveness probe and check the kubelet logs:

     a. Open an SSH session to a master node in the affected downstream RKE2 cluster
     b. Check the kubelet log (tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log | grep kube-apiserver) for failing kube-apiserver liveness probes
     c. Execute the following command to simulate the liveness probe for the kube-apiserver Pod; if the issue is present, it should fail:

        /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock exec \
          $(/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-apiserver | awk '{print $1}') \
          kubectl get --server=https://localhost:6443/ \
          --client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt \
          --client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key \
          --certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt \
          --raw=/livez
    
     d. Perform the simulated liveness probe for the kube-apiserver again, replacing localhost with 127.0.0.1; this time it should succeed:

        /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock exec \
          $(/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-apiserver | awk '{print $1}') \
          kubectl get --server=https://127.0.0.1:6443/ \
          --client-certificate=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt \
          --client-key=/var/lib/rancher/rke2/server/tls/client-kube-apiserver.key \
          --certificate-authority=/var/lib/rancher/rke2/server/tls/server-ca.crt \
          --raw=/livez
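
     Independently of the probe, name resolution can also be checked directly on the node; getent consults /etc/hosts via NSS, so a missing entry produces no output (a quick sketch):

        getent hosts localhost   # prints nothing when /etc/hosts lacks a localhost entry
        cat /etc/hosts           # confirm the file is empty or missing the mapping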
    
  3. Fix the host or host template to ensure a valid /etc/hosts file is present, with an entry mapping localhost to 127.0.0.1, as expected. A minimal remediation sketch follows this list.
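
As a stopgap on an already-affected node, the localhost entry can be restored by hand; the durable fix belongs in whatever produced the empty file (host image, template, or cloud-init configuration):

    cat >> /etc/hosts <<'EOF'
    127.0.0.1   localhost
    ::1         localhost
    EOF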