Cattle-cluster-agent flapping between versions
This document (000021769) is provided subject to the disclaimer at the end of this document.
Environment
Any Rancher-managed downstream cluster.
Situation
The cattle-cluster-agent flaps between versions: its pods restart frequently and alternate between the new and the old agent image. The cattle-cluster-agent deployment also shows a very high revision count (e.g. deployment.kubernetes.io/revision: 31736), and more than one ReplicaSet has active pods:
kubectl get -n cattle-system replicasets -o 'custom-columns=NAME:.metadata.name,READY_REPLICAS:.status.readyReplicas,TOTAL_REPLICAS:.status.replicas,IMAGE:.spec.template.spec.containers[*].image'
NAME                              READY_REPLICAS   TOTAL_REPLICAS   IMAGE
cattle-cluster-agent-59854fg9cf   1                1                registry.rancher.com/rancher/rancher-agent:v2.10.1
cattle-cluster-agent-673596d4q5   1                2                registry.rancher.com/rancher/rancher-agent:v2.10.4
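To confirm that the revision count is still climbing, the annotation can be read directly from the deployment. The following is a minimal sketch: the helper name agent_revision is hypothetical, and it assumes kubectl is pointed at the affected downstream cluster.

```shell
# Hypothetical helper: print the current revision of the cattle-cluster-agent
# deployment (the deployment.kubernetes.io/revision annotation).
agent_revision() {
  kubectl -n cattle-system get deployment cattle-cluster-agent \
    -o jsonpath='{.metadata.annotations.deployment\.kubernetes\.io/revision}'
}
```

Calling agent_revision twice, a minute or so apart, should print a larger number the second time while the agent is still flapping.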
Resolution
Enable Audit Logs: Enable Kubernetes API audit logging on the affected downstream cluster [1]. The audit log records every request that updates the cattle-cluster-agent deployment, which lets you identify what is making the changes. Search the log with the following regular expression, replacing <UNEXPECTED_RANCHER_VERSION> with the agent version you did not expect to see (v2.10.1 in this example):
deployments/cattle-cluster-agent.*patch.*<UNEXPECTED_RANCHER_VERSION>
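The regular expression above can be applied with grep. This is a sketch only: the helper name find_agent_patches is hypothetical, and the audit log path is a placeholder because its location depends on how audit logging was configured for your distribution.

```shell
# Hypothetical helper: find_agent_patches VERSION AUDIT_LOG
# Prints audit entries that patch the cattle-cluster-agent deployment and
# mention VERSION. Dots in VERSION are escaped so they match literally.
find_agent_patches() {
  version_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -E "deployments/cattle-cluster-agent.*patch.*${version_re}" "$2"
}
```

For example, find_agent_patches v2.10.1 /path/to/audit.log prints only the entries in which the deployment was patched to the unexpected version.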
Analyze the Logs: Matching entries look similar to the following:
"{""kind"":""Event"",""apiVersion"":""audit.k8s.io/v1"",""level"":""RequestResponse"",""auditID"":""2270172f-4d8e-4a71-b6d0-d8847d1159a5"",""stage"":""ResponseComplete"",""requestURI"":""/apis/apps/v1/namespaces/cattle-system/deployments/cattle-cluster-agent?fieldManager=kubectl-client-side-apply\u0026fieldValidation=Strict"",""verb"":""patch"",""user"":{""username"":""system:serviceaccount:cattle-impersonation-system:cattle-impersonation-u-ezg219c4qv"",""uid"":""1f59bb08-6b81-488a-al31-b3439235e3c6"",""groups"":[""system:serviceaccounts"",""system:serviceaccounts:cattle-impersonation-system"",""system:authenticated""]},""impersonatedUser"":{""username"":""u-ezg22196cxl"",""groups"":[""system:authenticated"",""system:cattle:authenticated""],""extra"":{""principalid"":[""system://c-r7x6s"",""local://u-ezg22196cxl""],""username"":[""System account for Cluster c-r7x6s""]}},""sourceIPs"":[""127.0.0.1"",""10.0.41.7""],""userAgent"":""kubectl/v1.28.6+k3s2 (linux/amd64) kubernetes/c9f49a3"",""objectRef"":{""resource"":""deployments"",""namespace"":""cattle-system"",""name"":""cattle-cluster-agent"",""apiGroup"":""apps"",""apiVersion"":""v1""},""responseStatus"":{""metadata"":{},""code"":200},""requestObject"":{""metadata"":{""annotations"":{""kubectl.kubernetes.io/last-applied-configuration"":""{\""apiVersion\"":\""apps/v1\"",\""kind\"":\""Deployment\"",\""metadata\"":{\""annotations\"":{\""management.cattle.io/scale-available\"":\""2\""},\""name\"":\""cattle-cluster-agent\"",\""namespace\"":\""cattle-system\""},\""spec\"":{\""selector\"":{\""matchLabels\"":{\""app\"":\""cattle-cluster-agent\""}},\""strategy\"":{\""rollingUpdate\"":{\""maxSurge\"":1,\""maxUnavailable\"":0},\""type\"":\""RollingUpdate\""},\""template\"":{\""metadata\"":{\""labels\"":{\""app\"":\""cattle-cluster-agent\""}},\""spec\"":{\""affinity\"":{\""nodeAffinity\"":{\""preferredDuringSchedulingIgnoredDuringExecution\"":[{\""preference\"":{\""matchExpressions\"":[{\""key\"":\""node-role.kubernetes.io/controlplane\"",\""operator\"":\""In\"",\""values\"":[\""true\""]}]},\""weight\"":100},{\""preference\"":{\""matchExpressions\"":[{\""key\"":\""node-role.kubernetes.io/control-plane\"",\""operator\"":\""In\"",\""values\"":[\""true\""]}]},\""weight\"":100},{\""preference\"":{\""matchExpressions\"":[{\""key\"":\""node-role.kubernetes.io/master\"",\""operator\"":\""In\"",\""values\"":[\""true\""]}]},\""weight\"":100},{\""preference\"":{\""matchExpressions\"":[{\""key\"":\""cattle.io/cluster-agent\"",\""operator\"":\""In\"",\""values\"":[\""true\""]}]},\""weight\"":1}],\""requiredDuringSchedulingIgnoredDuringExecution\"":{\""nodeSelectorTerms\"":[{\""matchExpressions\"":[{\""key\"":\""beta.kubernetes.io/os\"",\""operator\"":\""NotIn\"",\""values\"":[\""windows\""]}]}]}},\""podAntiAffinity\"":{\""preferredDuringSchedulingIgnoredDuringExecution\"":[{\""podAffinityTerm\"":{\""labelSelector\"":{\""matchExpressions\"":[{\""key\"":\""app\"",\""operator\"":\""In\"",\""values\"":[\""cattle-cluster-agent\""]}]},\""topologyKey\"":\""kubernetes.io/hostname\""},\""weight\"":100}]}},\""containers\"":[{\""env\"":[{\""name\"":\""CATTLE_FEATURES\"",\""value\"":\""embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false\""},{\""name\"":\""CATTLE_IS_RKE\"",\""value\"":\""false\""},{\""name\"":\""CATTLE_SERVER\"",\""value\"":\""https://k8s.weg.net\""},{\""name\"":\""CATTLE_CA_CHECKSUM\"",\""value\"":\""REDACTED\""},{\""name\"":\""CATTLE_CLUSTER\"",\""value\"":\""true\""},{\""name\"":\""CATTLE_K8S_MANAGED\"",\""value\"":\""true\""},{\""name\"":\""CATTLE_CLUSTER_REGISTRY\"",\""value\"":\""registry.rancher.com\""},{\""name\"":\""CATTLE_SERVER_VERSION\"",\""value\"":\""v2.10.1\""},{\""name\"":\""CATTLE_INSTALL_UUID\"",\""value\"":\""REDACTED\""},{\""name\"":\""CATTLE_INGRESS_IP_DOMAIN\"",\""value\"":\""sslip.io\""}],\""image\"":\""registry.rancher.com/rancher/rancher-agent:v2.10.1\"",\""imagePullPolicy\"":\""IfNotPresent\"",\""name\"":\""cluster-register\"",\""volumeMounts\"":[{\""mountPath\"":\""/cattle-credentials\"",\""name\"":\""cattle-credentials\"",\""readOnly\"":true}]}],\""serviceAccountName\"":\""cattle\"",\""tolerations\"":[{\""effect\"":\""NoSchedule\"",\""key\"":\""node-role.kubernetes.io/controlplane\"",\""value\"":\""true\""},{\""effect\"":\""NoSchedule\"",\""key\"":\""node-role.kubernetes.io/control-plane\"",\""operator\"":\""Exists\""},{\""effect\"":\""NoSchedule\"",\""key\"":\""node-role.kubernetes.io/master\"",\""operator\"":\""Exists\""}],\""volumes\"":[{\""name\"":\""cattle-credentials\"",\""secret\"":{\""defaultMode\"":320,\""secretName\"":\""cattle-credentials-998877ccbbaa\""}}]}}}}
Identify the Source: In the example above, a patch request was made against the cattle-cluster-agent deployment, setting the agent image version to v2.10.1. The sourceIPs field shows that the call originated from 10.0.41.7. Trace this IP to identify the host that made the call. If that host is not a node of the local cluster where you upgraded Rancher to v2.10.4, then this downstream cluster is also being managed by a second Rancher instance. Remove the association with that other instance so that the downstream cluster is managed by only one Rancher instance.
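The lookup described above can be sketched in two small helpers. Both names are hypothetical. The first assumes the matching audit entries are stored as plain JSON, one event per line (not with the doubled quotes shown in the exported sample). The second assumes kubectl is pointed at the local Rancher cluster, and it will find nothing if the source IP belongs to a load balancer or NAT gateway rather than to a node.

```shell
# Hypothetical helper: summarize_patch_sources FILE
# FILE holds the matching audit entries, one plain JSON event per line.
# Prints each distinct sourceIPs / userAgent value with an occurrence count.
summarize_patch_sources() {
  grep -oE -e '"sourceIPs":\[[^]]*\]' -e '"userAgent":"[^"]*"' "$1" |
    sort | uniq -c | sort -rn
}

# Hypothetical helper: node_for_ip IP
# Lists nodes of the cluster kubectl currently points at whose
# "kubectl get nodes -o wide" row contains IP as a whole word.
node_for_ip() {
  kubectl get nodes -o wide | grep -Fw -- "$1"
}
```

If node_for_ip 10.0.41.7 prints nothing when run against the local Rancher cluster, the request most likely came from a host outside that cluster.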
Cause
If a downstream cluster is managed by two Rancher instances that run different versions of Rancher, each instance repeatedly redeploys the cattle-cluster-agent at the version it expects, so the agent constantly flips between the two versions.
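One way to observe this on the downstream cluster is to read the agent image together with the CATTLE_SERVER value from the live deployment. This is a sketch with a hypothetical helper name. It is an assumption that the two Rancher instances use different server URLs; if they do, CATTLE_SERVER may alternate along with the image tag.

```shell
# Hypothetical helper: print the agent image and the CATTLE_SERVER value
# currently set on the cattle-cluster-agent deployment, separated by a space.
agent_target() {
  kubectl -n cattle-system get deployment cattle-cluster-agent \
    -o jsonpath='{.spec.template.spec.containers[0].image}{" "}{.spec.template.spec.containers[0].env[?(@.name=="CATTLE_SERVER")].value}{"\n"}'
}
```

Running agent_target a few times while the agent is flapping shows which image, and which Rancher server URL, was applied most recently.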
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.