How to enable and query the supervisor loadbalancer metrics in an RKE2 or K3s cluster

This document (000021744) is provided subject to the disclaimer at the end of this document.

Environment

A Rancher-provisioned or standalone RKE2 or K3s cluster, running RKE2 v1.29.12+rke2r1, v1.30.8+rke2r1, v1.31.4+rke2r1 or later, or K3s v1.29.12+k3s1, v1.30.8+k3s1, v1.31.4+k3s1 or later

Situation

The supervisor loadbalancer in RKE2 and K3s clusters load-balances kube-apiserver, etcd and supervisor traffic between cluster nodes. More information on the supervisor process and load-balancing can be found in the RKE2 and K3s documentation. This article provides instructions on how to query metrics for this loadbalancer process.

Resolution

The presence of the loadbalancer depends upon the node's roles, as below:

  • An all role, or control-plane and etcd role, server node will have no loadbalancers, since it has everything locally.
  • A control-plane role only server node will have a loadbalancer to connect to etcd nodes.
  • An etcd role only server node will have a loadbalancer to connect to control-plane nodes.
  • An agent (worker role only) node will have a loadbalancer to connect to control-plane nodes.

The command for querying the loadbalancer metrics depends upon the cluster and node type, as detailed in the following instructions.

RKE2

In an RKE2 cluster the supervisor's loadbalancer metrics are exposed on nodes via the supervisor port (9345).

To query the loadbalancer metrics in an RKE2 cluster:

  1. Enable the supervisor metrics in the RKE2 configuration:

     • For a standalone RKE2 cluster node: Edit the configuration file on the node under /etc/rancher/rke2/config.yaml, add supervisor-metrics: true and restart the rke2-server or rke2-agent service, depending upon the node type (`systemctl restart rke2-server` or `systemctl restart rke2-agent`)

    ## sample /etc/rancher/rke2/config.yaml
    [...]
    supervisor-metrics: true
    [...]
    


     • For a Rancher-provisioned RKE2 cluster: In the Cluster Management interface of the Rancher UI, click Edit YAML for the applicable cluster, add supervisor-metrics: true into the machineGlobalConfig block, and click Save

    [...]
        machineGlobalConfig:
          supervisor-metrics: true
    [...]
    
  2. Check the metrics are enabled: On an all role or control-plane role only node in the cluster (not via the Rancher-proxied Kubernetes API endpoint or Authorized Cluster Endpoint), query the metrics per the example below. Replace <node-ip> with the IP of the node you wish to query. N.B. The node from which you are running the command will need to be able to reach port 9345 on the node you are querying.

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
alias kubectl=/var/lib/rancher/rke2/bin/kubectl
kubectl get --server https://<node-ip>:9345 --raw /metrics | grep load
# HELP rke2_loadbalancer_dial_duration_seconds Time taken to dial a connection to a backend server
# TYPE rke2_loadbalancer_dial_duration_seconds histogram
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.001"} 13
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.002"} 22
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.004"} 51
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.008"} 79
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.016"} 123
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.032"} 154
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.064"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.128"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.256"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="0.512"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="1.024"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="2.048"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="4.096"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="8.192"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="16.384"} 163
rke2_loadbalancer_dial_duration_seconds_bucket{name="rke2-etcd-server-load-balancer",status="success",le="+Inf"} 163
rke2_loadbalancer_dial_duration_seconds_sum{name="rke2-etcd-server-load-balancer",status="success"} 1.7390672159999991
rke2_loadbalancer_dial_duration_seconds_count{name="rke2-etcd-server-load-balancer",status="success"} 163
# HELP rke2_loadbalancer_server_connections Count of current connections to loadbalancer server
# TYPE rke2_loadbalancer_server_connections gauge
rke2_loadbalancer_server_connections{name="rke2-etcd-server-load-balancer",server="24.199.104.66:2379"} 0
rke2_loadbalancer_server_connections{name="rke2-etcd-server-load-balancer",server="24.199.96.251:2379"} 0
rke2_loadbalancer_server_connections{name="rke2-etcd-server-load-balancer",server="64.23.213.163:2379"} 153
# HELP rke2_loadbalancer_server_health Current health value of loadbalancer server
# TYPE rke2_loadbalancer_server_health gauge
rke2_loadbalancer_server_health{name="rke2-etcd-server-load-balancer",server="24.199.104.66:2379"} 5
rke2_loadbalancer_server_health{name="rke2-etcd-server-load-balancer",server="24.199.96.251:2379"} 5
rke2_loadbalancer_server_health{name="rke2-etcd-server-load-balancer",server="64.23.213.163:2379"} 7
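The gauges above lend themselves to quick shell post-processing. As an illustrative sketch (the metrics.txt filename is an assumption, not part of the product), the following saves a scrape and reduces the connection gauge to one line per backend, busiest first:

```shell
# Save a scrape of the supervisor metrics (run from a node that can
# reach port 9345 on the target node, as described above):
#   kubectl get --server https://<node-ip>:9345 --raw /metrics > metrics.txt

# Reduce the connection gauge to "<backend> <connections>" pairs,
# sorted so the busiest backend comes first.
grep '^rke2_loadbalancer_server_connections' metrics.txt \
  | sed -E 's/.*server="([^"]+)"\} ([0-9]+)$/\1 \2/' \
  | sort -k2,2nr
```

Against the sample output above, this lists 64.23.213.163:2379 first, showing at a glance which backend the loadbalancer currently routes through.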

K3s

In a K3s cluster the loadbalancer metrics are exposed on agent (worker role only) nodes via the kubelet metrics port (10250) and on server nodes via the kube-apiserver port (6443).

To query the loadbalancer metrics in a K3s cluster:

  1. Enable the supervisor metrics in the K3s configuration:

     • For a standalone K3s cluster node: Edit the configuration file on the node under /etc/rancher/k3s/config.yaml, add supervisor-metrics: true and restart the K3s service (`systemctl restart k3s` on a server node, or `systemctl restart k3s-agent` on an agent node)

    ## sample /etc/rancher/k3s/config.yaml
    [...]
    supervisor-metrics: true
    [...]
    
     • For a Rancher-provisioned K3s cluster: In the Cluster Management interface of the Rancher UI, click Edit YAML for the applicable cluster, add supervisor-metrics: true into the machineGlobalConfig block, and click Save

    [...]
        machineGlobalConfig:
          supervisor-metrics: true
    [...]
    
  2. Check the metrics are enabled: On a server node in the cluster (not via the Rancher-proxied Kubernetes API endpoint or Authorized Cluster Endpoint), query the metrics per the example below. Replace <node-ip> with the IP of the node you wish to query, and <port> with 10250 if querying an agent node or 6443 if querying a server node. N.B. The node from which you are running the command will need to be able to reach that port on the node you are querying.

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl get --server https://<node-ip>:<port> --raw /metrics | grep -i k3s_load
# HELP k3s_loadbalancer_dial_duration_seconds Time taken to dial a connection to a backend server
# TYPE k3s_loadbalancer_dial_duration_seconds histogram
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.001"} 218
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.002"} 239
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.004"} 253
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.008"} 264
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.016"} 278
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.032"} 290
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.064"} 293
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.128"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.256"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="0.512"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="1.024"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="2.048"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="4.096"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="8.192"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="16.384"} 294
k3s_loadbalancer_dial_duration_seconds_bucket{name="k3s-etcd-server-load-balancer",status="success",le="+Inf"} 294
k3s_loadbalancer_dial_duration_seconds_sum{name="k3s-etcd-server-load-balancer",status="success"} 0.8818124119999996
k3s_loadbalancer_dial_duration_seconds_count{name="k3s-etcd-server-load-balancer",status="success"} 294
# HELP k3s_loadbalancer_server_connections Count of current connections to loadbalancer server
# TYPE k3s_loadbalancer_server_connections gauge
k3s_loadbalancer_server_connections{name="k3s-etcd-server-load-balancer",server="164.92.125.58:2379"} 102
# HELP k3s_loadbalancer_server_health Current health value of loadbalancer server
# TYPE k3s_loadbalancer_server_health gauge
k3s_loadbalancer_server_health{name="k3s-etcd-server-load-balancer",server="164.92.125.58:2379"} 7
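As with RKE2, the health gauge can be post-processed in the shell to compare backends quickly. A sketch under the same assumptions (metrics saved to an illustrative metrics.txt; the meaning of the numeric health values is not documented in this article, but a backend whose value differs from its peers stands out immediately):

```shell
# Save a scrape first, e.g.:
#   kubectl get --server https://<node-ip>:<port> --raw /metrics > metrics.txt

# Reduce the health gauge to "<health> <backend>" pairs, sorted by the
# numeric health value, so any outlier backend appears first or last.
grep '^k3s_loadbalancer_server_health' metrics.txt \
  | sed -E 's/.*server="([^"]+)"\} ([0-9]+)$/\2 \1/' \
  | sort -n
```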

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.