Tuning for nodes with a high number of CPUs allocated
This document (000020731) is provided subject to the disclaimer at the end of this document.
Environment
An RKE2 or k3s cluster provisioned by Rancher or as a stand-alone cluster
Situation
Some components in a Kubernetes cluster apply a linear scaling mechanism, often based on the number of CPU cores allocated.
For nodes that have a high number of CPU cores allocated, the default scaling curve can be too steep and can introduce issues.
Two common components are known to scale in this way: kube-proxy and ingress-nginx. However, additional workloads (such as nginx) may be deployed to the cluster and also need consideration.
The scaling for these components can be adjusted proactively, or in response to indications of an issue.
Resolution
kube-proxy
As explained in the Kubernetes GitHub issue here, the default scaling of the conntrack-max setting allocates 32,768 conntrack table entries per CPU core (the kube-proxy conntrack-max-per-core default).
On a node with a high number of CPU cores, this can produce events like the following in the OS logs:
kernel: nf_conntrack: falling back to vmalloc.
This static default can cause issues with allocating contiguous memory for the conntrack table, or result in an unnecessarily large table. When observed frequently, the message above has been associated with network instability.
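Before making a change, the current limit and usage on an affected node can be checked directly (a minimal check using standard kernel sysctls):

# Maximum conntrack table size currently in effect
sysctl net.netfilter.nf_conntrack_max
# Number of connections currently tracked
sysctl net.netfilter.nf_conntrack_count
# Look for the fallback message in the kernel log
dmesg | grep -i nf_conntrack

For example, at the 32768-entries-per-core default, a 64-core node receives a table of 64 * 32768 = 2,097,152 entries, usually far more than its actual connection load requires.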
As a starting point, the suggestion is to halve this amount (to 16384) for a cluster with affected nodes:
RKE2
Add the kube-proxy-arg parameter to the cluster configuration.
Provisioned by Rancher
In Cluster Management, click Edit YAML for the cluster, locate the machineGlobalConfig section, and add the parameter, using the below as an example.
rkeConfig:
  machineGlobalConfig:
    [ ... ]
    kube-proxy-arg:
      - conntrack-max-per-core=16384
Note: The change will be applied to all nodes in the cluster. To apply it only to certain nodes, add the parameter within a machineSelectorConfig instead; more details are available in this documentation link.
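Once the change has rolled out, it can be verified on an RKE2 node (a sketch, assuming the default RKE2 static pod manifest path):

# The kube-proxy static pod manifest should now carry the flag
grep conntrack-max-per-core /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
# The kernel limit should now reflect 16384 * cores (subject to kube-proxy's conntrack-min floor)
sysctl net.netfilter.nf_conntrack_max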
Standalone cluster
Edit the /etc/rancher/rke2/config.yaml file on each cluster node where the change is desired. Add the parameter and restart the appropriate rke2 service.
kube-proxy-arg:
  - conntrack-max-per-core=16384
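For example, after editing the file (the service name depends on the node role):

# On server nodes
systemctl restart rke2-server
# On agent nodes
systemctl restart rke2-agent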
k3s
Add the kube-proxy-arg parameter to the cluster configuration.
Provisioned by Rancher
In Cluster Management, click Edit YAML for the cluster, locate the machineGlobalConfig section, and add the parameter, using the below as an example.
rkeConfig:
  machineGlobalConfig:
    [ ... ]
    kube-proxy-arg:
      - conntrack-max-per-core=16384
Note: The change will be applied to all nodes in the cluster. To apply it only to certain nodes, add the parameter within a machineSelectorConfig instead; more details are available in this documentation link.
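Because k3s runs kube-proxy embedded in the k3s process rather than as a static pod, the simplest verification is to check the resulting kernel setting on the node (a sketch):

# Should now reflect 16384 * cores (subject to kube-proxy's conntrack-min floor)
sysctl net.netfilter.nf_conntrack_max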
Standalone cluster
Edit the /etc/rancher/k3s/config.yaml file on each cluster node where the change is desired. Add the parameter and restart the appropriate k3s service.
kube-proxy-arg:
  - conntrack-max-per-core=16384
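For example, after editing the file (the service name depends on the node role):

# On server nodes
systemctl restart k3s
# On agent nodes
systemctl restart k3s-agent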
RKE1
This can be done by editing the cluster as YAML, or the cluster.yml file when using the RKE CLI.
services:
  kubeproxy:
    extra_args:
      conntrack-max-per-core: '16384'
Note: When using the RKE CLI, running rke up will put the changes into effect.
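For example (assuming the edited cluster.yml is in the current directory):

rke up --config cluster.yml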
ingress-nginx
Modern versions of ingress-nginx determine the number of worker_processes automatically, based on the number of logical CPUs allocated to the node or on the controller's CPU resource limit. By default, CPU resource limits are not set and should generally be used only when necessary.
On a node with a high number of CPUs allocated, this can result in an undesirably high number of PIDs and open files, since the threads consumed amount to roughly the number of cores multiplied by the default thread_pool size of 32.
- http://nginx.org/en/docs/ngx_core_module.html#worker_processes
- http://nginx.org/en/docs/ngx_core_module.html#thread_pool
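As a worked example, a node with 64 logical CPUs defaults to 64 worker processes; at the default thread_pool size of 32, that is 64 * 32 = 2,048 threads from the ingress controller alone, along with the file descriptors those workers hold open.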
RKE2
Provisioned by Rancher
Add a HelmChartConfig to adjust the ingress-nginx Helm chart values: in Cluster Management, click Edit Config > Additional Manifest and add the HelmChartConfig. An example using 8 worker_processes is shown below.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        worker-processes: "8"
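After the chart change is applied, the rendered value can be checked in a running controller pod (a sketch, assuming the default rke2-ingress-nginx-controller DaemonSet name):

kubectl -n kube-system exec ds/rke2-ingress-nginx-controller -- grep worker_processes /etc/nginx/nginx.conf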
Standalone cluster
Add the HelmChartConfig as a YAML file to the /var/lib/rancher/rke2/server/manifests directory on all rke2-server nodes.
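For example, the manifest could be written as follows (a sketch; the file name is arbitrary, and the contents match the HelmChartConfig above):

cat <<'EOF' > /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        worker-processes: "8"
EOF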
RKE1
This can be adjusted by editing the cluster as YAML, or the cluster.yml file when using the RKE CLI. An example using 8 worker_processes is shown below. For nodes that may process a high volume of ingress traffic, you may wish to use a higher number.
ingress:
  provider: nginx
  options:
    worker-processes: "8"
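After rke up completes, the setting can be confirmed in a controller pod (a sketch, assuming RKE1's default ingress-nginx namespace and nginx-ingress-controller DaemonSet name):

kubectl -n ingress-nginx exec ds/nginx-ingress-controller -- grep worker_processes /etc/nginx/nginx.conf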
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.