Tuning for nodes with a high number of CPUs allocated
This document (000020731) is provided subject to the disclaimer at the end of this document.
Environment
An RKE2 or k3s cluster provisioned by Rancher or as a stand-alone cluster
Situation
Some components in a Kubernetes cluster apply a linear scaling mechanism, often based on the number of CPU cores allocated.
For nodes that have a high number of CPU cores allocated, the default scaling curve can be too steep and can introduce issues.
Two common components are known to scale in this way: kube-proxy and ingress-nginx. However, additional workloads (such as nginx) may be deployed to the cluster and also need consideration.
The scaling for these components can be adjusted proactively, or in response to indications of an issue.
Resolution
kube-proxy
As explained in the Kubernetes GitHub issue here, the default scaling of the conntrack-max setting allocates 32,768 conntrack table entries per CPU core (the kube-proxy conntrack-max-per-core default).
On a node with a high number of CPU cores, this can produce events like the following in the OS logs:
kernel: nf_conntrack: falling back to vmalloc.
This static default can cause issues with allocating contiguous memory for the conntrack table, or result in an unnecessarily large table. When observed frequently, the message above has been associated with network instability.
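Before making a change, the current limit and usage on an affected node can be checked directly (a minimal check using standard kernel sysctls):

# Maximum conntrack table size currently in effect
sysctl net.netfilter.nf_conntrack_max
# Number of connections currently tracked
sysctl net.netfilter.nf_conntrack_count
# Look for the fallback message in the kernel log
dmesg | grep -i nf_conntrack

For example, at the 32768-entries-per-core default, a 64-core node receives a table of 64 * 32768 = 2,097,152 entries, usually far more than its actual connection load requires.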
As a starting point, the suggestion is to halve this amount (to 16384) for a cluster with affected nodes:
RKE2
Add the kube-proxy-arg parameter to the cluster configuration.
Provisioned by Rancher
In Cluster Management, click Edit YAML for the cluster, locate the machineGlobalConfig section, and add the parameter, using the below as an example.
rkeConfig:
  machineGlobalConfig:
    [ ... ]
    kube-proxy-arg:
      - conntrack-max-per-core=16384
Note: The change will be applied to all nodes in the cluster. To apply it only to certain nodes, add the parameter within a machineSelectorConfig instead; more details are available in this documentation link.
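Once the change has rolled out, it can be verified on an RKE2 node (a sketch, assuming the default RKE2 static pod manifest path):

# The kube-proxy static pod manifest should now carry the flag
grep conntrack-max-per-core /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
# The kernel limit should now reflect 16384 * cores (subject to kube-proxy's conntrack-min floor)
sysctl net.netfilter.nf_conntrack_max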
Standalone cluster
Edit the /etc/rancher/rke2/config.yaml file on each cluster node where the change is desired. Add the parameter and restart the appropriate rke2 service.
kube-proxy-arg:
  - conntrack-max-per-core=16384
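For example, after editing the file (the service name depends on the node role):

# On server nodes
systemctl restart rke2-server
# On agent nodes
systemctl restart rke2-agent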
k3s
Add the kube-proxy-arg parameter to the cluster configuration.
Provisioned by Rancher
In Cluster Management, click Edit YAML for the cluster, locate the machineGlobalConfig section, and add the parameter, using the below as an example.
rkeConfig:
  machineGlobalConfig:
    [ ... ]
    kube-proxy-arg:
      - conntrack-max-per-core=16384
Note: The change will be applied to all nodes in the cluster. To apply it only to certain nodes, add the parameter within a machineSelectorConfig instead; more details are available in this documentation link.
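Because k3s runs kube-proxy embedded in the k3s process rather than as a static pod, the simplest verification is to check the resulting kernel setting on the node (a sketch):

# Should now reflect 16384 * cores (subject to kube-proxy's conntrack-min floor)
sysctl net.netfilter.nf_conntrack_max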
Standalone cluster
Edit the /etc/rancher/k3s/config.yaml file on each cluster node where the change is desired. Add the parameter and restart the appropriate k3s service.
kube-proxy-arg:
  - conntrack-max-per-core=16384
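For example, after editing the file (the service name depends on the node role):

# On server nodes
systemctl restart k3s
# On agent nodes
systemctl restart k3s-agent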
RKE1
This can be done by editing the cluster as YAML, or the cluster.yml file when using the RKE CLI.
services:
  kubeproxy:
    extra_args:
      conntrack-max-per-core: '16384'
Note: When using the RKE CLI, running rke up will put the changes into effect.
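For example (assuming the edited cluster.yml is in the current directory):

rke up --config cluster.yml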
ingress-nginx
Modern versions of ingress-nginx determine the number of worker_processes automatically, based on the number of logical CPUs allocated to the node or on the controller's CPU resource limit. By default, CPU resource limits are not set and should generally be used only when necessary.
On a node with a high number of CPUs allocated, this can result in an undesirably high number of PIDs and open files, since the threads consumed amount to roughly the number of cores multiplied by the default thread_pool size of 32.
- http://nginx.org/en/docs/ngx_core_module.html#worker_processes
- http://nginx.org/en/docs/ngx_core_module.html#thread_pool
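As a worked example, a node with 64 logical CPUs defaults to 64 worker processes; at the default thread_pool size of 32, that is 64 * 32 = 2,048 threads from the ingress controller alone, along with the file descriptors those workers hold open.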
RKE2
Provisioned by Rancher
Add a HelmChartConfig to adjust the ingress-nginx Helm chart values: in Cluster Management, click Edit Config > Additional Manifest and add the HelmChartConfig. An example using 8 worker_processes is shown below.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        worker-processes: "8"
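After the chart change is applied, the rendered value can be checked in a running controller pod (a sketch, assuming the default rke2-ingress-nginx-controller DaemonSet name):

kubectl -n kube-system exec ds/rke2-ingress-nginx-controller -- grep worker_processes /etc/nginx/nginx.conf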
Standalone cluster
Add the HelmChartConfig as a YAML file to the /var/lib/rancher/rke2/server/manifests directory on all rke2-server nodes.
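For example, the manifest could be written as follows (a sketch; the file name is arbitrary, and the contents match the HelmChartConfig above):

cat <<'EOF' > /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        worker-processes: "8"
EOF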
RKE1
This can be adjusted by editing the cluster as YAML, or the cluster.yml file when using the RKE CLI. An example using 8 worker_processes is shown below. For nodes that may process a high volume of ingress traffic, you may wish to use a higher number.
ingress:
  provider: nginx
  options:
    worker-processes: "8"
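After rke up completes, the setting can be confirmed in a controller pod (a sketch, assuming RKE1's default ingress-nginx namespace and nginx-ingress-controller DaemonSet name):

kubectl -n ingress-nginx exec ds/nginx-ingress-controller -- grep worker_processes /etc/nginx/nginx.conf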
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.