
system-reserved and kube-reserved resource reservations

This document (000021853) is provided subject to the disclaimer at the end of this document.

Environment

  • SUSE Rancher 2.x
  • Downstream or Standalone/Custom RKE1/RKE2/K3s cluster

Situation

  • By default, Kubernetes nodes can be scheduled up to their full capacity, so pods can consume all of the resources available on a node. This is a problem because nodes typically run several system daemons that power the OS and Kubernetes itself. Unless resources are set aside for these daemons, pods and system daemons compete for resources, which can lead to resource starvation on the node.
  • The scheduler treats 'Allocatable' as the capacity available for pods.
  • The kubelet exposes a feature named 'Node Allocatable' that helps reserve compute resources for system daemons. Kubernetes recommends that cluster administrators configure 'Node Allocatable' based on the workload density on each node.
  • 'Allocatable' on a Kubernetes node is defined as the amount of compute resources available for pods: Allocatable = Node capacity - kube-reserved - system-reserved - hard eviction threshold. The hard eviction threshold (for example, eviction-hard=memory.available<500Mi) is subtracted for memory and ephemeral-storage. The scheduler does not over-subscribe 'Allocatable'. 'cpu', 'memory', and 'ephemeral-storage' are the supported resources.
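The Allocatable formula is easiest to see with numbers. The following is a minimal Python sketch, assuming a hypothetical node with 8 CPUs, 16Gi of memory, and 100Gi of disk, combined with the RKE2 sample reservations used later in this article. The helper names `parse_quantity` and `allocatable` are illustrative only; the quantity parser handles common suffixes (Ki/Mi/Gi/Ti, k/M/G/T, m) but not every Kubernetes quantity form (for example, exponent notation).

```python
from decimal import Decimal

# Suffixes of Kubernetes resource quantities: binary (Ki..Ti), decimal (k..T), and milli (m).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '1548Mi', '500m' or '8' into a plain number."""
    # Check two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def allocatable(capacity: str, kube_reserved: str = "0",
                system_reserved: str = "0", eviction_hard: str = "0") -> Decimal:
    """Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold."""
    cap = parse_quantity(capacity)
    if eviction_hard.endswith("%"):
        # Percentage thresholds (e.g. nodefs.available<4%) are taken relative to capacity.
        threshold = cap * Decimal(eviction_hard[:-1]) / 100
    else:
        threshold = parse_quantity(eviction_hard)
    return cap - parse_quantity(kube_reserved) - parse_quantity(system_reserved) - threshold


# Hypothetical node: 8 CPUs, 16Gi memory, 100Gi disk, with the RKE2 sample reservations.
print(allocatable("8", "1", "1"))                              # CPUs -> 6
print(allocatable("16Gi", "2Gi", "1548Mi", "500Mi") / 2**30)   # GiB of memory -> 12
print(allocatable("100Gi", "5Gi", "30Gi", "4%") / 2**30)       # GiB of ephemeral-storage -> 61
```

In this example, roughly a quarter of the node's memory is held back from pods, which is why the reservation values should be sized against the node's actual capacity.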

Kube Reserved

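  • KubeletConfiguration setting: kubeReserved: {}. Example value: {cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: "1000"}. The equivalent kubelet flag is kube-reserved, which takes the same values as comma-separated pairs (for example, kube-reserved=cpu=100m,memory=100Mi,ephemeral-storage=1Gi,pid=1000).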
  • kubeReserved is meant to capture resource reservation for Kubernetes system daemons like the kubelet, container runtime, etc.
  • In addition to cpu, memory, and ephemeral-storage, pid may be specified to reserve the specified number of process IDs for Kubernetes system daemons.
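The Resolution section below uses the flag form (cpu=1,memory=2Gi,...), while a KubeletConfiguration file uses a map. The following is a minimal Python sketch of converting between the two forms; the helper names are hypothetical, and the list of accepted resource names is an assumption limited to the four resources named in this article.

```python
# Resource names accepted by this sketch for kube-reserved / system-reserved.
SUPPORTED = {"cpu", "memory", "ephemeral-storage", "pid"}


def flag_to_map(flag_value: str) -> dict[str, str]:
    """Parse the flag form 'cpu=1,memory=2Gi' into a {name: quantity} map."""
    reserved: dict[str, str] = {}
    for pair in flag_value.split(","):
        name, sep, quantity = pair.strip().partition("=")
        if not sep or not quantity:
            raise ValueError(f"malformed entry {pair!r}, expected name=quantity")
        if name not in SUPPORTED:
            raise ValueError(f"unsupported resource {name!r}")
        reserved[name] = quantity
    return reserved


def map_to_flag(reserved: dict[str, str]) -> str:
    """Inverse of flag_to_map: rebuild the comma-separated flag value."""
    return ",".join(f"{name}={qty}" for name, qty in reserved.items())


def to_config_yaml(key: str, reserved: dict[str, str]) -> str:
    """Render the map as a KubeletConfiguration field; values are quoted as strings."""
    lines = [f"{key}:"]
    lines += [f'  {name}: "{qty}"' for name, qty in reserved.items()]
    return "\n".join(lines)


kube_reserved = flag_to_map("cpu=100m,memory=100Mi,ephemeral-storage=1Gi,pid=1000")
print(to_config_yaml("kubeReserved", kube_reserved))
# kubeReserved:
#   cpu: "100m"
#   memory: "100Mi"
#   ephemeral-storage: "1Gi"
#   pid: "1000"
```

Quoting every value keeps the YAML unambiguous, since kubeReserved and systemReserved are string-to-string maps.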

System Reserved

  • KubeletConfiguration setting: systemReserved: {}. Example value: {cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: "1000"}. The equivalent kubelet flag is system-reserved, which takes the same values as comma-separated pairs (for example, system-reserved=cpu=100m,memory=100Mi,ephemeral-storage=1Gi,pid=1000).
  • systemReserved is meant to capture resource reservation for OS system daemons like sshd, udev, etc. systemReserved should reserve memory for the kernel too, since kernel memory is not accounted to pods in Kubernetes at this time. Reserving resources for user login sessions (user.slice in the systemd world) is also recommended.
  • In addition to cpu, memory, and ephemeral-storage, pid may be specified to reserve the specified number of process IDs for OS system daemons.
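  • Be careful when enforcing the system-reserved reservation, since it can lead to critical system services being CPU starved, OOM-killed, or unable to fork on the node. The recommendation is to enforce system-reserved only if a user has profiled their nodes exhaustively to come up with precise estimates and is confident in their ability to recover if any process in that group is OOM-killed.
  • Reserving and enforcing are separate steps. By default the kubelet enforces only the pods limit (enforce-node-allocatable=pods), so kube-reserved and system-reserved simply lower 'Allocatable'. To enforce them, add kube-reserved and/or system-reserved to enforce-node-allocatable and point kube-reserved-cgroup and system-reserved-cgroup at existing cgroups; the kubelet does not create these cgroups and fails if an invalid cgroup is specified.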
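Whether a reservation is only subtracted from Allocatable or also enforced depends on enforce-node-allocatable and the matching cgroup flags. The following is a minimal Python sketch of a pre-flight check over a set of kubelet arguments; `args_from_list` and `check_enforcement` are hypothetical helpers, not part of the kubelet or Rancher, and the "pods" default reflects the kubelet's default enforcement setting.

```python
def args_from_list(kubelet_args: list[str]) -> dict[str, str]:
    """Turn 'kubelet-arg' entries such as 'kube-reserved=cpu=1,...' into {flag: value}."""
    parsed: dict[str, str] = {}
    for arg in kubelet_args:
        # Split on the first '=' only; the value itself contains '=' signs.
        flag, _, value = arg.partition("=")
        parsed[flag.lstrip("-")] = value
    return parsed


def check_enforcement(args: dict[str, str]) -> list[str]:
    """Report missing cgroups for enforced reservations, and reservations that are not enforced."""
    messages: list[str] = []
    # The kubelet enforces only the 'pods' limit unless told otherwise.
    enforce = args.get("enforce-node-allocatable", "pods")
    targets = {t.strip() for t in enforce.split(",") if t.strip()}
    for name in ("kube-reserved", "system-reserved"):
        if name in targets:
            if not args.get(f"{name}-cgroup"):
                messages.append(f"error: {name} is enforced but {name}-cgroup is not set")
            if not args.get(name):
                messages.append(f"warning: {name} is enforced but no {name} amounts are set")
        elif args.get(name):
            messages.append(f"info: {name} lowers Allocatable but is not enforced on its daemons")
    return messages


# The RKE1 cluster.yml sample from this article: reservations are enforced with cgroups set.
rke1_extra_args = {
    "enforce-node-allocatable": "pods,kube-reserved,system-reserved",
    "kube-reserved": "cpu=1,memory=1Gi,ephemeral-storage=1Gi",
    "system-reserved": "cpu=500m,memory=1Gi,ephemeral-storage=1Gi",
    "kube-reserved-cgroup": "/kube.slice",
    "system-reserved-cgroup": "/system.slice",
}
# The RKE2 config.yaml sample: reservations only, no enforce-node-allocatable.
rke2_kubelet_arg = args_from_list([
    "kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi",
    "system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi",
])

print(check_enforcement(rke1_extra_args))  # []
for message in check_enforcement(rke2_kubelet_arg):
    print(message)
```

Neither form is wrong: reserving without enforcing is the safer default, while enforcing adds hard limits on the daemons themselves, with the risks described above.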

Resolution

How to configure system-reserved and kube-reserved reservations:

Rancher-provisioned cluster:

  • RKE2 cluster: In the Rancher UI, go to Cluster Management -> select your RKE2 cluster -> Edit Config -> Advanced, then specify the parameters as 'Additional Kubelet Args'. Add the following arguments (sample values), then click Save to apply.

kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi

system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi

  • K3s cluster: In the Rancher UI, go to Cluster Management -> select your K3s cluster -> Edit Config -> Advanced, then specify the parameters as 'Additional Kubelet Args'. Add the following arguments (sample values), then click Save to apply.

kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi

system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi

  • RKE1 cluster: In the Rancher UI, go to Cluster Management -> select your RKE cluster -> Edit Config -> Edit as YAML. Under 'services' -> 'kubelet' -> 'extra_args', add the following lines (sample values), then click Save to apply.
    kubelet:
      extra_args:
        kube-reserved: "cpu=1,memory=1Gi,ephemeral-storage=1Gi"
        system-reserved: "cpu=500m,memory=1Gi,ephemeral-storage=1Gi"

Standalone clusters:

  • RKE2 cluster: Specify the system-reserved and kube-reserved parameters as kubelet-arg entries in the configuration file /etc/rancher/rke2/config.yaml. Here is a sample /etc/rancher/rke2/config.yaml for a worker (agent) node:
token:  rASpGladPp
node-name: wk1.cluster.local
node-ip: 10.10.24.1
cluster-domain: cluster.local
tls-san:
  - cluster.local
cluster-cidr: 10.10.16.0/21
service-cidr: 10.10.0.0/20
cluster-dns: 10.10.0.10
service-node-port-range: 30000-32767
kube-apiserver-arg:
  - request-timeout=2m
kubelet-arg:
  - kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi
  - system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi
  - eviction-hard=memory.available<500Mi,nodefs.available<4%
cni: calico

  • K3s cluster: Specify the system-reserved and kube-reserved settings as kubelet-arg entries in the configuration file /etc/rancher/k3s/config.yaml. Here is a sample /etc/rancher/k3s/config.yaml for a worker (agent) node:

token: xxxxxxxxx
cluster-domain: k3s.local
tls-san:
  - k3s.local
kube-apiserver-arg:
  - request-timeout=2m
kubelet-arg:
  - kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi
  - system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi
  - eviction-hard=memory.available<500Mi,nodefs.available<4%

  • RKE1 cluster: Specify the system-reserved and kube-reserved settings as 'extra_args' for the kubelet in the cluster configuration file 'cluster.yml'. Here is a sample cluster.yml file:
nodes:
  - address: 192.168.100.41
    internal_address: 192.168.100.41
    user: root
    role: [controlplane, etcd]
  - address: 192.168.100.42
    internal_address: 192.168.100.42
    user: root
    role: [worker]
  - address: 192.168.100.43
    internal_address: 192.168.100.43
    user: root
    role: [worker]
  - address: 192.168.100.44
    internal_address: 192.168.100.44
    user: root
    role: [worker]
cluster_name: rke-sample
kubernetes_version: v1.26.15-rancher1-1
services:
  etcd:
    backup_config:
      interval_hours: 6
      retention: 30
  kube-api:
    service_cluster_ip_range: 10.41.0.0/16
  kube-controller:
    cluster_cidr: 10.40.0.0/16
    service_cluster_ip_range: 10.41.0.0/16
  kubelet:
    cluster_dns_server: 10.41.0.10
    extra_args:
      enforce-node-allocatable: "pods,kube-reserved,system-reserved"
      kube-reserved: "cpu=1,memory=1Gi,ephemeral-storage=1Gi"
      system-reserved: "cpu=500m,memory=1Gi,ephemeral-storage=1Gi"
      kube-reserved-cgroup: /kube.slice
      system-reserved-cgroup: /system.slice
      eviction-hard: "memory.available<500Mi,imagefs.available<10%,nodefs.available<10%,nodefs.inodesFree<5%"
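After the configuration is applied, the effect can be checked by comparing a node's capacity with its allocatable resources, as reported by `kubectl describe node` or `kubectl get node <node-name> -o json` (fields .status.capacity and .status.allocatable). The following is a minimal Python sketch of that comparison; the node JSON is a hypothetical, trimmed example consistent with the RKE2 sample reservations on a node with 8 CPUs and 16Gi of memory, and `held_back` is an illustrative helper name.

```python
import json
from decimal import Decimal

# Suffixes of Kubernetes resource quantities: binary (Ki..Ti), decimal (k..T), and milli (m).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '16777216Ki' or '500m' into a plain number."""
    # Check two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def held_back(node: dict) -> dict[str, Decimal]:
    """Capacity minus Allocatable for every resource present in both maps."""
    capacity = node["status"]["capacity"]
    allocatable = node["status"]["allocatable"]
    return {
        name: parse_quantity(capacity[name]) - parse_quantity(allocatable[name])
        for name in capacity
        if name in allocatable
    }


# Hypothetical, trimmed output of: kubectl get node wk1.cluster.local -o json
NODE_JSON = """
{"status": {
  "capacity":    {"cpu": "8", "memory": "16777216Ki", "pods": "110"},
  "allocatable": {"cpu": "6", "memory": "12582912Ki", "pods": "110"}
}}
"""

diff = held_back(json.loads(NODE_JSON))
print(diff["cpu"])             # 2 (CPU cores)
print(diff["memory"] / 2**20)  # 4096 (MiB)
```

Here the 4096Mi of memory held back matches kube-reserved (2Gi) + system-reserved (1548Mi) + the memory.available<500Mi hard eviction threshold from the RKE2 sample.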

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.