system-reserved and kube-reserved resource reservations
This document (000021853) is provided subject to the disclaimer at the end of this document.
Environment
- SUSE Rancher 2.x
- Downstream or Standalone/Custom RKE1/RKE2/K3s cluster
Situation
- Kubernetes nodes can be scheduled up to 'Capacity'. By default, pods can consume all the available capacity on a node. This is an issue because nodes typically run quite a few system daemons that power the OS and Kubernetes itself. Unless resources are set aside for these system daemons, pods and system daemons compete for resources, which leads to resource starvation on the node.
- The scheduler treats 'Allocatable' as the capacity available for pods.
- The kubelet exposes a feature named 'Node Allocatable' that helps to reserve compute resources for system daemons. Kubernetes recommends that cluster administrators configure 'Node Allocatable' based on the workload density on each node.
- 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods (Allocatable = Total node capacity - kube-reserved - system-reserved - hard eviction threshold). The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory', and 'ephemeral-storage' are supported values.
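To make the formula concrete, the fragment below sketches the relevant part of a Node object's status for a hypothetical node with 8 CPUs and 16Gi of memory. It assumes the sample reservation values used later in this article and a hard eviction threshold of memory.available<500Mi, which the kubelet also subtracts when it computes Allocatable. All figures are illustrative and are not output from a real cluster.

```yaml
# Illustrative excerpt of a Node's status (hypothetical node, assumed values).
# Assumed kubelet settings:
#   kube-reserved=cpu=1,memory=2Gi
#   system-reserved=cpu=1,memory=1548Mi
#   eviction-hard=memory.available<500Mi
status:
  capacity:
    cpu: "8"
    memory: 16777216Ki    # 16384Mi
  allocatable:
    cpu: "6"              # 8 - 1 (kube-reserved) - 1 (system-reserved)
    memory: 12582912Ki    # 16384Mi - 2048Mi - 1548Mi - 500Mi = 12288Mi
```

On a live cluster, the same two fields appear in the output of kubectl get node <node-name> -o yaml, so the effect of a reservation change can be checked by comparing them.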
Kube Reserved:
- KubeletConfiguration Setting: kubeReserved: {}. Example value: {cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: 1000}
- kubeReserved is meant to capture resource reservation for Kubernetes system daemons like the kubelet, container runtime, etc.
- In addition to cpu, memory, and ephemeral-storage, pid may be specified to reserve the specified number of process IDs for Kubernetes system daemons.
System Reserved:
- KubeletConfiguration Setting: systemReserved: {}. Example value: {cpu: 100m, memory: 100Mi, ephemeral-storage: 1Gi, pid: 1000}
- systemReserved is meant to capture resource reservation for OS system daemons like sshd, udev, etc. systemReserved should reserve memory for the kernel too, since kernel memory is not accounted to pods in Kubernetes at this time. Reserving resources for user login sessions is also recommended (user.slice in the systemd world).
- In addition to cpu, memory, and ephemeral-storage, pid may be specified to reserve the specified number of process IDs for OS system daemons.
- Be careful while enforcing the system-reserved reservation, since it can lead to critical system services being CPU starved, OOM killed, or unable to fork on the node. The recommendation is to enforce system-reserved only if a user has profiled their nodes exhaustively to come up with precise estimates and is confident in their ability to recover if any process in that group is OOM killed.
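The two settings can be sketched together in a kubelet configuration file. The fragment below is a minimal example, assuming the kubelet.config.k8s.io/v1beta1 KubeletConfiguration format and reusing the example values quoted above; the values are placeholders, not sizing recommendations. It leaves enforcement at the pod level only, in line with the caution about enforcing system-reserved.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Reservation for Kubernetes system daemons (kubelet, container runtime, ...).
kubeReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi
  pid: "1000"        # quoted: the map values are strings
# Reservation for OS system daemons (sshd, udev, ...) and the kernel.
systemReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi
  pid: "1000"
# "pods" is the kubelet default. Adding "system-reserved" to this list is the
# risky enforcement step described above and also requires systemReservedCgroup.
enforceNodeAllocatable:
  - pods
```

With only pods enforced, the reservations reduce what the scheduler may place on the node without putting hard cgroup limits on the system daemons themselves.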
Resolution
How to configure system-reserved and kube-reserved reservations:
Rancher-provisioned cluster:
- RKE2 cluster: In the Rancher UI, go to Cluster Management -> select your RKE2 cluster -> Edit Config -> Advanced, and specify the parameters as 'Additional Kubelet Args'. Add the following lines (these are a sample), then click Save to apply.
    kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi
    system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi
- K3s cluster: In the Rancher UI, go to Cluster Management -> select your K3s cluster -> Edit Config -> Advanced, and specify the parameters as 'Additional Kubelet Args'. Add the following lines (these are a sample), then click Save to apply.
    kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi
    system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi
- RKE1 cluster: In the Rancher UI, go to Cluster Management -> select your RKE cluster -> Edit Config -> Edit as YAML, and under 'services' -> 'kubelet' -> 'extra_args' add the following lines (these are a sample), then click Save to apply.
    kubelet:
      extra_args:
        kube-reserved: "cpu=1,memory=1Gi,ephemeral-storage=1Gi"
        system-reserved: "cpu=500m,memory=1Gi,ephemeral-storage=1Gi"
Standalone clusters:
- RKE2 cluster: Specify the system-reserved and kube-reserved parameters in the configuration file /etc/rancher/rke2/config.yaml. Here is a sample /etc/rancher/rke2/config.yaml config for a worker (agent) node:
    token: rASpGladPp
    node-name: wk1.cluster.local
    node-ip: 10.10.24.1
    cluster-domain: cluster.local
    tls-san:
      - cluster.local
    cluster-cidr: 10.10.16.0/21
    service-cidr: 10.10.0.0/20
    cluster-dns: 10.10.0.10
    service-node-port-range: 30000-32767
    kube-apiserver-arg:
      - request-timeout=2m
    kubelet-arg:
      - kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi
      - system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi
      - eviction-hard=memory.available<500Mi,nodefs.available<4%
    cni: calico
- K3s cluster: Specify the system-reserved and kube-reserved settings in the configuration file /etc/rancher/k3s/config.yaml. Here is a sample /etc/rancher/k3s/config.yaml config for a worker (agent) node:
    token: xxxxxxxxx
    cluster-domain: k3s.local
    tls-san:
      - k3s.local
    kube-apiserver-arg:
      - request-timeout=2m
    kubelet-arg:
      - kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=5Gi
      - system-reserved=cpu=1,memory=1548Mi,ephemeral-storage=30Gi
      - eviction-hard=memory.available<500Mi,nodefs.available<4%
- RKE1 cluster: Specify the system-reserved and kube-reserved settings as 'extra_args' for the kubelet in the cluster configuration file 'cluster.yml'. Here is a sample cluster.yml config file:
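A minimal sketch of such a cluster.yml, assuming a single hypothetical node (the address, SSH user, and roles are placeholders) and reusing the reservation values from the Rancher-provisioned RKE1 example:

```yaml
# Hypothetical single-node cluster.yml; node details are placeholders.
nodes:
  - address: 10.10.24.1          # placeholder node address
    user: rancher                # placeholder SSH user
    role: [controlplane, etcd, worker]

services:
  kubelet:
    extra_args:
      # Same sample values as in the Rancher-provisioned RKE1 example.
      kube-reserved: "cpu=1,memory=1Gi,ephemeral-storage=1Gi"
      system-reserved: "cpu=500m,memory=1Gi,ephemeral-storage=1Gi"
```

After the file is edited, running rke up against it should roll the new kubelet arguments out to the nodes.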
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.