How to change the open file limits for systemd services and components in a Kubernetes cluster
Article Number: 000021920
Environment
- RKE2
- K3s
- RKE1
Situation
Errors containing "too many open files", like those below, indicate that the RKE1/RKE2/K3s node is likely experiencing service instability or Pod failures because the open file descriptor limit has been exhausted. This limit is often referred to as the nofile ulimit or, in systemd, the LimitNOFILE setting.
Common log errors and symptoms of this issue include:
- Networking failures: When the kubelet fails to set up the pod network, you may see an error like this:
Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox ... plugin type="loopback" failed (add): netplugin failed with no error message: fork/exec /opt/cni/bin/loopback: too many open files"
- Ingress failures: The Ingress controller is unable to accept new connections, logging a fatal error such as:
35#35: accept4() failed (24: Too many open files)
- Container runtime failures: The container runtime itself cannot open a necessary file to manage the container lifecycle. This can manifest as an error like:
Status from runtime service failed" err="rpc error: code = Unknown desc = open /var/lib/rancher/rke2/agent/containerd/io.containerd.grpc.v1.introspection/uuid: too many open files"
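Before raising a limit, it can help to confirm which process is actually close to it. The following is a minimal sketch, assuming a Linux node with /proc mounted; the function name fd_usage is illustrative and not part of any RKE2, K3s or systemd tooling.

```shell
#!/bin/sh
# Print the number of open file descriptors and the soft/hard
# "Max open files" limits for a given PID, read from /proc (Linux only).
# fd_usage is a hypothetical helper name used for illustration.
fd_usage() {
    pid="$1"
    # Each entry under /proc/<pid>/fd is one open descriptor.
    open_fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # Fields 4 and 5 of the "Max open files" row are the soft and hard limits.
    limits=$(awk '/^Max open files/ {print $4, $5}' "/proc/$pid/limits")
    echo "pid=$pid open=$open_fds soft/hard=$limits"
}

# Example: inspect the current shell. To inspect a service instead, pass its
# main PID, e.g. the output of: systemctl show -p MainPID --value rke2-agent
fd_usage "$$"
```

Reading another user's process requires root. An open count close to the soft limit suggests that process is the one running out of descriptors.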
Resolution
RKE2 and K3s
Note: The steps below reference the rke2-agent service. The same change can be applied to rke2-server (control plane) nodes and to K3s nodes by substituting the service name (for example rke2-server, k3s or k3s-agent) in each step.
- SSH into the worker node where the open file limit is to be increased and open an override file for the service:
sudo systemctl edit rke2-agent
- Add the following to the file:
### Editing /etc/systemd/system/rke2-agent.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Service]
# Replace xxxx with your value. Keep comments on their own line: systemd does not support trailing comments.
LimitNOFILE=xxxx
Saving the file creates an override.conf file with the new setting under /etc/systemd/system/rke2-agent.service.d/
Note: When editing, be sure to place the above configuration between the top and bottom comment lines in the file, otherwise it will not take effect.
Also, if the LimitNOFILE value used for the rke2-agent service is greater than the OS nofile limit, the increase may not take effect. In this case, increase the OS nofile limit to at least the value configured for rke2-agent.
- Restart rke2-agent
sudo systemctl restart rke2-agent.service
To test, create a new pod on this node and exec into it to view the ulimit -n value. The updated value in systemd can also be verified with a command like: systemctl show rke2-agent.service | grep LimitNOFILE
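Where the interactive editor is not practical (for example when scripting the change across several nodes), the same drop-in file can be written directly. This is a minimal sketch, not an official procedure: the function name write_nofile_override and its optional root-prefix argument are assumptions made for illustration, and 65535 is only an example value. Note that it overwrites any existing override.conf for the service.

```shell
#!/bin/sh
# Write a systemd drop-in that sets LimitNOFILE for a service.
# Arguments: service name, limit value, optional root prefix (for dry runs).
# write_nofile_override is a hypothetical helper, not a systemd or RKE2 command.
write_nofile_override() {
    service="$1"
    value="$2"
    root="${3:-}"
    dir="$root/etc/systemd/system/$service.service.d"
    mkdir -p "$dir"
    # The comment-free form avoids systemd parsing a trailing comment as a value.
    printf '[Service]\nLimitNOFILE=%s\n' "$value" > "$dir/override.conf"
    echo "$dir/override.conf"
}

# On a real node (as root), the remaining steps would be:
#   write_nofile_override rke2-agent 65535   # 65535 is an example value only
#   systemctl daemon-reload
#   systemctl restart rke2-agent.service
#   systemctl show rke2-agent.service -p LimitNOFILE
```

Unlike systemctl edit, writing the file by hand does not reload systemd automatically, which is why the daemon-reload step is listed before the restart.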
RKE1
The steps below are for RKE1. The same steps apply to control plane, etcd and worker nodes.
There are two approaches (systemd and Docker), which differ in the scope of the ulimit change. Review each option to decide which fits the use case.
Systemd
By default, systemd ignores ulimits defined in /etc/security/limits.conf and applies its own limits. These can be configured either system-wide or per service.
System-wide configuration
- Edit /etc/systemd/system.conf and set the following in the [Manager] section:
DefaultLimitNOFILE=65535:65535
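After editing the file, it may be worth confirming which value is actually configured before applying it. The sketch below assumes the setting lives in /etc/systemd/system.conf itself (drop-ins under /etc/systemd/system.conf.d/, if present, are not considered); configured_default_nofile is a hypothetical helper name. The follow-up commands are shown as comments because systemd usually needs to be re-executed, or the node rebooted, before a changed default is used, and already running services keep their old limits until they are restarted.

```shell
#!/bin/sh
# Print the last uncommented DefaultLimitNOFILE value found in a systemd
# manager configuration file (default: /etc/systemd/system.conf).
# configured_default_nofile is a hypothetical helper used for illustration.
configured_default_nofile() {
    conf="${1:-/etc/systemd/system.conf}"
    # Lines starting with '#' do not match, so commented defaults are skipped.
    sed -n 's/^[[:space:]]*DefaultLimitNOFILE=//p' "$conf" | tail -n 1
}

# Typical follow-up on a real node (as root), shown as comments only:
#   configured_default_nofile
#   systemctl daemon-reexec                        # or reboot the node
#   systemctl show --property DefaultLimitNOFILE   # value systemd is using
```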
Per-service configuration
To configure for a specific service (e.g., Docker), either edit the service file or create an override file:
- Add an override file for the Docker service at /etc/systemd/system/docker.service.d/override.conf:
[Service]
LimitNOFILE=infinity
Docker Configuration
You can also set container-level limits using the Docker daemon configuration.
- Edit the Docker configuration at /etc/docker/daemon.json to set the default nofile ulimit for all containers:
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Soft": 64000,
      "Hard": 64000
    }
  }
}
- Restart Docker after making changes:
systemctl daemon-reload
systemctl restart docker
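To confirm the result, the soft and hard limits can be printed from a shell, either on the host or inside a newly started container. This is a minimal sketch; show_nofile is an illustrative name, and the busybox image in the comment is only an example of an image that ships a shell.

```shell
#!/bin/sh
# Print the soft and hard open-file limits of the current shell.
# show_nofile is a hypothetical helper used for illustration.
show_nofile() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}
show_nofile

# Inside a freshly started container the same check can be run as, for example:
#   docker run --rm busybox sh -c 'ulimit -Sn; ulimit -Hn'
# (busybox is only an example image; any image with a shell would do.)
```

With the daemon.json shown above, a new container would be expected to report 64000 for both values. Containers created before the restart keep the limits they were started with.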