kube-apiserver "socket: too many open files" error messages
This document (000020016) is provided subject to the disclaimer at the end of this document.
Situation
Issue
During normal operation of a Kubernetes cluster, you may experience intermittent stability issues and the kube-apiserver logs may contain messages of the following format:
clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://x.x.x.x:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp x.x.x.x:2379: socket: too many open files". Reconnecting...
clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://x.x.x.x:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context canceled". Reconnecting...
clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://x.x.x.x:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...
Root Cause
These symptoms can be caused by the kube-apiserver being blocked by configuration that limits the number of files a process can have open. This limit could also affect other components and OS services.
This is typically a result of restrictive ulimits, or a high number of open connections.
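One way to confirm that a process is close to its limit is to compare its open file descriptor count with the limits the kernel reports for it. The following is a minimal sketch, assuming a Linux host with /proc mounted and enough privileges to read the target process; the helper names parse_nofile_limits and fd_usage are hypothetical and not part of any Kubernetes or SUSE tooling.

```python
import os
import sys
from pathlib import Path


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of None means the kernel reports "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    proc = Path("/proc") / str(pid)
    open_fds = len(os.listdir(proc / "fd"))
    soft, hard = parse_nofile_limits((proc / "limits").read_text())
    return open_fds, soft, hard


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 fd_usage.py <pid of kube-apiserver>
    used, soft, hard = fd_usage(int(sys.argv[1]))
    print(f"open fds: {used}, soft limit: {soft}, hard limit: {hard}")
```

If the open count is at or near the soft limit, the limit is the likely cause; if it is far below, a different component may be the one running out of descriptors.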
Below is a non-exhaustive list of places where the number of open files ulimit can be set for a Docker container.
System ulimits (/etc/security/limits.conf):
This file defines the persisted configuration for the system-wide ulimits, such as file size limits, and how much memory can be used by the different components of the process, including the stack, data and text segments.
The limit of interest is the nofile limit, which defines the number of files a process can have open at any given time. This can be set per user, or for all users (*), and there are two limits to define:
- Soft limit - These limits are ones that the user can move up or down within the range permitted by any pre-existing hard limits. A user can modify the soft limit by running the command ulimit -Sn X, where X is the desired new value. (In bash, ulimit -n X without -S sets both the soft and the hard limit.)
- Hard limit - These limits are set by the superuser and enforced by the kernel. Users cannot exceed this.
The nofile hard limit for the current user can be seen by running ulimit -Hn and the soft limit can be seen by running ulimit -Sn.
More information on limits.conf can be found in the limits.conf(5) man page.
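The soft and hard limits described above can also be inspected from inside a process. The sketch below uses Python's standard resource module to read both values and to adjust the soft limit within the hard limit, which mirrors what ulimit -Sn does in a shell. The function name set_soft_nofile is hypothetical, and the change only affects the current process and its children.

```python
import resource


def set_soft_nofile(target):
    """Set the soft nofile limit for this process, clamped to the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot raise the soft limit past the hard limit.
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft (ulimit -Sn): {soft}, hard (ulimit -Hn): {hard}")
```

The hard limit is passed back unchanged, so the soft limit can still be raised again later in the same process.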
k3s and rke2 configuration
The k3s and rke2 install scripts both define LimitNOFILE=1048576 on the respective services. If you don't use the install scripts, you may need to configure ulimits as described below.
Systemd configuration
By design, systemd ignores ulimits set via /etc/security/limits.conf for the services it manages, and instead applies its own limits. These can be configured per-service or system-wide.
The system-wide systemd nofile limit is defined in /etc/systemd/system.conf as DefaultLimitNOFILE=X:Y, where X is the soft limit and Y is the hard limit.
It is possible to set nofile for a specific service, either by defining LimitNOFILE within the service file itself or by creating an override file. For example, defining it directly within the docker systemd service file (/lib/systemd/system/docker.service):
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
[Service]
Type=notify
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
LimitNOFILE=infinity
Or creating a systemd override file (/etc/systemd/system/docker.service.d/override.conf):
[Service]
LimitNOFILE=infinity
Note: The override directory is named after the unit (docker.service.d here), so it may be slightly different between Linux distributions. It is usually recommended to create an override, as this will persist through system updates.
Note: On older versions of systemd, LimitNOFILE=infinity results in a limit of 65535. This was fixed by a commit merged in systemd v234.
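After changing a unit file or an override, systemd needs to reload its configuration (systemctl daemon-reload) and the service needs a restart before the new limit applies. The sketch below shows one way to check what systemd has actually loaded for a unit, by parsing the output of systemctl show. It assumes the unit is named docker.service and that the installed systemd exposes the LimitNOFILESoft property alongside LimitNOFILE, which may not hold on older versions; the helper names are hypothetical.

```python
import subprocess
import sys


def parse_show_output(text):
    """Turn 'Key=Value' lines from `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def unit_nofile(unit="docker.service"):
    """Return the nofile limit properties systemd has loaded for a unit."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=LimitNOFILE,LimitNOFILESoft"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_show_output(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 unit_nofile.py docker.service
    print(unit_nofile(sys.argv[1]))
```

The values are kept as strings because systemd may report infinity rather than a number.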
Docker daemon configuration
Docker can enforce its own open file limits on containers. The dockerd command line flag --default-ulimit nofile=X:Y sets the default limit for all containers, and an individual container can be given its own limit with docker run --ulimit nofile=X:Y.
The same default can be applied to all containers by specifying the limit within the /etc/docker/daemon.json configuration file:
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  }
}
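Because /etc/docker/daemon.json often holds other settings already, it can be safer to merge the nofile default into the existing content than to overwrite the file. The following is a rough sketch of that merge using only the standard json module. The function names and the sample "log-driver" entry are illustrative assumptions, and the result would still need to be written back to the file and followed by a Docker daemon restart.

```python
import json


def default_ulimits(soft, hard):
    """Build the "default-ulimits" fragment of daemon.json for nofile."""
    if soft > hard:
        raise ValueError("soft limit cannot exceed hard limit")
    return {"default-ulimits": {"nofile": {"Name": "nofile", "Hard": hard, "Soft": soft}}}


def merge_into_daemon_config(existing_json, soft, hard):
    """Merge the nofile default into existing daemon.json text, keeping other keys."""
    config = json.loads(existing_json) if existing_json.strip() else {}
    fragment = default_ulimits(soft, hard)["default-ulimits"]
    config.setdefault("default-ulimits", {}).update(fragment)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Sample input only; a real run would read /etc/docker/daemon.json instead.
    print(merge_into_daemon_config('{"log-driver": "json-file"}', 64000, 64000))
```

Validating that the soft limit does not exceed the hard limit before writing the file avoids a configuration that cannot be applied.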
Resolution
If any non-default configuration is applying nofile restrictions to Docker or to containers, revert it to the default configuration, or increase the limits, and then re-test.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.