RKE2 server fails to start due to etcd metrics port conflict
This document (000021884) is provided subject to the disclaimer at the end of this document.
Environment
RKE2 - v1.32.5+rke2r1
Situation
In an RKE2 setup, the rke2-server service intermittently fails to start with the following error message in the journal logs:
jun 20 09:04:13 <node> rke2[2524118]: time="2025-06-20T09:04:13+02:00" level=fatal msg="Failed to reconcile with temporary etcd: listen tcp 127.0.0.1:2381: bind: address already in use"
jun 20 09:04:13 <node> systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE.
The config.yaml includes the following custom etcd configuration:
etcd-arg:
- --listen-metrics-urls=http://127.0.0.1:2381,http://<nodeip>:2381
etcd-expose-metrics: true
Resolution
To avoid the port binding conflict, remove the manual override of the --listen-metrics-urls parameter from the etcd-arg list. Instead, rely on the etcd-expose-metrics setting alone:
etcd-expose-metrics: true
This instructs RKE2 to expose metrics in a compatible manner without causing a conflict between temporary and cluster etcd instances.
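Before restarting the service, it can help to confirm that the override is really gone. The following is a minimal sketch, not an official RKE2 tool: it does a plain text scan rather than parsing YAML, the function name find_metrics_override is hypothetical, and it assumes the configuration lives at the default path /etc/rancher/rke2/config.yaml.

```python
from pathlib import Path
from typing import List

# Flag named in this article; any uncommented line containing it is reported.
FLAG = "--listen-metrics-urls"

# Assumption: default RKE2 config location. Adjust if your node uses another path.
DEFAULT_CONFIG = Path("/etc/rancher/rke2/config.yaml")


def find_metrics_override(config_text: str) -> List[str]:
    """Return the uncommented lines of config_text that mention FLAG."""
    hits = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore lines that are commented out
        if FLAG in stripped:
            hits.append(stripped)
    return hits


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        offending = find_metrics_override(DEFAULT_CONFIG.read_text())
        if offending:
            print("Manual override still present:")
            for entry in offending:
                print("  " + entry)
        else:
            print("No manual " + FLAG + " override found.")
    else:
        print(str(DEFAULT_CONFIG) + " not found; pass the correct path instead.")
```

Because this is only a text scan, it does not check drop-in files or arguments supplied by other means; treat an empty result as a hint rather than proof.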
Cause
Overriding the --listen-metrics-urls argument with 127.0.0.1:2381 and <nodeip>:2381 causes both the temporary and cluster etcd instances to attempt to bind to the same local port (127.0.0.1:2381). This results in a port conflict, preventing the rke2-server service from starting.
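The "bind: address already in use" message is the generic operating-system response when a second listener tries to bind an address and port that another listener already holds. The sketch below reproduces that behaviour in isolation. It is not RKE2 or etcd code, the function name second_bind_error is hypothetical, and it uses an OS-assigned port on 127.0.0.1 instead of 2381 so that it does not interfere with a running node.

```python
import errno
import socket
from typing import Optional


def second_bind_error(host: str = "127.0.0.1") -> Optional[int]:
    """Bind two TCP listeners to the same host:port.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as the first listener
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    err = second_bind_error()
    print("second bind hit EADDRINUSE:", err == errno.EADDRINUSE)
```

On a typical Linux host the second bind is expected to fail with EADDRINUSE, which is the same condition reported in the journal when two etcd instances both try to listen on 127.0.0.1:2381.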
Additional Information
Reference: GitHub issue #4479
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.