How to fix RKE2 Error: etcdserver mvcc database space exceeded?
Article Number: 000021764
Environment
SUSE Rancher
RKE2
Situation
Overview:
In Kubernetes, etcd is the authoritative data store for cluster state, configuration, and other critical metadata.
Problem:
As workloads grow and more objects are created/updated, the etcd database can hit its default size quota. In RKE2, server nodes run an embedded etcd; when its BoltDB backend reaches the quota, etcd rejects new writes with:
etcdserver: mvcc: database space exceeded
Impact:
- Kubernetes API operations (create/update) begin to fail
- Controllers stall and reconcile loops pause
- The API server may log errors or even panic referencing the message above
Symptoms you'll see:
- Pod events / apiserver logs: etcdserver: mvcc: database space exceeded
- etcd alarms include NOSPACE
- Cluster becomes read-only (creates/updates fail; reads still OK)
- High etcd DB size on disk; frequent WAL/bolt compaction messages
Cause
Why this happens (root cause):
- etcd’s backend quota (default ~2 GiB) is reached because:
  - Lots of historical revisions accumulated (high churn: ConfigMaps/Secrets/CRDs updated frequently).
  - Regular compaction/defragmentation hasn’t reclaimed space.
  - Very large objects (big Secrets/ConfigMaps) are stored.
- Once the quota is hit, etcd sets a NOSPACE alarm and rejects writes until you compact, defrag, and disarm the alarm.
The etcd database size is primarily influenced by the number of objects stored, frequent updates, and high write activity. If the database size limit is reached, you may experience issues such as slow API responses, failed leader elections, or cluster instability. Additionally, excessive fragmentation due to outdated entries and stale snapshots can contribute to rapid storage consumption.
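To see which resource types hold the most keys, one option is to count etcd keys per Kubernetes resource. This is a minimal sketch and not part of the original procedure: `count_keys_by_resource` is a hypothetical helper name, it assumes the usual `/registry/<resource>/...` key layout written by the Kubernetes API server, and the commented usage line assumes the variables from the Prework step in the Resolution section are already set.

```shell
# Hypothetical helper: reads etcd key names on stdin (one per line) and
# prints "<count> <resource>" pairs, largest count first.
# Assumes keys look like /registry/<resource>/<namespace>/<name>.
count_keys_by_resource() {
  awk -F/ '$2 == "registry" && $3 != "" { count[$3]++ }
           END { for (r in count) print count[r], r }' | sort -k1,1nr -k2,2
}

# Usage (read-only, so it should still work while writes are rejected):
# crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} \
#   --cacert ${ETCD_CACERT} get /registry --prefix --keys-only | count_keys_by_resource
```

The counts are numbers of keys, not bytes or revisions, so treat them only as a hint about where churn or large objects may be. For custom resources the third path segment is typically the API group rather than the resource name.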
Resolution
Part 1: etcd database compact and defrag
Start by compacting and defragmenting the etcd database to resolve the issue. Follow the steps below:
Prework (Important)
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
PATH="$PATH:/var/lib/rancher/rke2/bin"
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
ETCD_CERT=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt
ETCD_KEY=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
ETCD_CACERT=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
ETCDCTL_ENDPOINTS=$(crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',')
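Because every command in the following steps repeats the same TLS flags, a small wrapper can reduce typing mistakes. This is an optional sketch under the assumption that the Prework variables above are set in the current shell; `etcdctl_rke2` is a hypothetical function name, not something RKE2 ships.

```shell
# Hypothetical wrapper: runs etcdctl inside the etcd container with the
# TLS flags and endpoints taken from the Prework variables.
etcdctl_rke2() {
  if [ -z "${etcdcontainer:-}" ]; then
    # An empty container ID usually means etcd is not running on this node.
    echo "etcd container not found; check that rke2-server is running on this node" >&2
    return 1
  fi
  crictl exec "${etcdcontainer}" etcdctl \
    --cert "${ETCD_CERT}" --key "${ETCD_KEY}" --cacert "${ETCD_CACERT}" \
    --endpoints="${ETCDCTL_ENDPOINTS}" "$@"
}

# Example: etcdctl_rke2 endpoint health
```

The wrapper only forwards its arguments, so `etcdctl_rke2 alarm list` is equivalent to the full `crictl exec ... alarm list` command shown later in this article.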
- etcdctl compact (Note: "etcdctl compact" is a blocking operation: each member pauses writes during compaction. This is acceptable in read-only incidents, but be aware that it will block writes for multiple seconds on larger clusters and plan accordingly.)
Command:
$ rev=$(crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS endpoint status --write-out fields | grep Revision | cut -d: -f2 | sort -n | head -n 1)
$ crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS compact $rev
Example output:
compacted revision 31014066
- etcdctl defrag
Command:
$ crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS defrag --cluster
Example output:
Finished defragmenting etcd member[https://10.55.2.123:2379]
- etcdctl alarm list
Command:
$ crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS alarm list
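You can run the etcd member list/endpoint status/endpoint health commands to confirm the database size has decreased and that the cluster has recovered. If alarm list still reports NOSPACE after the defrag, clear it by running the same etcdctl command with alarm disarm. For more details, refer to the Additional Information section below.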
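One way to confirm the reduction is to compare each member's database size before and after the defrag. The sketch below assumes that `endpoint status --write-out fields` prints one `"DBSize" : <bytes>` line per member, in the same output format the compact step parses for the revision; `db_size_mib` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `etcdctl endpoint status --write-out fields`
# output on stdin and prints each member's DBSize in whole MiB, one per line.
db_size_mib() {
  awk -F' : ' '$1 == "\"DBSize\"" { print int($2 / 1048576) }'
}

# Usage, with the Prework variables set:
# crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} \
#   --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS \
#   endpoint status --write-out fields | db_size_mib
```

A value close to 2048 on a member with the default quota suggests that member is still at or near its limit.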
Part 2: Increase quota-backend-bytes to extend the etcd keyspace limit in RKE2
Sometimes the cluster won’t allow compact or defrag: all attempts fail because etcd has raised a NOSPACE alarm with no free headroom. The alarm blocks writes, and defrag still requires some writable space. Temporarily free disk space (e.g., remove old snapshots/logs or expand the volume) or increase quota-backend-bytes to create headroom, then run compact followed by defrag.
For a standalone RKE2 cluster, you can increase the etcd database size by modifying the RKE2 configuration file /etc/rancher/rke2/config.yaml. (This example sets the value to 8,589,934,592 (8 GiB); you can choose a different size as needed.)
etcd-arg:
- "quota-backend-bytes=8589934592"
Then restart rke2-server:
systemctl restart rke2-server
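The quota-backend-bytes value is expressed in bytes, so it can help to compute it rather than type it by hand. A minimal sketch, where `gib_to_bytes` is a hypothetical helper name:

```shell
# Hypothetical helper: converts a whole number of GiB to bytes
# for use as the quota-backend-bytes value.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example: gib_to_bytes 8   prints 8589934592, the value used above
```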
For a cluster managed by Rancher, open Cluster Management in Rancher, select the cluster, and edit its YAML (available from the three-dot menu).
Add quota-backend-bytes under machineSelectorConfig.config.etcd-arg. For example:
machineSelectorConfig:
  - config:
      etcd-arg: quota-backend-bytes=8589934592
    machineLabelSelector:
      matchLabels:
        rke.cattle.io/etcd-role: 'true'
Increasing the etcd database size can help mitigate etcd storage consumption issues and ensure the cluster remains stable and responsive.