
How to fix RKE2 Error: etcdserver mvcc database space exceeded?

Article Number: 000021764

Environment

SUSE Rancher

RKE2

Situation

Overview:
In Kubernetes, etcd is the authoritative data store for cluster state, configuration, and other critical metadata.

Problem:
As workloads grow and more objects are created or updated, the etcd database can hit its default size quota. In RKE2, etcd runs as a static pod on each server node; when its BoltDB (bbolt) backend reaches the quota, etcd rejects new writes with:

etcdserver: mvcc: database space exceeded

Impact:

  • Kubernetes API operations (create/update) begin to fail
  • Controllers stall and reconcile loops pause
  • The API server may log errors or even panic referencing the message above

Symptoms you'll see:

  • Pod events / apiserver logs: etcdserver: mvcc: database space exceeded
  • etcd alarms include NOSPACE
  • Cluster becomes read-only (creates/updates fail; reads still OK)
  • High etcd DB size on disk; frequent WAL/bolt compaction messages

Cause

Why this happens (root cause):

  • etcd’s backend quota (default 2 GiB) is reached because:
      • Many historical revisions have accumulated (high churn: ConfigMaps/Secrets/CRDs updated frequently).
      • Regular compaction/defragmentation hasn’t reclaimed space.
      • Very large objects (big Secrets/ConfigMaps) are stored.
  • Once the quota is hit, etcd raises a NOSPACE alarm and rejects writes until space is reclaimed (compact + defrag) and the alarm is disarmed.
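To see how close a member is to the quota, divide its database size by the quota. The sketch below is a hypothetical bash helper (quota_usage_pct is not part of RKE2 or etcdctl); it assumes the database size in bytes comes from the DBSize field of etcdctl endpoint status and that, unless a second argument is given, etcd's default 2 GiB (2147483648-byte) quota applies.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how much of the etcd backend quota one member uses.
#   $1 = database size in bytes (e.g. the DBSize value from endpoint status)
#   $2 = quota in bytes (optional; defaults to etcd's 2 GiB = 2147483648 bytes,
#        i.e. assumes no quota-backend-bytes override is configured)
quota_usage_pct() {
  local db_size="$1"
  local quota="${2:-2147483648}"
  # Bash integer arithmetic, so the result is rounded down to a whole percent.
  echo $(( db_size * 100 / quota ))
}
```

For example, quota_usage_pct 1610612736 prints 75: a 1.5 GiB database is at 75% of the default quota.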

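The etcd database size is driven mainly by the number and size of stored objects and by write activity: every update creates a new revision, and old revisions occupy space until they are compacted. If the database reaches its size limit, you may see slow API responses, failed leader elections, or general cluster instability. Fragmentation also contributes: space freed by compaction stays allocated inside the database file until a defragmentation runs, so the file on disk does not shrink on its own. Old etcd snapshots kept on the node consume disk space as well, although they do not count toward the quota.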

Resolution

Part 1: etcd database compact and defrag

Start by compacting and defragmenting the etcd database to resolve the issue. Follow the steps below:

Prework (Important): run the following as root on an RKE2 server node to set the variables used by the commands below.

# Point crictl at RKE2's containerd and add the RKE2 binaries to PATH
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
export PATH="$PATH:/var/lib/rancher/rke2/bin"

# ID of the running etcd container on this node
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
# Client certificate, key and CA used to authenticate to etcd
ETCD_CERT=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt
ETCD_KEY=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
ETCD_CACERT=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
# Comma-separated client URLs of all etcd members (5th column of "member list")
ETCDCTL_ENDPOINTS=$(crictl exec "${etcdcontainer}" etcdctl --cert "${ETCD_CERT}" --key "${ETCD_KEY}" --cacert "${ETCD_CACERT}" member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ',')
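The same TLS flags and endpoint list are repeated in every etcdctl command that follows. As a minimal sketch, assuming the prework variables above are set in the current bash session, a small wrapper function can keep those commands shorter. The name etcdctl_rke2 is hypothetical and is not part of RKE2.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with RKE2): run etcdctl inside the etcd
# container with the TLS flags and endpoints set during the prework.
# Assumes etcdcontainer, ETCD_CERT, ETCD_KEY, ETCD_CACERT and
# ETCDCTL_ENDPOINTS are already defined in the current shell.
etcdctl_rke2() {
  crictl exec "${etcdcontainer}" etcdctl \
    --cert "${ETCD_CERT}" --key "${ETCD_KEY}" --cacert "${ETCD_CACERT}" \
    --endpoints="${ETCDCTL_ENDPOINTS}" "$@"   # pass any etcdctl subcommand through
}
```

For example, etcdctl_rke2 alarm list runs the same command as the alarm list step. Forwarding "$@" keeps every extra argument intact, so subcommands such as endpoint status --write-out table work unchanged.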
  • etcdctl compact (Note: compaction and, even more so, defragmentation are expensive operations. Defragmenting a member blocks reads and writes on that member while it rebuilds the database, which can take several seconds or longer on large databases. This is acceptable while the cluster is already read-only, but plan accordingly.)
Command:
$ rev=$(crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS endpoint status --write-out fields | grep '"Revision"' | head -n1 | cut -d: -f2 | tr -d ' ')

$ crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS compact $rev

Example output:
compacted revision 31014066
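If $rev is empty (for example, because no endpoint could be reached) or holds more than one value, the compact command fails with an argument error. A hypothetical guard, sketched below for bash, checks that $rev contains exactly one non-negative integer, allowing surrounding spaces, before compaction runs. The name is_revision is an assumption, not an existing tool.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if $1 is a single integer, optionally
# surrounded by spaces or tabs. Newlines are not allowed, so output that
# contains one revision per endpoint is rejected.
is_revision() {
  local re='^[[:blank:]]*[0-9]+[[:blank:]]*$'
  [[ $1 =~ $re ]]
}
```

For example, prefix the compact command with is_revision "$rev" && so that compaction only runs when a single revision was captured. Because the guard admits exactly one token, the unquoted $rev in the compact command still expands to one argument.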
  • etcdctl defrag
Command: 
$ crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS defrag --cluster

Example output:
Finished defragmenting etcd member[https://10.55.2.123:2379]
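Defragmentation is the step that returns space freed by compaction to the filesystem, so the DB size of each member should drop afterwards. To compare sizes before and after, the sketch below parses the --write-out fields output of endpoint status (the same format the revision command above reads). db_sizes_mib is a hypothetical helper; it assumes each member's "DBSize" line appears before its "Endpoint" line in that output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl endpoint status --write-out fields` output
# on stdin and print one "endpoint size_in_MiB" line per member.
# Assumes the "DBSize" line of a member precedes its "Endpoint" line.
db_sizes_mib() {
  awk -F' : ' '
    $1 == "\"DBSize\""   { size = $2 }
    $1 == "\"Endpoint\"" {
      ep = $2
      gsub(/"/, "", ep)                  # strip quotes around the URL, if any
      printf "%s %d\n", ep, size / 1048576
    }
  '
}
```

Pipe the endpoint status --write-out fields command from the compact step into db_sizes_mib once before compacting and once after defragmenting, then compare the two listings.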
  • etcdctl alarm list
Command:
$ crictl exec ${etcdcontainer} etcdctl --cert ${ETCD_CERT} --key ${ETCD_KEY} --cacert ${ETCD_CACERT} --endpoints=$ETCDCTL_ENDPOINTS alarm list
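The NOSPACE alarm is not cleared automatically: after compaction and defragmentation have freed space, etcd keeps rejecting writes until the alarm is disarmed with etcdctl alarm disarm. The sketch below assumes the prework variables are set; has_nospace_alarm and disarm_nospace are hypothetical helper names, and the match relies on etcdctl's default alarm list format (memberID:<id> alarm:NOSPACE).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if `etcdctl alarm list` output (on stdin)
# still contains a NOSPACE alarm, e.g. "memberID:<id> alarm:NOSPACE".
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Hypothetical helper: clear the alarms once space has been reclaimed.
# Assumes etcdcontainer, ETCD_CERT, ETCD_KEY, ETCD_CACERT and
# ETCDCTL_ENDPOINTS were set during the prework.
disarm_nospace() {
  crictl exec "${etcdcontainer}" etcdctl --cert "${ETCD_CERT}" --key "${ETCD_KEY}" \
    --cacert "${ETCD_CACERT}" --endpoints="${ETCDCTL_ENDPOINTS}" alarm disarm
}
```

Pipe the alarm list command above into has_nospace_alarm and, if it succeeds, call disarm_nospace; then run alarm list again to confirm it prints nothing.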

You can run the etcd member list/endpoint status/endpoint health commands to confirm the database size has decreased and that the cluster has recovered. For more details, refer to the Additional Information section below.
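When confirming that the database size has decreased, it can help to express the change as a number. shrink_report is a hypothetical helper, sketched under the assumption that you pass one member's DB size in bytes before and after the cleanup (for example, two DBSize values from endpoint status).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how much space compact + defrag freed on one member.
#   $1 = DB size before (bytes, must be > 0), $2 = DB size after (bytes)
# Prints freed MiB and percentage reduction, both rounded down.
shrink_report() {
  local before="$1" after="$2"
  if (( before <= 0 )); then
    echo "shrink_report: size before must be a positive number of bytes" >&2
    return 1
  fi
  local freed=$(( before - after ))
  echo "freed $(( freed / 1048576 )) MiB ($(( freed * 100 / before ))%)"
}
```

For example, shrink_report 2147483648 536870912 prints freed 1536 MiB (75%).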

Part 2: Increase quota-backend-bytes to extend the etcd keyspace limit in RKE2

Sometimes compact or defrag cannot run at all: every attempt fails because etcd has raised a NOSPACE alarm and has no free headroom. The alarm blocks writes, and defrag needs free disk space because it writes a new copy of the database file. Temporarily free disk space (for example, remove old snapshots/logs or expand the volume) or increase quota-backend-bytes to create headroom, then run compact followed by defrag.

For a standalone RKE2 cluster, raise the etcd quota by adding quota-backend-bytes to the RKE2 configuration file /etc/rancher/rke2/config.yaml. The example below sets it to 8589934592 bytes (8 GiB); choose a different size as needed:


etcd-arg:
  - "quota-backend-bytes=8589934592"

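Apply the same change on every server node, then restart rke2-server on each one, preferably one node at a time so that etcd keeps quorum:

systemctl restart rke2-server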
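Because quota-backend-bytes takes a value in bytes, it is easy to drop a factor of 1024. As a minimal sketch, a hypothetical helper (quota_config is not an RKE2 tool) can convert a whole number of GiB to bytes and print the etcd-arg stanza in the form shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a whole number of GiB to bytes and print the
# matching etcd-arg stanza for /etc/rancher/rke2/config.yaml.
quota_config() {
  local gib="$1" re='^[1-9][0-9]*$'
  if ! [[ $gib =~ $re ]]; then
    echo "quota_config: expected a positive whole number of GiB" >&2
    return 1
  fi
  local bytes=$(( gib * 1024 * 1024 * 1024 ))
  printf 'etcd-arg:\n  - "quota-backend-bytes=%s"\n' "$bytes"
}
```

quota_config 8 prints the 8589934592 value used in the example. If config.yaml already contains an etcd-arg list, add the entry to that list rather than adding a second etcd-arg key. The etcd documentation suggests 8 GiB as the maximum for normal environments, and etcd logs a warning at startup when the configured quota exceeds it.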

For a cluster managed by Rancher, go to Cluster Management, select the cluster, and choose Edit YAML from its three-dot (⋮) menu.

Under spec.rkeConfig.machineSelectorConfig, add quota-backend-bytes to the etcd-arg list of an entry that selects the etcd nodes. For example:

machineSelectorConfig:
  - config:
      etcd-arg:
        - quota-backend-bytes=8589934592
    machineLabelSelector:
      matchLabels:
        rke.cattle.io/etcd-role: 'true'
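To confirm that a restarted etcd member picked up the new quota, you can read the value back from etcd's own metrics: etcd exports the quota it enforces as the etcd_server_quota_backend_bytes gauge. quota_from_metrics is a hypothetical helper that extracts that value, in bytes, from Prometheus-format text on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the enforced etcd quota (bytes) from Prometheus
# text-format metrics on stdin. The gauge is often written in exponent form,
# e.g. 8.589934592e+09, so awk converts it and prints it as a whole number.
quota_from_metrics() {
  awk '$1 == "etcd_server_quota_backend_bytes" { printf "%.0f\n", $2 }'
}
```

On an RKE2 server node the etcd metrics are typically served at http://127.0.0.1:2381/metrics; that address is an assumption, so check the listen-metrics-urls setting in your node's etcd configuration. With it, curl -s http://127.0.0.1:2381/metrics | quota_from_metrics should print 8589934592 after the change above.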

Increasing the etcd quota gives the database more headroom and helps keep the cluster stable and responsive. It does not shrink the database itself, so compaction and defragmentation (Part 1) are still needed to reclaim space.