
Pods stuck in terminating state due to PodDisruptionBudget

This document (000021824) is provided subject to the disclaimer at the end of this document.

Environment

Rancher Prime: v2.9.2

RKE2: v1.28.11+rke2r1

Situation

With a PodDisruptionBudget (PDB) configured, restarting the workload deployments, or draining or shutting down a node, leaves the pods stuck in the Terminating state.

Resolution

The permanent fix is to upgrade RKE2 to v1.30.9+rke2r1.

As a temporary workaround, either disable the PodDisruptionBudget, or keep it enabled and adjust its minAvailable/maxUnavailable values so that the restart or drain can proceed.

Cause

When a PodDisruptionBudget (PDB) is configured, restarting workload deployments or draining/shutting down a node may result in pods entering a Terminating state without completing cleanup.

We observed this behavior on RKE2 v1.28.11+rke2r1, particularly in the following edge case:

  • Assume a cluster with two worker nodes.
  • A deployment has three pod replicas, and the PDB is configured with minAvailable: 50%.
  • Suppose one replica is scheduled on one node and the remaining two replicas on the second node.
  • If the node hosting the two replicas is drained or rebooted, multiple pods may become unavailable simultaneously.

With minAvailable: 50% and three replicas, at least two pods must stay available (50% of 3 is 1.5, which Kubernetes rounds up to 2), so only one pod can be voluntarily disrupted at any time. Draining the node that hosts two replicas requires disrupting more than one pod at once, which the PDB does not allow. As a result, the affected pods may remain stuck in the Terminating state and never clean up properly.
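The arithmetic in this scenario can be sketched in Python. This is an illustration only, not the disruption controller's actual code: the helper name allowed_disruptions and its parameters are hypothetical, and the sketch assumes that a percentage minAvailable is rounded up to a whole number of pods, as the upstream Kubernetes PDB documentation describes.

```python
def allowed_disruptions(replicas: int, healthy: int, min_available_percent: int) -> int:
    """Estimate how many pods may be voluntarily disrupted right now.

    Hypothetical helper for illustration; assumes a percentage minAvailable
    is rounded up to a whole number of pods.
    """
    # Integer ceiling of replicas * percent / 100, without floating point.
    desired_healthy = -(-replicas * min_available_percent // 100)
    # The budget can never go below zero.
    return max(0, healthy - desired_healthy)


# Scenario from this article: three replicas, minAvailable: 50%.
print(allowed_disruptions(replicas=3, healthy=3, min_available_percent=50))  # 1
# Once one pod is already down, no further voluntary disruption fits the budget.
print(allowed_disruptions(replicas=3, healthy=2, min_available_percent=50))  # 0
```

Trying a few candidate values this way may help when choosing a minAvailable setting that still leaves room to drain a node that hosts more than one replica.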

This behaviour is not reflected in the pod or deployment logs. The only observable clue is found in the kube-controller-manager logs, where the following error message typically appears:

kube-system-kube-controller-manager-k00m01.nve90.rpt.idia:E0131 10:37:43.721229 1 disruption.go:626] Error syncing PodDisruptionBudget <namespace>/<deployment>, requeuing: Operation cannot be fulfilled on poddisruptionbudgets.policy "<deployment>": the object has been modified; please apply your changes to the latest version and try again
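Because this message is the only clue, it can be useful to scan collected kube-controller-manager logs for it. The following Python sketch is one possible way to do that, under stated assumptions: the function name find_pdb_sync_errors is hypothetical, the log text is assumed to have already been saved or captured by whatever means you normally use, and the pattern is derived solely from the message format shown above.

```python
import re

# Pattern based on the message format shown in this article.
PDB_SYNC_ERROR = re.compile(
    r"Error syncing PodDisruptionBudget ([^/\s]+)/([^,\s]+), requeuing"
)


def find_pdb_sync_errors(log_text: str) -> list:
    """Return (namespace, name) pairs for each PDB sync error found.

    Hypothetical helper for illustration; log_text is the raw
    kube-controller-manager log output supplied by the caller.
    """
    return [(m.group(1), m.group(2)) for m in PDB_SYNC_ERROR.finditer(log_text)]


# Example with made-up namespace and PDB names ("default" and "web").
sample = (
    "E0131 10:37:43.721229 1 disruption.go:626] Error syncing "
    "PodDisruptionBudget default/web, requeuing: Operation cannot be fulfilled"
)
for namespace, name in find_pdb_sync_errors(sample):
    print(f"PDB sync error for {namespace}/{name}")
```

Repeated hits for the same PodDisruptionBudget while pods sit in Terminating would be consistent with the behavior described in this article, although the message alone does not prove it.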

Additional Information

https://kubernetes.io/docs/tasks/run-application/configure-pdb/#think-about-how-your-application-reacts-to-disruptions

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.