How to perform a graceful node shutdown in RKE2
Article Number: 000022104
Environment
- RKE2
Procedure
To shut down an RKE2 node gracefully for maintenance, first cordon and drain it:
- Mark the node as unschedulable:
kubectl cordon <node name>
- Evict the running pods. DaemonSet-managed pods are skipped, and pods that are not managed by a controller are deleted (because of --force):
kubectl drain <node name> --ignore-daemonsets --force
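The cordon and drain steps, together with the etcd snapshot that the note later in this article recommends taking before any maintenance, can be wrapped in small shell functions. This is a minimal sketch. It assumes kubectl is configured for the cluster and that the snapshot function runs as root on a control plane (server) node while rke2-server is still running. The function names, the snapshot name, the --delete-emptydir-data flag (which discards emptyDir volume data), and the 300-second drain timeout are illustrative choices, not part of the original procedure.

```bash
#!/usr/bin/env bash
# Sketch of the pre-shutdown steps; function names and defaults are hypothetical.
set -euo pipefail

# Run as root on a control plane (server) node while rke2-server is running:
# take an on-demand etcd snapshot before any maintenance.
snapshot_etcd() {
  local name="${1:-pre-maintenance}"   # illustrative snapshot name
  rke2 etcd-snapshot save --name "$name"
}

# Run from any host with a working kubeconfig for the cluster.
drain_node() {
  local node="$1"

  # Mark the node unschedulable so no new pods are placed on it.
  kubectl cordon "$node"

  # Evict pods: DaemonSet pods are skipped, pods without a controller are
  # deleted (--force), and emptyDir volumes on the node are discarded.
  # The 300s timeout is an assumption; tune it to your workloads.
  kubectl drain "$node" \
    --ignore-daemonsets \
    --force \
    --delete-emptydir-data \
    --timeout=300s

  # List what is still bound to the node; only DaemonSet pods should remain.
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=${node}" -o wide
}
```

For example, after sourcing the file, run snapshot_etcd as root on a server node, then drain_node <node name> from a workstation with cluster access.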
On worker nodes
1. Stop the rke2-agent service:
sudo systemctl stop rke2-agent
2. Check for any remaining container processes that should be stopped:
sudo ps auxfww
On control plane / etcd nodes
1. Stop the rke2-server service:
sudo systemctl stop rke2-server
2. Check for any remaining container processes that should be stopped:
sudo ps auxfww
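Each pod's containers run as children of a containerd shim process, so filtering for those shims gives a shorter answer than reading the full ps tree. The following is a minimal sketch that stops the service for the node's role and then reports any shims still present. It assumes a systemd host, root privileges, and the procps pgrep utility; the function name stop_rke2_service is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop the RKE2 service for this node's role and report
# container processes that are still running afterwards. Run as root.
set -euo pipefail

stop_rke2_service() {
  local role="$1"   # "agent" on worker nodes, "server" on control plane/etcd nodes
  case "$role" in
    agent|server) ;;
    *) echo "usage: stop_rke2_service agent|server" >&2; return 2 ;;
  esac

  systemctl stop "rke2-${role}"

  # Containers keep running after the service stops; each one is a child of
  # a containerd-shim process, so list those shims with their command lines.
  if pgrep -af 'containerd-shim'; then
    echo "container processes are still running on this node"
  else
    echo "no containerd shim processes found"
  fi
}
```

For example, stop_rke2_service agent on a worker node, or stop_rke2_service server on a control plane / etcd node.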
Stop remaining processes
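Stopping the rke2-agent or rke2-server service does not stop containers that are already running on the node. If all application workloads have already been drained, this step is not required before shutting down the node. In some cases, however, it can be useful to stop all remaining container processes and components such as containerd.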
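- Verify that no application workloads are running on the node (the Non-terminated Pods section of the output should list only DaemonSet pods):
kubectl describe node <node name>
- Stop the remaining container processes and RKE2 components, including containerd, with the kill-all script:
sudo /usr/local/bin/rke2-killall.sh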
Note: The rke2-killall.sh script uses SIGKILL to terminate processes, which can harm stateful application workloads that are still running. For stateful workloads, consider a solution that sends SIGTERM and waits for a timeout before resorting to SIGKILL. For related information, refer to the documentation: Best practices for RKE2 cluster maintenance. Always capture an etcd snapshot before performing any node maintenance activity.
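One way to follow the SIGTERM-before-SIGKILL advice is a small script that signals the container processes first and escalates only for stragglers. The sketch below is an assumption, not an RKE2 tool: it treats the direct children of containerd-shim processes as the containers' init processes, sends them SIGTERM, waits up to a hypothetical default of 30 seconds, and then sends SIGKILL to any that remain. Run it as root after stopping the RKE2 service; rke2-killall.sh can still be run afterwards to clean up the shims, containerd, and the other RKE2 components.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RKE2): SIGTERM container processes,
# wait, then SIGKILL only those that are still alive. Run as root.
set -euo pipefail

graceful_stop_containers() {
  local timeout="${1:-30}"   # seconds to wait after SIGTERM (illustrative default)
  local shims children="" pid waited=0 alive

  # PIDs of the containerd shim processes.
  shims="$(pgrep -f 'containerd-shim' || true)"
  if [[ -z "$shims" ]]; then
    echo "no container processes found"
    return 0
  fi

  # Direct children of each shim are the containers' init processes.
  for pid in $shims; do
    children+="$(pgrep -P "$pid" || true) "
  done

  # Ask every container to exit. A container's PID 1 only acts on SIGTERM
  # if it installed a handler for it; otherwise it stays alive until SIGKILL.
  for pid in $children; do
    kill -TERM "$pid" 2>/dev/null || true
  done

  # Poll once per second until all containers exit or the timeout expires.
  while (( waited < timeout )); do
    alive=0
    for pid in $children; do
      if kill -0 "$pid" 2>/dev/null; then alive=1; fi
    done
    if (( alive == 0 )); then break; fi
    sleep 1
    waited=$((waited + 1))
  done

  # Escalate to SIGKILL for anything that ignored SIGTERM.
  for pid in $children; do
    if kill -0 "$pid" 2>/dev/null; then
      echo "sending SIGKILL to ${pid}"
      kill -KILL "$pid" 2>/dev/null || true
    fi
  done
}
```

For example, graceful_stop_containers 60 gives stateful workloads a full minute to shut down before any SIGKILL is sent.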
Start the service again
- After maintenance, start the service that matches the node role:
  - Worker (agent) nodes: sudo systemctl start rke2-agent
  - Control plane (server) nodes: sudo systemctl start rke2-server
- Mark the node as schedulable again:
kubectl uncordon <node name>
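Restarting the service and uncordoning can be combined so that the node only becomes schedulable once it reports Ready. This is a minimal sketch, assuming it runs as root on the node itself and that kubectl has a working kubeconfig there (for example, on a server node); if kubectl runs elsewhere, the two kubectl commands can be run separately from that host. The function name resume_node and the 300-second timeout are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: bring a node back after maintenance.
# Assumes root on the node and a kubeconfig that kubectl can use.
set -euo pipefail

resume_node() {
  local role="$1"   # "agent" or "server"
  local node="$2"   # Kubernetes node name
  case "$role" in
    agent|server) ;;
    *) echo "usage: resume_node agent|server <node name>" >&2; return 2 ;;
  esac

  # Start the RKE2 service that matches the node's role.
  systemctl start "rke2-${role}"

  # Block until the node's Ready condition is True (timeout is illustrative).
  kubectl wait --for=condition=Ready "node/${node}" --timeout=300s

  # Allow the scheduler to place pods on the node again.
  kubectl uncordon "$node"
}
```

For example, resume_node agent <node name> on a worker node.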