How to shutdown a Kubernetes cluster (Rancher Kubernetes Engine (RKE) CLI provisioned or Rancher v2.x Custom clusters)
This document (000020031) is provided subject to the disclaimer at the end of this document.
Situation
Task
This article provides instructions for safely shutting down a Kubernetes cluster provisioned via the Rancher Kubernetes Engine (RKE) CLI or a Rancher v2.x provisioned Custom Cluster.
Requirements
- A Kubernetes cluster launched with the RKE CLI or from Rancher 2.x as a Custom Cluster
Background
If you have a need to shut down the infrastructure running a Kubernetes cluster (datacenter maintenance, migration, etc.) this guide will provide steps in the proper order to ensure a safe cluster shutdown. This guide has command examples for RKE-deployed clusters but the order of operations and the process is similar for most Kubernetes distributions.
Please ensure you complete an etcd backup before continuing this process. A guide regarding the backup and restore process can be found here.
Solution
N.B. If you have nodes that share the worker, control plane, or etcd roles, postpone the
docker stop
and shutdown operations until the containers for every role on the node have been stopped.
Draining the nodes
For all nodes, prior to stopping the containers, run:
kubectl get nodes
to list the nodes, then, for each node, run:
kubectl drain <node name>
This safely evicts any pods from the node (you may need to add the --ignore-daemonsets flag if DaemonSet-managed pods are present), and you can then proceed with the shutdown steps below.
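As an illustration only, the drain step can be scripted across all nodes. This is a sketch, not part of RKE: it assumes kubectl is already configured against the cluster, and the drain_all_nodes helper name is our own.

```shell
#!/bin/sh
# Hedged sketch: drain every node in the cluster before shutdown.
# --ignore-daemonsets is usually needed because DaemonSet pods cannot be
# evicted; --delete-emptydir-data discards emptyDir volume contents.
drain_all_nodes() {
    for node in $(kubectl get nodes -o name); do
        kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
    done
}

# Run against the live cluster with:
#   drain_all_nodes
```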
Shutting down the worker nodes
For each worker node:
- ssh into the worker node
- stop kubelet and kube-proxy by running
sudo docker stop kubelet kube-proxy
- stop docker by running
sudo service docker stop
or
sudo systemctl stop docker
- shutdown the system
sudo shutdown now
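The worker steps above can be sketched as a small helper run over ssh. This is illustrative only: it assumes passwordless ssh and sudo on each node, and the stop_worker_node name and example hostname are our own, not part of RKE.

```shell
#!/bin/sh
# Hedged sketch of the worker shutdown sequence above: stop kubelet and
# kube-proxy first, then the docker service, then power the node off.
# Assumes passwordless ssh/sudo on the node.
stop_worker_node() {
    node="$1"
    ssh "$node" "sudo docker stop kubelet kube-proxy && \
                 sudo systemctl stop docker && \
                 sudo shutdown now"
}

# Example (hypothetical hostname):
#   stop_worker_node worker-1.example.com
```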
Shutting down the control plane nodes
For each control plane node:
- ssh into the control plane node
- stop kubelet and kube-proxy by running
sudo docker stop kubelet kube-proxy
- stop kube-scheduler and kube-controller-manager by running
sudo docker stop kube-scheduler kube-controller-manager
- stop kube-apiserver by running
sudo docker stop kube-apiserver
- stop docker by running
sudo service docker stop
or
sudo systemctl stop docker
- shutdown the system
sudo shutdown now
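The control plane sequence lends itself to the same treatment. Again a sketch under the same passwordless ssh/sudo assumptions, with the helper name being our own; it preserves the stop order given above.

```shell
#!/bin/sh
# Hedged sketch of the control plane shutdown sequence above, preserving
# the order: kubelet/kube-proxy, then kube-scheduler and
# kube-controller-manager, then kube-apiserver, then docker, then power off.
stop_controlplane_node() {
    node="$1"
    ssh "$node" "sudo docker stop kubelet kube-proxy && \
                 sudo docker stop kube-scheduler kube-controller-manager && \
                 sudo docker stop kube-apiserver && \
                 sudo systemctl stop docker && \
                 sudo shutdown now"
}
```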
Shutting down the etcd nodes
For each etcd node:
- ssh into the etcd node
- stop kubelet and kube-proxy by running
sudo docker stop kubelet kube-proxy
- stop etcd by running
sudo docker stop etcd
- stop docker by running
sudo service docker stop
or
sudo systemctl stop docker
- shutdown the system
sudo shutdown now
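The etcd sequence can be sketched the same way, with an extra listing of any still-running etcd container before the node is powered off. The same ssh/sudo assumptions apply, and note that docker's name filter is a substring match.

```shell
#!/bin/sh
# Hedged sketch of the etcd node shutdown sequence above. After stopping
# etcd, list any still-running container whose name contains "etcd"
# (docker's name filter matches substrings) before stopping docker.
stop_etcd_node() {
    node="$1"
    ssh "$node" "sudo docker stop kubelet kube-proxy etcd && \
                 sudo docker ps --filter name=etcd --filter status=running -q && \
                 sudo systemctl stop docker && \
                 sudo shutdown now"
}
```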
Shutting down storage
Shut down any persistent storage devices in your datacenter (such as NAS devices), if applicable. It is important to do this only after everything else has been shut down, to prevent data loss or corruption for containers that require persistent storage.
N.B. If your cluster was not deployed with RKE, the order of operations is the same, but the commands may vary. For instance, some distributions run the kubelet and other control plane components as services on the node rather than in Docker. Check the documentation for your specific Kubernetes distribution for details on how to stop these services.
Starting a Kubernetes cluster up after shutdown
Kubernetes is good about recovering from a cluster shutdown and requires little intervention, though there is a specific order in which things should be powered back on to minimize errors.
- Power on any storage devices if applicable.
Check with your storage vendor on how to properly power on your storage devices and verify that they are ready.
- For each etcd node:
- Power on the system/start the instance.
- Log into the system via ssh.
- Ensure docker has started
sudo service docker status
or
sudo systemctl status docker
- Ensure the status of the etcd and kubelet containers shows Up in Docker
sudo docker ps
- For each control plane node:
- Power on the system/start the instance.
- Log into the system via ssh.
- Ensure docker has started
sudo service docker status
or
sudo systemctl status docker
- Ensure the status of the kube-apiserver, kube-scheduler, kube-controller-manager, and kubelet containers shows Up in Docker
sudo docker ps
- For each worker node:
- Power on the system/start the instance.
- Log into the system via ssh.
- Ensure docker has started
sudo service docker status
or
sudo systemctl status docker
- Ensure the status of the kubelet container shows Up in Docker
sudo docker ps
- Log into the Rancher UI (or use kubectl) and check your various projects to ensure workloads have started as expected. This may take a few minutes depending on the number of workloads and your server capacity.
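The per-node container checks above can be sketched as one helper. This is illustrative only: it assumes passwordless ssh/sudo, the function name and hostnames are hypothetical, and docker's name filter matches substrings.

```shell
#!/bin/sh
# Hedged sketch: after boot, verify the named containers are running on a
# node. Assumes passwordless ssh/sudo; docker's name filter is a
# substring match, so container names are assumed not to overlap.
check_node_containers() {
    node="$1"; shift
    for name in "$@"; do
        if ssh "$node" "sudo docker ps --filter name=$name --filter status=running -q" | grep -q .; then
            echo "OK: $name is Up on $node"
        else
            echo "MISSING: $name is not running on $node"
        fi
    done
}

# Examples (hypothetical hostnames):
#   check_node_containers etcd-1.example.com etcd kubelet
#   check_node_containers cp-1.example.com kube-apiserver kube-scheduler kube-controller-manager kubelet
#   check_node_containers worker-1.example.com kubelet kube-proxy
```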
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.