Pods return to the Initializing status even though containers in the pod are running

Article Number: 000020002

Environment

RKE1 clusters

Situation

After starting successfully, pods return to the 'Initializing' status.

On inspection, an init container is failing to start, even though the other containers in the pod are running.

If one of the running containers fails, it will not be restarted because of the failed init container.

Cause

This happens when the init container is removed from the host machine.

Because init containers run to completion and are then terminated, they are removed as stopped containers when docker system prune or docker container prune is run on the node.

When the kubelet sees that the init container no longer exists, it will try to rerun it. Depending on what the init container does, this may fail on a pod that is already running (e.g. linkerd).

Resolution

This often happens after images have been pruned on the node.
Manually pruning images on Kubernetes nodes should be avoided.
The kubelet has a built-in cleanup mechanism that removes unused containers and images.

Where manual cleanup cannot be avoided, stopped init containers should not be removed.
A list of init container IDs can be generated with the following command:

kubectl get pods --all-namespaces -o jsonpath='{range .items[*].status.initContainerStatuses[*]}{.containerID}{"\n"}{end}' | cut -d/ -f3
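
The cut -d/ -f3 step works because each containerID is reported with a runtime prefix, such as docker://<id>. A minimal sketch of that step, using a made-up ID rather than one from a real cluster:

```shell
# Hypothetical containerID value, in the form reported under initContainerStatuses
SAMPLE_ID='docker://0123456789abcdef'

# Splitting on "/" gives three fields: "docker:", "" and the bare ID
STRIPPED_ID=$(printf '%s\n' "$SAMPLE_ID" | cut -d/ -f3)
echo "$STRIPPED_ID"
```

The bare ID is the form that docker ps --no-trunc prints, so the two lists can be compared directly.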

The script below can be used to remove exited containers, other than init containers, from a remote node and then prune unused images, e.g.

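NODE_TO_CLEAN=<node_ip>
# Named SSH_USER to avoid overwriting the shell's own USER variable
SSH_USER=<user>

# IDs of every init container known to the cluster, with the runtime prefix stripped
INIT_CONTAINERS=$(kubectl get pods --all-namespaces -o jsonpath='{range .items[*].status.initContainerStatuses[*]}{.containerID}{"\n"}{end}' | cut -d/ -f3)
# Full IDs of all exited containers on the node; sed strips carriage returns added by the pseudo-terminal
EXITED_CONTAINERS=$(ssh -o LogLevel=QUIET -t ${SSH_USER}@${NODE_TO_CLEAN} sudo docker ps -qa --filter status=exited --no-trunc | sed -e 's/\r//g')

# Exited containers that are not init containers; the quotes keep one ID per line, which comm requires
CONTAINERS_TO_REMOVE=$(comm -23 <(echo "$EXITED_CONTAINERS" | sort) <(echo "$INIT_CONTAINERS" | sort))
# Serialise the variable so it can be declared in the remote shell
PASS_CONTAINERS=$(typeset -p CONTAINERS_TO_REMOVE)

ssh -o LogLevel=QUIET -t ${SSH_USER}@${NODE_TO_CLEAN} bash <<EOF
    $PASS_CONTAINERS
    sudo docker rm \${CONTAINERS_TO_REMOVE} && sudo docker image prune -af
EOF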
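
The selection step in the script relies on comm -23, which prints only the lines that are unique to the first of two sorted inputs. A minimal sketch of that behaviour, using placeholder IDs (aaa, bbb, ccc and zzz are made up) and temporary files so that it runs in any POSIX shell:

```shell
# Placeholder IDs: three exited containers, one of which (bbb) is an init container
EXITED_CONTAINERS=$(printf '%s\n' ccc aaa bbb)
INIT_CONTAINERS=$(printf '%s\n' bbb zzz)

# comm requires sorted input, so sort each list into a temporary file
EXITED_FILE=$(mktemp)
INIT_FILE=$(mktemp)
printf '%s\n' "$EXITED_CONTAINERS" | sort > "$EXITED_FILE"
printf '%s\n' "$INIT_CONTAINERS" | sort > "$INIT_FILE"

# -2 drops lines only in the init list, -3 drops lines common to both,
# leaving exited containers that are not init containers
CONTAINERS_TO_REMOVE=$(comm -23 "$EXITED_FILE" "$INIT_FILE")
rm -f "$EXITED_FILE" "$INIT_FILE"

echo "$CONTAINERS_TO_REMOVE"
```

Here aaa and ccc are selected for removal, while bbb is kept because it also appears in the init container list. Quoting the variables matters: an unquoted echo would join all IDs onto a single line and comm would no longer compare them one by one.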