Rancher pods in CrashLoopBackOff state with log messages "panic: indexer conflict: map[byPod:{}]" and "namespaces not found"
This document (000021302) is provided subject to the disclaimer at the end of this document.
Environment
Rancher v2.6.x or v2.7.x
A cluster Namespace in the Rancher local cluster that has been forcibly removed by manually removing the Kubernetes finalizer from the Namespace
Situation
The Rancher UI is not accessible and the Rancher pods are in a CrashLoopBackOff state due to a panic, with the log message "panic: indexer conflict: map[byPod:{}]".
The Rancher pod logs also contain resource and Namespace "not found" errors related to the forcibly deleted cluster Namespace:
[ERROR] error syncing 'c-g4k52/m-c6n6g': handler node-controller: namespaces "c-g4k52 " not found, handler node-controller-sync: namespaces "c-g4k52 " not found
[ERROR] error syncing 'c-g4k52/m-6j6pk': handler node-controller: namespaces "c-g4k52 " not found, handler node-controller-sync: namespaces "c-g4k52 " not found,
[ERROR] error syncing 'c-g4k52/m-zx6qg': handler node-controller: namespaces "c-g4k52 " not found, handler node-controller-sync: namespaces "c-g4k52 " not found,
[ERROR] error syncing 'c-g4k52/creator-cluster-owner': handler mgmt-auth-crtb-controller: clusters.management.cattle.io "c-g4k52 " not found, requeuing
[ERROR] error syncing 'p-d6hvn/creator-project-owner': handler mgmt-auth-prtb-controller: clusters.management.cattle.io "c-g4k52 " not found, requeuing
[ERROR] error syncing 'p-ch66s/creator-project-owner': handler mgmt-auth-prtb-controller: clusters.management.cattle.io "c-g4k52 " not found, requeuing
[ERROR] error syncing 'c-g4k52/c-g4k52-fleet-default-owner': handler mgmt-auth-crtb-controller: failed to remove finalizer on controller.cattle.io/mgmt-auth-crtb-controller, requeuing
[ERROR] error syncing 'c-g4k52/c-g4k52-rl-f62qq': handler etcdbackup-controller: failed to remove finalizer on controller.cattle.io/etcdbackup-controller, requeuing
Resolution
Remove all of the orphaned resources in the now-deleted cluster Namespace by following the process documented at https://www.suse.com/support/kb/doc/?id=000020788
In this example, the problematic deleted cluster identified from the Rancher logs has the cluster ID and Namespace c-g4k52.
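When the logs are long, the affected cluster ID can be picked out programmatically. The following is a minimal sketch only, assuming the Rancher pod logs have already been saved as text lines. The function name missing_cluster_ids and the regular expression are illustrative and not part of Rancher; the pattern is modelled solely on the "not found" messages shown in the Situation section and tolerates the stray whitespace inside the quotes seen there.

```python
import re
from collections import Counter

# Hypothetical pattern, modelled on the log excerpt in this article:
#   namespaces "c-g4k52 " not found
#   clusters.management.cattle.io "c-g4k52 " not found
NOT_FOUND = re.compile(
    r'(?:namespaces|clusters\.management\.cattle\.io) "\s*([^"]*?)\s*" not found'
)


def missing_cluster_ids(log_lines):
    """Count the log lines that report each Namespace or cluster as not found."""
    counts = Counter()
    for line in log_lines:
        # Use a set so an ID repeated within one line is counted once per line.
        for name in set(NOT_FOUND.findall(line)):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: python3 find_missing.py < rancher.log
    for name, hits in missing_cluster_ids(sys.stdin).most_common():
        print(f"{name}: reported as not found in {hits} log line(s)")
```

The ID reported most often is a candidate for the cleanup process linked above. Confirm that the cluster really has been deleted before removing any resources.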
Cause
Orphaned resources for a now-deleted cluster remain because the cluster's Kubernetes Namespace was forcibly removed: the Kubernetes finalizer was manually removed from the Namespace while it was stuck in a terminating state. The Kubernetes finalizer should never be removed from a Namespace. Instead, use the following procedure to remove a Namespace stuck in a terminating state:
https://www.suse.com/es-es/support/kb/doc/?id=000021065
Additional Information
- Remove namespace stuck in terminating state: https://www.suse.com/es-es/support/kb/doc/?id=000021065
- How to clean orphaned cluster objects: https://www.suse.com/support/kb/doc/?id=000020788
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.