API Extension Service Failure Due to Custom Helm Release Name

Article Number: 000022082

Environment

• Rancher v2.11.x

• Rancher deployed with a Helm release name other than the default `rancher` (e.g., `rancher-stable`).

Situation

Rancher v2.11 ships with a new API extension, v1.ext.cattle.io, which is required by internal Rancher cluster management components such as the capi-controller-manager.
When Rancher is installed under a custom Helm release name, components that rely on resources in this new API group cannot discover the API extension and stop functioning correctly.
Logs from affected components may show schema synchronization errors that indicate the API discovery failure:

failed to sync schemas: unable to retrieve the complete list of server APIs: [ext.cattle.io/v1]: stale GroupVersion discovery: [ext.cattle.io/v1]
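
To confirm the symptom from the Kubernetes side, the availability of the aggregated API can be queried directly. The following is a minimal sketch, assuming `kubectl` is configured against the Rancher local cluster; the function name `ext_api_available` is only illustrative and not part of any Rancher tooling.

```shell
# Print the status of the "Available" condition ("True" or "False")
# reported for the v1.ext.cattle.io APIService.
ext_api_available() {
  kubectl get apiservice v1.ext.cattle.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}
```

Running `ext_api_available` and getting anything other than `True` is consistent with the discovery error shown above.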

While the standard Rancher Helm chart templates labels and resource names dynamically from the chosen release name, the service backing this extension does not follow that convention.

Additionally, this backing service may persist in the `cattle-system` namespace if a rollback from Rancher 2.11 to a previous version is attempted.

Cause

The root cause is a software bug: the service backing the API extension is configured with a hardcoded label selector.

1. API Extension Service: The API extension is backed by the service `cattle-system/imperative-api-extension`.
2. Hardcoded Selector: This service contains a hardcoded label selector of `app: rancher`.
3. Label Mismatch: When a custom Helm release name (e.g., `rancher-stable`) is used, the Rancher deployment pods correctly receive an `app` label matching that name (e.g., `app: rancher-stable`).
4. Service Failure: Due to the label mismatch, the `imperative-api-extension` service cannot select the intended Rancher pods. This prevents the proper registration of the `v1.ext.cattle.io` `APIService` via the Kubernetes API Aggregation Layer, resulting in the reported stale GroupVersion discovery errors.
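
The mismatch described in the steps above can be checked by comparing the service's selector with the label on the Rancher pod template. This is a sketch under the assumption that the Rancher Deployment is named after the Helm release; the function name `check_selector` is hypothetical.

```shell
# Compare the Service's "app" selector with the "app" label on the
# Rancher Deployment's pod template.
# $1: Helm release name (assumed to equal the Deployment name).
check_selector() {
  release="$1"
  svc_app=$(kubectl get svc imperative-api-extension -n cattle-system \
    -o jsonpath='{.spec.selector.app}')
  pod_app=$(kubectl get deployment "$release" -n cattle-system \
    -o jsonpath='{.spec.template.metadata.labels.app}')
  if [ "$svc_app" = "$pod_app" ]; then
    echo "match: $svc_app"
  else
    echo "mismatch: service selects app=$svc_app, pods carry app=$pod_app"
  fi
}
```

For example, `check_selector rancher-stable` on an affected installation would be expected to report a mismatch between `rancher` and `rancher-stable`.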

This behavior is a bug in the new `imperative-api-extension` feature: the service expects the static `app: rancher` label instead of resolving the label from the Helm release name. The issue is being tracked internally for a long-term resolution.

Changing Release Name Risk:

If you are considering renaming your existing Helm release (e.g., from `rancher-stable` back to `rancher`), be advised that this process is highly disruptive: it requires completely removing the existing Rancher deployment and creating a fresh instance.

This action will cause the Rancher management plane UI/API to be unavailable during the entire process, although downstream cluster workloads are designed to remain functional.

Resolution

### Workaround: Manually Patch the Service Selector

The recommended immediate resolution is to manually edit the `imperative-api-extension` service selector to match the actual application label applied to the Rancher deployment pods.

1. Identify the Correct Application Label

Determine the actual application label used by your Rancher pods (this is usually your Helm release name, e.g., `rancher-stable`):

# Replace <RANCHER_RELEASE_NAME> with your actual Helm release name.
# Prints only the app label applied to the Rancher pods.
kubectl get deployment <RANCHER_RELEASE_NAME> -n cattle-system -o jsonpath='{.spec.template.metadata.labels.app}'

2. Patch the Imperative API Extension Service

Patch the Service in the `cattle-system` namespace, replacing `<ACTUAL_APP_LABEL>` with the value found in the previous step:

kubectl patch svc imperative-api-extension -n cattle-system \
-p '{"spec":{"selector":{"app":"<ACTUAL_APP_LABEL>"}}}'
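
After patching, one way to verify the result is to check that the service now has endpoints. The sketch below assumes the cluster still serves the core `Endpoints` resource; the function name `extension_endpoints` is illustrative only.

```shell
# Print the pod IPs currently selected by the Service.
# Empty output means the selector still matches no ready pods.
extension_endpoints() {
  kubectl get endpoints imperative-api-extension -n cattle-system \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}
```

If `extension_endpoints` prints one IP per ready Rancher pod, the selector matches. The `v1.ext.cattle.io` APIService may take a short time to report as available afterwards.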

### Cleanup after Rollback (If applicable)

If this issue occurs after rolling back from Rancher v2.11 to a prior version, the `imperative-api-extension` service may persist.
If the service is orphaned and not required by the older Rancher version, you should manually delete it:

kubectl delete svc imperative-api-extension -n cattle-system
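
As a pre-check for the delete command above, it may help to confirm that the service actually exists on the rolled-back installation. This is a minimal sketch; the function name `find_extension_service` is hypothetical.

```shell
# Print "service/imperative-api-extension" if the Service exists in
# cattle-system; print nothing (and do not fail) if it is absent.
find_extension_service() {
  kubectl get svc imperative-api-extension -n cattle-system \
    --ignore-not-found -o name
}
```

Empty output from `find_extension_service` indicates there is nothing to clean up.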

Deleting the service is a targeted cleanup step and does not replace the general Rancher rollback process, which usually involves restoring state with the Rancher backup operator.
As a best practice, take a backup (snapshot) before any major operation such as an upgrade or rollback.