System Upgrade pods fail on Windows nodes with 'service does not exist' error

This document (000021954) is provided subject to the disclaimer at the end of this document.

Environment

Rancher managing downstream RKE2 clusters that include Windows nodes.

Situation

Pods that the system-upgrade-controller creates on Windows nodes to manage the rancher-wins service fail and enter an error state with the following message:

could not build initial state for rancher-wins: could not open rancher-wins service while building initial state: rancher-wins service does not exist

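The affected pods are missing the Windows-specific securityContext.windowsOptions settings (hostProcess and runAsUserName). Instead, they carry a Linux-style securityContext: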

    securityContext:
      capabilities:
        add:
        - CAP_SYS_BOOT
      privileged: true

The correct securityContext for a system-upgrade pod for Windows is:

    securityContext:
      windowsOptions:
        hostProcess: true
        runAsUserName: NT AUTHORITY\SYSTEM

This missing context prevents the pod from accessing the Windows service information it needs, causing the failure.
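To confirm that a failing pod is affected, its manifest can be checked for the two windowsOptions fields shown above. The following is a minimal sketch, not an official tool: the function name check_windows_context is hypothetical, and it does a plain text match on the YAML rather than parsing it.

```shell
# Hypothetical helper: reads a pod manifest (YAML) on stdin and reports
# whether it contains the Windows host-process settings described above.
# It is a simple text match, not a YAML parser.
check_windows_context() {
  manifest=$(cat)
  if printf '%s\n' "$manifest" | grep -q 'hostProcess: true' &&
     printf '%s\n' "$manifest" | grep -q 'runAsUserName:'; then
    echo "ok: windowsOptions present"
    return 0
  fi
  echo "affected: windowsOptions missing"
  return 1
}

# Example usage (replace <pod-name> with the name of the failing pod):
#   kubectl -n cattle-system get pod <pod-name> -o yaml | check_windows_context
```

A non-zero exit status indicates that the pod was created from an incorrectly generated plan.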

Resolution

The issue is resolved by forcing Rancher to recreate the upgrade plan for the Windows nodes.

  • Delete the existing plan using kubectl:

kubectl -n cattle-system delete plan system-agent-upgrader-windows

Rancher will detect the missing plan and automatically regenerate it. The new plan will create pods with the correct securityContext, allowing the rancher-wins upgrade to proceed successfully.
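How long Rancher takes to regenerate the plan is not stated here, so it can help to poll for it after the delete. The sketch below assumes the plan reappears under the same name in the cattle-system namespace. The function name wait_for_plan and its default attempt count and delay are hypothetical choices, not documented values.

```shell
# Sketch: poll until the plan exists again, or give up.
# $1 = number of attempts (assumed default 30)
# $2 = seconds to wait between attempts (assumed default 10)
wait_for_plan() {
  attempts=${1:-30}
  delay=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # "get" exits non-zero while the plan is still missing.
    if kubectl -n cattle-system get plan system-agent-upgrader-windows >/dev/null 2>&1; then
      echo "plan regenerated"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "plan not found after $attempts attempts"
  return 1
}

# Example usage: wait_for_plan 30 10
```

Once the plan is back, the pods it creates can be inspected to confirm that they carry windowsOptions with hostProcess and runAsUserName.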

Cause

The root cause of the incorrectly generated plan is currently unknown. The issue has been observed in only a small number of environments and has not been reproduced in testing.

Status

Top Issue

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.