System Upgrade pods fail on Windows nodes with 'service does not exist' error
Article Number: 000021954
Environment
Rancher managing downstream RKE2 clusters that include Windows nodes.
Situation
Pods created by the system-upgrade-controller on Windows nodes to manage the rancher-wins service fail and enter an error state with the error:
could not build initial state for rancher-wins: could not open rancher-wins service while building initial state: rancher-wins service does not exist
The affected pods are missing the Windows-specific securityContext settings, in particular windowsOptions.runAsUserName. Their securityContext looks like this instead:
securityContext:
  capabilities:
    add:
    - CAP_SYS_BOOT
  privileged: true
The correct securityContext for a system-upgrade pod for Windows is:
securityContext:
  windowsOptions:
    hostProcess: true
    runAsUserName: NT AUTHORITY\SYSTEM
This missing context prevents the pod from accessing the Windows service information it needs, causing the failure.
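One way to confirm whether a given upgrade pod is affected is to look for the Windows host-process settings in its rendered spec. The following is a minimal sketch, not an official diagnostic: the helper name has_windows_host_process is hypothetical, and it uses a plain text match on the YAML rather than a real YAML parser.

```shell
# Hypothetical helper: reads a pod manifest (YAML) on stdin and succeeds only
# if both Windows settings from the correct securityContext are present.
# This is a text match, not a YAML parse, so treat it as a quick check only.
has_windows_host_process() {
    manifest=$(cat)
    printf '%s\n' "$manifest" | grep -Eq 'hostProcess:[[:space:]]*true' &&
    printf '%s\n' "$manifest" | grep -Eq 'runAsUserName:.*NT AUTHORITY'
}

# Example use against a live cluster (<pod-name> is a placeholder for the
# failing upgrade pod):
#   kubectl -n cattle-system get pod <pod-name> -o yaml | has_windows_host_process \
#     && echo "Windows securityContext present" \
#     || echo "Windows securityContext missing"
```

If the check reports the context as missing for a pod created from the Windows upgrade plan, the pod matches the failure described in this article.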
Cause
The root cause for the plan being generated incorrectly is currently unknown. This issue has only been observed in a small number of environments and has not been successfully reproduced in testing.
Resolution
The issue is resolved by forcing Rancher to recreate the upgrade plan for the Windows nodes.
- Delete the existing plan using kubectl:
kubectl -n cattle-system delete plan system-agent-upgrader-windows
Rancher will detect the missing plan and automatically regenerate it. The new plan will create pods with the correct securityContext, allowing the rancher-wins upgrade to proceed successfully.
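To confirm that the plan has been regenerated after the delete, the plan can be polled until it exists again. The sketch below is illustrative only: the helper name wait_for_plan and the default retry count and delay are assumptions, not values documented by Rancher, and how long regeneration takes may differ between environments.

```shell
# Hypothetical helper: polls until the named plan exists in cattle-system
# or the retry budget runs out.
# Arguments: plan name, number of attempts (default 30), delay in seconds (default 10).
wait_for_plan() {
    plan=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # "kubectl get" exits non-zero while the plan is still missing.
        if kubectl -n cattle-system get plan "$plan" >/dev/null 2>&1; then
            echo "plan $plan exists"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "plan $plan not found after $attempts attempts" >&2
    return 1
}

# Example: wait_for_plan system-agent-upgrader-windows
```

Once the plan is back, the pods it creates can be inspected to verify that they carry the Windows securityContext shown earlier.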