RKE2 certificate rotation failing on Windows worker nodes

This document (000021805) is provided subject to the disclaimer at the end of this document.

Environment

Rancher versions earlier than v2.9.8, and Rancher v2.10 versions earlier than v2.10.4

Situation

After a certificate rotation is triggered, the cluster remains stuck in an Updating state while it waits for the certificates on the Windows nodes to be rotated. The status message has the format "waiting for [rke2-win-pnm94-pqhb5] certificate rotation".

The rancher-wins service (rancher-system-agent) on the affected Windows node(s) fails with the error: {error executing instruction 0: exec: "/bin/sh": executable file not found in %PATH%}

PS C:\Windows> Get-EventLog -LogName Application -Source 'rancher-wins' -Newest 50 | format-table -Property TimeGenerated, ReplacementStrings -Wrap

TimeGenerated         ReplacementStrings
-------------         ------------------
4/25/2025 10:22:46 AM {[K8s] updated plan secret fleet-default/rke2-win-pnm94-pqhb5-machine-plan with feedback}
4/25/2025 10:22:46 AM  {error executing instruction 0: exec: "/bin/sh":  executable file not found in %PATH%}
4/25/2025 10:22:46 AM {[Applyinator] Running command: /bin/sh [-x /var/lib/rancher/capr/idempotence/idempotent.sh
                      certificate-rotation/restart-reset-failed
                      6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
                      77bafa9e3a8a092afc4f1dd84e6e1cc58b68f42eb934bf0f6b73deb2518f78a3 /bin/sh /var/lib/rancher/capr
                      -c if [ $(systemctl is-failed rke2-agent) = failed ]; then systemctl reset-failed rke2-agent;
                      fi]}
4/25/2025 10:22:46 AM {[Applyinator] No image provided, creating empty working directory C:\var\lib\rancher\agent\work\
                      20250425-102246\e6cc55ca9f306482c21ae53e9090c85730d4509b6dd5aa7882538cbc703acb2b_0}
4/25/2025 10:22:46 AM {[Applyinator] Applying one-time instructions for plan with checksum
                      e6cc55ca9f306482c21ae53e9090c85730d4509b6dd5aa7882538cbc703acb2b}
4/25/2025 10:22:16 AM {[K8s] updated plan secret fleet-default/rke2-win-pnm94-pqhb5-machine-plan with feedback}
4/25/2025 10:22:16 AM {error executing instruction 0: exec: "/bin/sh": executable file not found in %PATH%}
4/25/2025 10:22:16 AM {[Applyinator] Running command: /bin/sh [-x /var/lib/rancher/capr/idempotence/idempotent.sh
                      certificate-rotation/restart-reset-failed
                      6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
                      77bafa9e3a8a092afc4f1dd84e6e1cc58b68f42eb934bf0f6b73deb2518f78a3 /bin/sh /var/lib/rancher/capr
                      -c if [ $(systemctl is-failed rke2-agent) = failed ]; then systemctl reset-failed rke2-agent;
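
To narrow the output above to the failing instruction, the same query can be filtered on the error text. This is a minimal sketch: the match pattern is an assumption based on the message shown above, and the 50-entry window is carried over from the original command.

```powershell
# Keep only rancher-wins events whose payload contains the "/bin/sh" lookup failure.
# The wildcard pattern is an assumption derived from the error text in this document.
Get-EventLog -LogName Application -Source 'rancher-wins' -Newest 50 |
    Where-Object { ($_.ReplacementStrings -join ' ') -like '*/bin/sh*executable file not found*' } |
    Format-Table -Property TimeGenerated, ReplacementStrings -Wrap
```

If nothing is returned, the error may be older than the newest 50 entries, or the node may not be affected by this issue.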

Requirements:

  • PowerShell access to the Windows worker nodes
  • .NET Framework installed on the Windows worker nodes (step 3 uses its csc.exe compiler)
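
One way to check the second requirement is to test for the compiler that the Resolution steps call. This is a minimal sketch; it assumes the .NET Framework path used in the compile step of this document, and the variable name is illustrative.

```powershell
# Path taken from the compile step in this document; adjust it if your node differs.
$csc = 'C:\Windows\Microsoft.NET\Framework64\v4.0.30319\csc.exe'

if (Test-Path -Path $csc -PathType Leaf) {
    "csc.exe found at $csc"
} else {
    "csc.exe not found at $csc"
}
```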

Resolution

In the affected versions, Rancher attempts to run a script through the sh binary (/bin/sh) on Windows worker nodes. That binary exists only on Linux nodes, so the instruction fails on Windows.

The following steps mitigate the issue by creating a no-op sh binary in the required location on the Windows nodes. This is a temporary workaround: the binary does not rotate the Windows certificates. To fix the issue permanently, upgrade Rancher to v2.9.8+, v2.10.4+, or v2.11+.

On every Windows worker node in the cluster, create an sh.exe binary whose Main function is empty, so that it returns without performing any action. The binary must be created in the C:/bin folder:

  1. Create a folder "bin" in C: and change directory to the newly created folder to execute step 2.
mkdir C:/bin
cd C:/bin
  2. Create the C# source file with the name rancher-windowsnode-sh-exe.cs and the contents below. This file will be compiled in step 3.
class Program

{

static void Main(string[] args){}

}

PS C:\bin> type .\rancher-windowsnode-sh-exe.cs
class Program

{

static void Main(string[] args){}

}
PS C:\bin>

  3. Run the command below to compile the binary from the source file:
    c:\Windows\microsoft.net\Framework64\v4.0.30319\csc.exe /target:exe /out:c:\bin\sh.exe rancher-windowsnode-sh-exe.cs

The output should be similar to:

PS C:\bin> c:\Windows\microsoft.net\Framework64\v4.0.30319\csc.exe /target:exe /out:c:\bin\sh.exe rancher-windowsnode-sh-exe.cs
Microsoft (R) Visual C# Compiler version 4.8.4161.0
for C# 5
Copyright (C) Microsoft Corporation. All rights reserved.

This compiler is provided as part of the Microsoft (R) .NET Framework, but only supports language versions up to C# 5, which is no longer the latest version. For compilers that support newer versions of the C# programming language, see http://go.microsoft.com/fwlink/?LinkID=533240

  4. Confirm that the sh.exe binary is present in the C:/bin folder:
PS C:\bin> dir


    Directory: C:\bin


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         4/25/2025   1:57 PM             60 rancher-windowsnode-sh-exe.cs
-a----         4/25/2025   2:02 PM           3584 sh.exe
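
When several Windows worker nodes need the workaround, steps 1 to 4 can be consolidated into one script and run on each node. The following is a sketch under stated assumptions, not a supported tool: the variable names are illustrative, the C# source is written on a single line (so the file size will differ from the listing above), and the final exit-code check is an addition that is not part of the original steps.

```powershell
# Sketch only: consolidates steps 1-4 of this document. Variable names are illustrative.
$binDir = 'C:\bin'
$source = Join-Path $binDir 'rancher-windowsnode-sh-exe.cs'
$target = Join-Path $binDir 'sh.exe'
$csc    = 'C:\Windows\Microsoft.NET\Framework64\v4.0.30319\csc.exe'

# Step 1: create the folder (no error if it already exists).
New-Item -ItemType Directory -Path $binDir -Force | Out-Null

# Step 2: write the no-op C# program.
Set-Content -Path $source -Value 'class Program { static void Main(string[] args){} }'

# Step 3: compile it to C:\bin\sh.exe and stop if the compiler reports a failure.
& $csc /nologo /target:exe "/out:$target" $source
if ($LASTEXITCODE -ne 0) { throw "csc.exe failed with exit code $LASTEXITCODE" }

# Step 4: confirm the binary exists.
Get-Item -Path $target

# Extra check (assumption, not in the original steps): the stub should ignore
# its arguments and exit with code 0.
& $target -x placeholder-argument
"sh.exe exit code: $LASTEXITCODE"
```

As with the manual steps, the resulting binary only lets the plan instruction complete; it does not rotate the Windows certificates.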

If the cluster remains in an Updating state, check whether the cluster is paused. If it is, unpause it by following the Resolution steps described at https://www.suse.com/support/kb/doc/?id=000021399

Additional Information

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.