Longhorn Instance Manager Pod Fails with exec /tini: exec format error on RKE2
Article Number: 000022343
Environment
SUSE Longhorn
Situation
In a Longhorn deployment, the instance-manager pod on two nodes is stuck in a crash loop, which prevents Longhorn from managing volumes on those nodes.
When checking the logs of the failing instance-manager container, the following error is observed:
exec /tini: exec format error
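This means the kernel could not execute /tini, the binary the container starts with, because the file is not in a format it can run (for example, a binary built for another CPU architecture, or a truncated or corrupted executable).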
Cause
The correct multi-architecture Longhorn image is in use and the tini binary is present in the image, so a wrong-architecture image is unlikely. The error instead points to corrupted image layers on the affected nodes.
Even though the image is valid upstream, the local copy stored in the node’s containerd cache is corrupted or incomplete, causing the exec format error when the container attempts to start.
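If you want to test the corruption theory before deleting anything, one option is to re-hash each instance-manager blob and compare the result with its digest, since containerd stores content under its digest. The sketch below assumes the RKE2 default ctr and socket paths used in this article, sha256 digests, `sha256sum` from coreutils, and that `ctr content get` streams a blob to stdout; the helper names are hypothetical. It only checks the content store, not the unpacked snapshots, so an all-OK result does not rule out corruption elsewhere.

```shell
#!/bin/sh
# Sketch: re-hash instance-manager blobs in containerd's content store and
# flag any blob whose SHA-256 no longer matches its digest.
# Assumption: RKE2 default paths (override via the CTR / SOCK env vars).
CTR="${CTR:-/var/lib/rancher/rke2/bin/ctr}"
SOCK="${SOCK:-/run/k3s/containerd/containerd.sock}"

# Hypothetical helper: run ctr in the k8s.io namespace used by the kubelet.
ctr_k8s() {
  "$CTR" --address "$SOCK" --namespace k8s.io "$@"
}

# Print the digest (first column) of every blob whose labels mention instance-manager.
list_im_digests() {
  ctr_k8s content ls | grep instance-manager | awk '{print $1}'
}

# Print "OK <digest>" or "MISMATCH <digest>" for each instance-manager blob.
verify_im_blobs() {
  list_im_digests | while read -r digest; do
    expected="${digest#sha256:}"
    # </dev/null keeps ctr from consuming the digest list on stdin.
    actual="$(ctr_k8s content get "$digest" </dev/null | sha256sum | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
      echo "OK $digest"
    else
      echo "MISMATCH $digest"
    fi
  done
}
```

Source the file on an affected node and call `verify_im_blobs`; any MISMATCH line is a blob the Resolution steps would remove anyway.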
You can verify that the binary exists in the image by running:
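/var/lib/rancher/rke2/bin/ctr \
--address /run/k3s/containerd/containerd.sock \
--namespace k8s.io \
run --rm -t <image> \
<container_id> sh
Here <container_id> is any name not already used by another container; it does not need to match the pod name. Inside the container, run /tini to confirm that the binary is present in the image.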
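For scripting, the same check can be run non-interactively by passing a command instead of `sh`. This is a sketch under the same RKE2 path assumptions; `check_tini` and the default container ID `tini-check` are hypothetical names, and `ls -l /tini` only shows that the file exists, not that it is intact.

```shell
#!/bin/sh
# Sketch: non-interactive check that /tini exists in an image.
# Assumption: RKE2 default paths (override via the CTR / SOCK env vars).
CTR="${CTR:-/var/lib/rancher/rke2/bin/ctr}"
SOCK="${SOCK:-/run/k3s/containerd/containerd.sock}"

# Hypothetical helper: run ctr in the k8s.io namespace used by the kubelet.
ctr_k8s() {
  "$CTR" --address "$SOCK" --namespace k8s.io "$@"
}

# check_tini IMAGE [CONTAINER_ID]
# Starts a throwaway container (removed afterwards because of --rm) that runs
# "ls -l /tini". CONTAINER_ID defaults to a hypothetical name and must not be
# in use by another container.
check_tini() {
  image="$1"
  id="${2:-tini-check}"
  if [ -z "$image" ]; then
    echo "usage: check_tini IMAGE [CONTAINER_ID]" >&2
    return 2
  fi
  if ctr_k8s run --rm "$image" "$id" ls -l /tini; then
    echo "tini found in $image"
  else
    echo "tini check failed for $image" >&2
    return 1
  fi
}
```

Call it as `check_tini <image>` with the image reference used by the failing instance-manager pod.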
Resolution
On the affected node(s), fully purge the corrupted instance-manager image layers from containerd and allow them to be re-pulled.
Step 1: List instance-manager content blobs
/var/lib/rancher/rke2/bin/ctr \
--address /run/k3s/containerd/containerd.sock \
--namespace k8s.io \
content ls | grep instance-manager | awk '{print $1}'
Step 2: Delete the content blobs returned by the previous command
/var/lib/rancher/rke2/bin/ctr \
--address /run/k3s/containerd/containerd.sock \
--namespace k8s.io \
content ls | grep instance-manager | awk '{print $1}' | \
xargs -r /var/lib/rancher/rke2/bin/ctr \
--address /run/k3s/containerd/containerd.sock \
--namespace k8s.io \
content del
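Steps 1 and 2 can also be combined into a small script that prints the digests it would remove and deletes them only when asked. This is a sketch, not part of the original procedure: the `DRY_RUN` switch and the function names are hypothetical, and it assumes the same RKE2 paths. Like the `xargs` pipeline in Step 2, it passes all matching digests to a single `content del` call, and it returns early when nothing matches.

```shell
#!/bin/sh
# Sketch: list, then optionally delete, instance-manager blobs (Steps 1 and 2).
# Assumption: RKE2 default paths (override via the CTR / SOCK env vars).
CTR="${CTR:-/var/lib/rancher/rke2/bin/ctr}"
SOCK="${SOCK:-/run/k3s/containerd/containerd.sock}"
# Hypothetical safety switch: blobs are deleted only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run ctr in the k8s.io namespace used by the kubelet.
ctr_k8s() {
  "$CTR" --address "$SOCK" --namespace k8s.io "$@"
}

# Step 1: digests of blobs whose labels mention instance-manager.
list_im_digests() {
  ctr_k8s content ls | grep instance-manager | awk '{print $1}'
}

# Step 2: print the digests, then delete them unless this is a dry run.
purge_im_blobs() {
  digests="$(list_im_digests)"
  if [ -z "$digests" ]; then
    echo "no instance-manager blobs found"
    return 0
  fi
  echo "$digests"
  if [ "$DRY_RUN" = "0" ]; then
    # Word splitting is intended: one argument per digest.
    # shellcheck disable=SC2086
    ctr_k8s content del $digests
  else
    echo "dry run: set DRY_RUN=0 to delete the blobs listed above"
  fi
}
```

Running `purge_im_blobs` first and then `DRY_RUN=0 purge_im_blobs` lets you review the list before anything is removed.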
Step 3: Re-pull the image manually (if it is not pulled automatically)
/var/lib/rancher/rke2/bin/ctr \
--address /run/k3s/containerd/containerd.sock \
--namespace k8s.io \
images pull <image>
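After the pull, you can confirm that the image is listed in the k8s.io namespace before waiting on the pod. This is a sketch under the same path assumptions; `repull_and_check` is a hypothetical helper, and it relies on `ctr images ls -q` printing one image reference per line. Note that ctr does not expand short names, so pass a fully qualified reference (registry, repository, and tag).

```shell
#!/bin/sh
# Sketch: re-pull an image and confirm containerd lists it afterwards.
# Assumption: RKE2 default paths (override via the CTR / SOCK env vars).
CTR="${CTR:-/var/lib/rancher/rke2/bin/ctr}"
SOCK="${SOCK:-/run/k3s/containerd/containerd.sock}"

# Hypothetical helper: run ctr in the k8s.io namespace used by the kubelet.
ctr_k8s() {
  "$CTR" --address "$SOCK" --namespace k8s.io "$@"
}

# repull_and_check IMAGE
# Pulls IMAGE, then checks for an exact match in "ctr images ls -q".
repull_and_check() {
  image="$1"
  if [ -z "$image" ]; then
    echo "usage: repull_and_check IMAGE" >&2
    return 2
  fi
  ctr_k8s images pull "$image" || return 1
  if ctr_k8s images ls -q | grep -Fx -- "$image" >/dev/null; then
    echo "image present: $image"
  else
    echo "image not listed after pull: $image" >&2
    return 1
  fi
}
```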
Once the image is re-pulled, the instance-manager pod should start normally and stop cycling.