
Pod with encrypted Longhorn PVC fails to start with volume mount error "in use by the system; will not make a filesystem here"

Article Number: 000022061

Environment

SUSE Rancher Longhorn v1.6.2, v1.7.0
K3s cluster

Situation

A workload pod using an encrypted Longhorn volume fails to start because the volume cannot be mounted. This occurs in Rancher K3s clusters, particularly after the k3s-agent service is restarted on a node. This article explains a known bug affecting encrypted volumes in K3s clusters and provides steps to resolve the problem.

Cause

  • This is a known issue with encrypted Longhorn volumes on K3s clusters, as explained in the upstream GitHub issue.
  • When the k3s-agent service on a worker node is restarted, or a worker node becomes temporarily unreachable, Kubernetes attempts to reschedule the workload onto another worker node. If the original node becomes available again in time, the workload may remain where it is and the volume is reattached to the same node.
  • In this scenario, Longhorn does not properly close the encrypted device with the cryptsetup commands before the volume is reattached, so the dm-crypt mapping is still considered open when the reattach happens. The stale mapping can be confirmed from the node, as shown below.
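
To confirm this state on the affected node, the open dm-crypt mappings can be listed and the device's status queried. This is a minimal sketch; the PVC name below is this article's placeholder and must be replaced with the affected volume's name:

# dmsetup ls --target crypt
# cryptsetup status pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx

If cryptsetup reports the device as active even though no workload on the node is using it, the mapping is stale.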

Resolution

A pod using an encrypted volume can fail to start with a volume mount failure. The following error is seen in the longhorn-csi-plugin logs:

2025-08-18T11:17:22.978803544+10:00 time="2025-08-18T01:17:22Z" level=error msg="NodeStageVolume: err: rpc error: code = Internal desc = format of disk \"/dev/mapper/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx\" failed: type:(\"ext4\") target:(\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/1f3db0f37bxxxxxbd43c85397bd005d208xxxxx1f413ff5294a9635b9e3/globalmount\") options:(\"defaults\") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)\n/dev/mapper/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx is apparently in use by the system; will not make a filesystem here!\n) " func=csi.logGRPC file="server.go:138"
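
By default the longhorn-csi-plugin pods run in the longhorn-system namespace; assuming a default Longhorn installation (the label selector and container name may differ in customized deployments), the logs can be pulled with a command along these lines:

# kubectl -n longhorn-system logs -l app=longhorn-csi-plugin -c longhorn-csi-plugin --tail=200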

Messages on the node also show that the disk format failed:

Aug 13 13:11:42 example-node1 k3s[417005]: E0813 13:11:42.819061  417005 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-f1b45c27-e434-40cf-9a5d-874f452db250 podName: nodeName:}" failed. No retries permitted until 2025-08-13 13:13:44.819036862 +1000 AEST m=+577.297768151 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx") pod "test-xxxx-xxxx" (UID: "xxxxxx-xxxxx-xxxxx-xxxx-xxxxxxxx") : rpc error: code = Internal desc = format of disk "/dev/mapper/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/1f3db0f37bxxxxxbd43c85397bd005d208xxxxx1f413ff5294a9635b9e3/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)
Aug 13 13:11:42 example-node1 k3s[417005]: Warning: could not erase sector 2: Input/output error
Aug 13 13:11:42 example-node1 k3s[417005]: Creating filesystem with 10481664 4k blocks and 2621440 inodes
Aug 13 13:11:42 example-node1 k3s[417005]: Filesystem UUID: xxxx-xxxx-xxxxx-xxxx-xxxxxxxx
Aug 13 13:11:42 example-node1 k3s[417005]: Superblock backups stored on blocks:
Aug 13 13:11:42 example-node1 k3s[417005]:         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
Aug 13 13:11:42 example-node1 k3s[417005]:         4096000, 7962624
Aug 13 13:11:42 example-node1 k3s[417005]: Allocating group tables: done
Aug 13 13:11:42 example-node1 k3s[417005]: Warning: could not read block 0: Input/output error
Aug 13 13:11:42 example-node1 k3s[417005]: Warning: could not erase sector 0: Input/output error
Aug 13 13:11:42 example-node1 k3s[417005]: Writing inode tables: done
Aug 13 13:11:42 example-node1 k3s[417005]: Creating journal (65536 blocks): done
Aug 13 13:11:42 example-node1 k3s[417005]: Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
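
These messages originate from the K3s agent's journal on the worker node; assuming a systemd-based node with the default k3s-agent unit name, they can be retrieved with something like:

# journalctl -u k3s-agent | grep pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx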

Workaround:

  • Run commands such as "lsof" or "ps aux" to identify any process holding the volume.
  • If the above commands show no output, run the command below, pointing at the affected volume's full device path, to identify the processes still holding it:
# fuser -v /dev/mapper/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx
  • Note the PID of the process reported by the above command, then kill it using the command below (fuser -k sends SIGKILL to every process holding the device):
# fuser -k /dev/mapper/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx
  • Since the volume is encrypted, deactivate the opened LUKS/dm-crypt device:
# cryptsetup close pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx
  • Verify that no processes are using the volume and that the device has been deactivated at the encryption layer:
# fuser -v /dev/mapper/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx
# cryptsetup status pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx
  • Scale the workload back up and confirm that the volume mounts successfully. The sketch after this list consolidates the steps above into a single script.
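
The following is a minimal sketch of the workaround, to be run as root on the node where the volume is stuck; PVC_NAME is a placeholder for the affected volume's name:

#!/bin/sh
# Placeholder: substitute the affected volume's PVC name.
PVC_NAME="pvc-xxxxx-xxxx-xxxx-xxxx-xxxxxxxx"
DEV="/dev/mapper/${PVC_NAME}"

# List any processes still holding the encrypted device open.
fuser -v "$DEV"

# Kill the holders (fuser -k sends SIGKILL by default).
fuser -k "$DEV"

# Close the stale dm-crypt mapping so Longhorn can reopen it cleanly.
cryptsetup close "$PVC_NAME"

# Verify that nothing holds the device and the mapping is deactivated.
fuser -v "$DEV"
cryptsetup status "$PVC_NAME"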

Since this is a known bug that is fixed in Longhorn v1.8.0, upgrading to Longhorn v1.8.0 or later is recommended as the permanent fix.
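
To confirm the currently deployed Longhorn version before planning the upgrade, the longhorn-manager image tag can be inspected (assuming the default longhorn-system namespace and a single container in the longhorn-manager DaemonSet):

# kubectl -n longhorn-system get daemonset longhorn-manager -o jsonpath='{.spec.template.spec.containers[0].image}'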