How to Fix Kubernetes Multi-Attach Error: Volume Already Exclusively Attached to Another Node
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10–20 mins
TL;DR
- What broke: A ReadWriteOnce PV is still attached to a node that is dead, unreachable, or slow to release it, so the rescheduled pod can't attach the volume and stays in ContainerCreating indefinitely.
- How to fix it: Once the old node is confirmed down, force-delete the stale VolumeAttachment object so the volume can be attached to the new node. Leave the PV's claimRef alone: it binds the PV to a PVC, not to a node.
The Incident (What Does the Error Mean?)
Raw event from kubectl describe pod <pod-name>:
Warning FailedAttachVolume 3m attachdetach-controller
Multi-Attach error for volume "pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890"
Volume is already exclusively attached to one node and can't be attached to another
Also visible in kubectl get events -n <namespace>:
FailedMount Unable to attach or mount volumes: unmounted volumes=[data],
unattached volumes=[data]: timed out waiting for the condition
Immediate consequence: The pod is stuck in ContainerCreating. The underlying EBS volume (AWS), GCE PD, or Azure Disk (block devices normally used as ReadWriteOnce) is still recorded as attached to the old node: a VolumeAttachment API object names that node, and the cloud provider still shows the disk attached there. Until that attachment is released, the new node cannot get the volume. Your workload is down.
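If you alert on or automate around this event, the first thing a script needs is the PV name. A minimal Python sketch, assuming the message wording shown above stays stable (it is a human-readable event string, not a versioned API, so it may change between releases); the helper name `pv_from_multi_attach_event` is hypothetical:

```python
import re
from typing import Optional

# Matches the attach/detach controller's event text shown above.
# Assumption: the wording stays stable; event messages are not a versioned API.
MULTI_ATTACH_RE = re.compile(r'Multi-Attach error for volume "([^"]+)"')


def pv_from_multi_attach_event(message: str) -> Optional[str]:
    """Return the PV name from a Multi-Attach event message, or None if it does not match."""
    match = MULTI_ATTACH_RE.search(message)
    return match.group(1) if match else None


event = (
    'Multi-Attach error for volume "pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890" '
    "Volume is already exclusively attached to one node and can't be attached to another"
)
print(pv_from_multi_attach_event(event))  # pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890
```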
The Attack Vector / Blast Radius
This is a cascading node failure scenario, not a misconfiguration you can ignore until the next sprint.
- Node goes NotReady (spot interruption, OOM kill, kernel panic, network partition).
- Kubernetes waits for the pods' node.kubernetes.io/unreachable taint toleration to expire (300 seconds by default) before evicting them.
- The cloud provider's detach call either never fires or hangs, because the dead kubelet can't unmount the volume cleanly and never reports it as unmounted.
- The attachdetach-controller sees the VolumeAttachment object still exists for the old node and refuses to attach the volume to the target node. It does not force-detach immediately.
- Every pod in the Deployment/StatefulSet that needs this PVC is scheduled but cannot start. If this is a StatefulSet with podManagementPolicy: OrderedReady, the entire StatefulSet grinds to a halt, not just one pod.
- StatefulSets are the worst case. Because StatefulSets bind PVCs to specific pod identities, you cannot simply delete the PVC and recreate it. The volume lock must be cleared surgically.
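To put numbers on that timeline: a back-of-the-envelope sketch, assuming the 40-second node-monitor grace period shown in the controller-manager flags later in this guide (the default in many releases) and the 300-second default toleration. It ignores the controller's polling interval and eviction rate limits, so treat the results as lower bounds; `eviction_timeline` is a hypothetical helper.

```python
from typing import List, Tuple


def eviction_timeline(grace_period_s: int = 40, toleration_s: int = 300) -> List[Tuple[int, str]]:
    """Approximate seconds after the last kubelet heartbeat at which each step happens."""
    return [
        (0, "last kubelet heartbeat from the failed node"),
        (grace_period_s, "node marked Unknown; node.kubernetes.io/unreachable NoExecute taint added"),
        (grace_period_s + toleration_s, "pods evicted; a Deployment creates a replacement pod"),
    ]


for seconds, step in eviction_timeline():
    print(f"t+{seconds:>3}s  {step}")
```

With defaults, the replacement pod does not exist until roughly 340 seconds after the node fails, and only then does the Multi-Attach error appear. A StatefulSet waits longer still: its controller will not recreate a pod with the same identity while the old, Terminating pod object exists, and a dead kubelet never confirms that deletion.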
How to Fix It
Step 0: Confirm the stuck VolumeAttachment
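kubectl get nodes
# The old node should show NotReady
kubectl get volumeattachment
# Look for a row whose PV column is your PV and whose NODE column is the dead node
kubectl describe volumeattachment <attachment-name>
# Confirm: Attached: true, and Node Name is the dead node (Attacher is the CSI driver, e.g. ebs.csi.aws.com)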
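The same check can be scripted against the JSON from `kubectl get nodes -o json` and `kubectl get volumeattachment -o json` (for example via `json.loads(subprocess.check_output([...]))`). A minimal sketch: it treats any node whose Ready condition is not "True" (False or Unknown) as suspect and reads only the standard `spec.nodeName`, `spec.source.persistentVolumeName`, and `status.attached` fields. The function names and the sample object names are hypothetical.

```python
import json
from typing import Dict, List, Set


def not_ready_nodes(nodes: Dict) -> Set[str]:
    """Names of nodes whose Ready condition is missing, False, or Unknown."""
    suspect = set()
    for node in nodes.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            suspect.add(node["metadata"]["name"])
    return suspect


def stuck_attachments(attachments: Dict, suspect_nodes: Set[str], pv_name: str) -> List[str]:
    """VolumeAttachment names for pv_name that are attached to one of suspect_nodes."""
    stuck = []
    for va in attachments.get("items", []):
        spec = va.get("spec", {})
        if spec.get("source", {}).get("persistentVolumeName") != pv_name:
            continue
        if va.get("status", {}).get("attached") and spec.get("nodeName") in suspect_nodes:
            stuck.append(va["metadata"]["name"])
    return stuck


# Trimmed-down stand-ins for real kubectl -o json output (node and object names are made up)
nodes = json.loads("""{"items": [
  {"metadata": {"name": "node-a"}, "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
  {"metadata": {"name": "node-b"}, "status": {"conditions": [{"type": "Ready", "status": "True"}]}}
]}""")
attachments = json.loads("""{"items": [
  {"metadata": {"name": "csi-0123abcd"},
   "spec": {"nodeName": "node-a", "source": {"persistentVolumeName": "pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890"}},
   "status": {"attached": true}}
]}""")
print(stuck_attachments(attachments, not_ready_nodes(nodes), "pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890"))
```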
Basic Fix: Force-Delete the VolumeAttachment
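# 1. If the old pod is still Terminating on the dead node, force-delete it so the
#    attach/detach controller no longer wants the volume on that node
kubectl delete pod <old-pod-name> -n <namespace> --grace-period=0 --force
# 2. Get the VolumeAttachment name tied to your PV (use the full PV name)
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.spec.source.persistentVolumeName=="pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890") | .metadata.name'
# 3. Delete it; the CSI attacher detaches the disk, and the attach/detach controller
#    then attaches the volume to the new node
kubectl delete volumeattachment <attachment-name>
# 4. If the delete hangs (the external-attacher finalizer is blocking), remove the finalizer;
#    the pending delete then completes without the driver's detach call
kubectl patch volumeattachment <attachment-name> \
  -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl delete volumeattachment <attachment-name> --ignore-not-found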
⚠️ Only do this after confirming the old node is truly offline and the disk is detached (or at least unmounted) at the cloud provider level. Removing the finalizer deletes the Kubernetes object without detaching the disk, and force-detaching a volume that a live process is still writing to can corrupt the filesystem.
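Why does a merge patch with an empty list clear every finalizer? With `--type=merge`, kubectl sends a JSON merge patch (RFC 7386), in which arrays are replaced wholesale rather than merged element by element, and `null` deletes a key. A minimal re-implementation of the RFC's algorithm, applied to a trimmed-down VolumeAttachment; the finalizer string follows the external-attacher's `external-attacher/<driver>` naming and is illustrative, and `json_merge_patch` is a hypothetical helper.

```python
import copy
from typing import Any


def json_merge_patch(target: Any, patch: Any) -> Any:
    """Apply an RFC 7386 JSON merge patch and return the result (inputs are not modified)."""
    if not isinstance(patch, dict):
        # Scalars and arrays (including []) replace the target value outright
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


attachment = {
    "metadata": {
        "name": "csi-0123abcd",
        "finalizers": ["external-attacher/ebs-csi-aws-com"],
    },
    "spec": {"nodeName": "node-a"},
}
patched = json_merge_patch(attachment, {"metadata": {"finalizers": []}})
print(patched["metadata"])  # {'name': 'csi-0123abcd', 'finalizers': []}
```

The same rule means a merge patch cannot drop just one finalizer out of several; if other finalizers must survive, a JSON patch (`--type=json`) with a `remove` operation on the specific index is the more surgical choice.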
Enterprise Best Practice: Fix the Root Cause in the StorageClass
The real fix is preventing the 5-minute death spiral. Three levers:
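1. Shorten how long pods wait on an unreachable node (control-plane flags, so this is generally not available on managed control planes such as EKS, GKE, or AKS):
# kube-controller-manager: how long a silent node is tolerated before it is marked Unknown and tainted
- --node-monitor-grace-period=40s
+ --node-monitor-grace-period=20s
# kube-apiserver: tolerationSeconds injected into new pods by the DefaultTolerationSeconds admission plugin (both default to 300)
+ --default-not-ready-toleration-seconds=30
+ --default-unreachable-toleration-seconds=30
The old --pod-eviction-timeout flag has no effect under taint-based eviction (the default since Kubernetes 1.13) and has since been removed from kube-controller-manager, so these tolerations govern the eviction delay. To tune one workload instead of the whole cluster, set tolerationSeconds on its node.kubernetes.io/unreachable and node.kubernetes.io/not-ready tolerations. Aggressive values evict pods on brief network blips, so test them before rolling out.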
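To see which eviction delay a running pod actually got, read its tolerations: the DefaultTolerationSeconds admission plugin writes the cluster default into each pod when it is created, so pods created before a flag change keep the old value. A simplified sketch, assuming the pod comes from `kubectl get pod <pod-name> -o json`. It takes the smallest `tolerationSeconds` among tolerations matching the unreachable NoExecute taint, which is our reading of how taint-based eviction picks the limit, and returns None when matching tolerations set no limit. The helper names are hypothetical.

```python
from typing import Dict, List, Optional

UNREACHABLE_TAINT = "node.kubernetes.io/unreachable"


def tolerates(toleration: Dict, key: str, effect: str, value: str = "") -> bool:
    """Simplified taint/toleration match: effect, key, then Exists or Equal on value."""
    if toleration.get("effect") and toleration["effect"] != effect:
        return False
    if toleration.get("key") and toleration["key"] != key:
        return False
    if (toleration.get("operator") or "Equal") == "Exists":
        return True
    return toleration.get("value", "") == value


def unreachable_eviction_delay(pod: Dict) -> Optional[int]:
    """Seconds a pod survives the unreachable NoExecute taint; None means it tolerates it forever."""
    matching: List[Dict] = [
        t for t in pod.get("spec", {}).get("tolerations", [])
        if tolerates(t, UNREACHABLE_TAINT, "NoExecute")
    ]
    if not matching:
        return 0  # no matching toleration: evicted as soon as the taint is applied
    limits = [t["tolerationSeconds"] for t in matching if t.get("tolerationSeconds") is not None]
    if not limits:
        return None  # matching tolerations without a time limit
    return max(0, min(limits))


# Tolerations as the DefaultTolerationSeconds plugin injects them with default settings
pod = {"spec": {"tolerations": [
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300},
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300},
]}}
print(unreachable_eviction_delay(pod))  # 300
```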
2. Fix the StorageClass to use WaitForFirstConsumer binding and a Retain reclaim policy, keeping volume expansion enabled:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
- reclaimPolicy: Delete
- volumeBindingMode: Immediate
+ reclaimPolicy: Retain
+ volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
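WaitForFirstConsumer delays provisioning until a pod that uses the PVC has been scheduled, so the volume is created in that node's zone instead of an arbitrary one (the classic mismatch: volume in us-east-1a, pod scheduled to us-east-1b). It only governs first provisioning: a zonal PV stays pinned to its zone, and a later reschedule into another zone is reported by the scheduler as a volume node affinity conflict rather than a Multi-Attach error. Because volumeBindingMode and reclaimPolicy cannot be edited on an existing StorageClass, create the corrected class under a new name or delete and recreate it; existing PVs are unaffected.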
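To check a specific PV against a candidate node, compare the zones in the PV's required `nodeAffinity` with the node's zone label. A minimal sketch, assuming the objects come from `kubectl get pv <name> -o json` and `kubectl get node <name> -o json`. The zone keys are assumptions: `topology.kubernetes.io/zone` is the standard label, and some CSI drivers (the EBS driver among them) have also used a driver-specific key. It unions zones across all selector terms, a simplification, and the helper names are hypothetical.

```python
from typing import Dict, Set

# Assumption: zone keys a PV's nodeAffinity may use; adjust for your CSI driver.
ZONE_KEYS = ("topology.kubernetes.io/zone", "topology.ebs.csi.aws.com/zone")


def pv_zones(pv: Dict) -> Set[str]:
    """Zones a PV is pinned to via required nodeAffinity 'In' expressions (empty set = unpinned)."""
    zones = set()
    terms = (pv.get("spec", {}).get("nodeAffinity", {})
             .get("required", {}).get("nodeSelectorTerms", []))
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if expr.get("key") in ZONE_KEYS and expr.get("operator") == "In":
                zones.update(expr.get("values", []))
    return zones


def zone_conflict(pv: Dict, node: Dict) -> bool:
    """True if the PV is zone-pinned and the node's zone label is not one of those zones."""
    zones = pv_zones(pv)
    node_zone = node.get("metadata", {}).get("labels", {}).get("topology.kubernetes.io/zone")
    return bool(zones) and node_zone not in zones


pv = {"spec": {"nodeAffinity": {"required": {"nodeSelectorTerms": [
    {"matchExpressions": [{"key": "topology.kubernetes.io/zone", "operator": "In", "values": ["us-east-1a"]}]}
]}}}}
node = {"metadata": {"labels": {"topology.kubernetes.io/zone": "us-east-1b"}}}
print(zone_conflict(pv, node))  # True
```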
3. If your workload can tolerate shared access, migrate to ReadWriteMany:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
-  accessModes: ["ReadWriteOnce"]
+  accessModes: ["ReadWriteMany"]
  storageClassName: efs-sc  # AWS EFS CSI or Azure Files, not EBS
  resources:
    requests:
      storage: 20Gi
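Note: EBS, GCE PD, and Azure Disk cannot serve ReadWriteMany filesystem volumes (where they offer multi-attach at all, it is for raw block volumes whose consumers must coordinate writes themselves). You must switch to EFS, Google Cloud Filestore, Azure Files, or NFS-backed storage.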
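A pre-merge check can catch that mismatch before it reaches the cluster. A minimal sketch, assuming a hand-maintained table of CSI provisioner names (the ones below are the upstream driver names for the services in the note, but verify them against your cluster) and ignoring raw-block multi-attach; `RWX_CAPABLE` and `check_pvc` are hypothetical names.

```python
from typing import Dict, Optional

# Assumption: hand-maintained map of provisioner -> supports ReadWriteMany filesystem volumes.
RWX_CAPABLE = {
    "ebs.csi.aws.com": False,
    "efs.csi.aws.com": True,
    "pd.csi.storage.gke.io": False,
    "filestore.csi.storage.gke.io": True,
    "disk.csi.azure.com": False,
    "file.csi.azure.com": True,
}


def check_pvc(pvc: Dict, storage_classes: Dict[str, Dict]) -> Optional[str]:
    """Return a warning if a PVC asks for ReadWriteMany from a class whose provisioner can't serve it."""
    spec = pvc.get("spec", {})
    if "ReadWriteMany" not in spec.get("accessModes", []):
        return None
    sc_name = spec.get("storageClassName")
    provisioner = storage_classes.get(sc_name, {}).get("provisioner")
    if RWX_CAPABLE.get(provisioner) is False:
        return f"PVC {pvc['metadata']['name']}: StorageClass {sc_name} ({provisioner}) cannot do ReadWriteMany"
    return None  # RWX-capable or unknown provisioner: nothing to flag


classes = {"efs-sc": {"provisioner": "efs.csi.aws.com"}, "fast-ssd": {"provisioner": "ebs.csi.aws.com"}}
bad = {"metadata": {"name": "shared-data"},
       "spec": {"accessModes": ["ReadWriteMany"], "storageClassName": "fast-ssd"}}
print(check_pvc(bad, classes))
```

Unknown provisioners pass silently by design, so a typo in the table weakens the check rather than blocking every PVC.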
Prevention in CI/CD
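1. OPA admission policy: Flag multi-replica Deployments that mount a PVC
The PVC's accessMode lives on a separate object, so this rule cannot check it directly; it denies every multi-replica Deployment that mounts a PVC and asks the team to confirm RWX. It reads the plain OPA admission-webhook input (input.request); in a Gatekeeper ConstraintTemplate, read the object from input.review.object and report it through a violation rule with a msg field instead.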
package kubernetes.admission

import rego.v1

deny contains msg if {
  input.request.kind.kind == "Deployment"
  replicas := input.request.object.spec.replicas
  replicas > 1
  volume := input.request.object.spec.template.spec.volumes[_]
  claim := volume.persistentVolumeClaim.claimName
  # Flag for manual review: teams must explicitly use RWX for multi-replica Deployments
  msg := sprintf("Deployment '%v' has %v replicas and mounts PVC '%v'. Verify the PVC accessMode is ReadWriteMany.", [input.request.object.metadata.name, replicas, claim])
}
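Policy logic is easier to trust when it can be exercised outside the admission path. A Python sketch that mirrors the rule's conditions (message wording aside) on the same `input.request`-shaped dict, handy for generating fixtures or sanity-checking edge cases; `deny_messages` is a hypothetical name, and the Deployment and PVC names in the example are made up.

```python
from typing import Dict, List


def deny_messages(request: Dict) -> List[str]:
    """Mirror of the Rego rule: flag Deployments with more than one replica that mount a PVC."""
    if request.get("kind", {}).get("kind") != "Deployment":
        return []
    obj = request.get("object", {})
    spec = obj.get("spec", {})
    replicas = spec.get("replicas")
    # Like the Rego rule, an omitted replicas field (default 1) is not flagged
    if replicas is None or replicas <= 1:
        return []
    messages = []
    for volume in spec.get("template", {}).get("spec", {}).get("volumes", []):
        claim = volume.get("persistentVolumeClaim", {}).get("claimName")
        if claim:
            messages.append(
                f"Deployment '{obj['metadata']['name']}' has {replicas} replicas and mounts PVC "
                f"'{claim}'. Verify the PVC accessMode is ReadWriteMany."
            )
    return messages


request = {"kind": {"kind": "Deployment"}, "object": {
    "metadata": {"name": "web"},
    "spec": {"replicas": 3, "template": {"spec": {"volumes": [
        {"name": "data", "persistentVolumeClaim": {"claimName": "shared-data"}},
        {"name": "cache", "emptyDir": {}},
    ]}}},
}}
print(deny_messages(request))
```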
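2. Checkov: Scan StorageClasses for Immediate binding mode
If your Checkov version has no built-in check for StorageClass binding mode, register a custom one and select it in the config file (the directory name is an example):
# .checkov.yaml
framework:
  - kubernetes
external-checks-dir:
  - ./checkov_checks
check:
  - CKV_K8S_CUSTOM_SC_BINDING
The custom check, saved as a .py file in that directory: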
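from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.kubernetes.checks.resource.base_spec_check import BaseK8Check


class StorageClassBindingMode(BaseK8Check):
    def __init__(self):
        super().__init__(
            name="Ensure StorageClass uses WaitForFirstConsumer",
            id="CKV_K8S_CUSTOM_SC_BINDING",
            categories=[CheckCategories.KUBERNETES],
            supported_entities=["StorageClass"],
        )

    def scan_spec_conf(self, conf):
        # volumeBindingMode is top-level on a StorageClass; when omitted it defaults to Immediate
        if conf.get("volumeBindingMode") == "WaitForFirstConsumer":
            return CheckResult.PASSED
        return CheckResult.FAILED


# Instantiating the check registers it with Checkov's Kubernetes registry
check = StorageClassBindingMode()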
3. Terraform: Enforce binding mode at provisioning time
resource "kubernetes_storage_class" "fast" {
  metadata { name = "fast-ssd" }
  storage_provisioner = "ebs.csi.aws.com"
-  volume_binding_mode = "Immediate"
+  volume_binding_mode = "WaitForFirstConsumer"
  reclaim_policy      = "Retain"
}
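The HCL only fixes the StorageClasses you edit. To catch regressions in CI, you can also scan the machine-readable plan from `terraform show -json <planfile>`, whose `resource_changes[].change.after` holds each resource's planned attributes. A minimal sketch under that assumption; `immediate_storage_classes` is a hypothetical name, the resource addresses in the example are made up, and an unset binding mode is treated as Immediate, matching the Kubernetes default.

```python
import json
from typing import List


def immediate_storage_classes(plan_json: str) -> List[str]:
    """Addresses of kubernetes_storage_class resources planned with Immediate binding."""
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "kubernetes_storage_class":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        if not after:
            continue  # resource is being destroyed
        # An unset or null binding mode means Immediate, the Kubernetes default
        if (after.get("volume_binding_mode") or "Immediate") == "Immediate":
            flagged.append(rc["address"])
    return flagged


plan = json.dumps({"resource_changes": [
    {"address": "kubernetes_storage_class.fast", "type": "kubernetes_storage_class",
     "change": {"after": {"volume_binding_mode": "WaitForFirstConsumer"}}},
    {"address": "kubernetes_storage_class.legacy", "type": "kubernetes_storage_class",
     "change": {"after": {"volume_binding_mode": "Immediate"}}},
]})
print(immediate_storage_classes(plan))  # ['kubernetes_storage_class.legacy']
```

Expect a fix on an existing class to show up as a replacement in the plan, since the Kubernetes API does not allow volumeBindingMode to be changed in place.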
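4. Alert before the outage: Prometheus alerting rule (routed through Alertmanager to PagerDuty or similar)
Built on kube-state-metrics: the and joins keep each attachment series, with its node label, only while that node's Ready condition is not true.
- alert: VolumeAttachmentStuck
  expr: |
    kube_volumeattachment_info
      and on(volumeattachment) (kube_volumeattachment_status_attached == 1)
      and on(node) (kube_node_status_condition{condition="Ready",status="true"} == 0)
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "VolumeAttachment {{ $labels.volumeattachment }} stuck on NotReady node {{ $labels.node }}"
    runbook_url: "https://your-wiki/runbooks/volume-multi-attach"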
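With the default 300-second toleration, this fires before the pods are evicted and rescheduled, leaving a few minutes to intervene or to let automation handle the VolumeAttachment deletion. If you shortened the eviction delay, eviction wins the race and the alert becomes a page for the stuck state rather than an early warning.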
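The size of that window is simple arithmetic: the alert's `for` timer and the eviction toleration both start at roughly the moment the node stops reporting Ready, so the window is the toleration minus the `for` duration. A rough sketch that ignores scrape and rule-evaluation intervals (which eat into the window); `intervention_window_s` is a hypothetical helper.

```python
def intervention_window_s(toleration_s: int = 300, alert_for_s: int = 120) -> int:
    """Seconds between the alert firing and pod eviction; negative means the alert fires after eviction."""
    return toleration_s - alert_for_s


print(intervention_window_s())         # 180 with the default toleration and a 2m `for`
print(intervention_window_s(30, 120))  # -90 with a 30-second toleration
```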