How to Fix StatefulSet Pod Volume Mount Failures After Scaling Down and Up in Kubernetes

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins

TL;DR

  • What broke: After scaling a StatefulSet to 0 and back up, the pod is stuck in ContainerCreating because its PVC is orphaned, the VolumeAttachment object is stale, or the volume is still attached to the previous node.
  • How to fix it: Force-delete the stale VolumeAttachment, verify the PVC is Bound, and ensure your volumeClaimTemplates name exactly matches the volumeMounts name in the pod spec.

The Incident (What Does the Error Mean?)

Raw event output from kubectl describe pod <pod-name>:

Warning  FailedMount  4m    kubelet  MountVolume.SetUp failed for volume "data" :
         rpc error: code = Internal desc = volume data is not mounted
Warning  FailedMount  2m    kubelet  Unable to attach or mount volumes: unmounted volumes=[data],
         unattached volumes=[data]: timed out waiting for the condition

And from kubectl get events -n <namespace>:

FailedAttachVolume  Multi-Attach error for volume "pvc-xxxxxxxx" :
  Volume is already exclusively attached to one node and can't be attached to another

Immediate consequence: The pod stays in ContainerCreating until the stale attachment is cleared, and no replicas serve traffic in the meantime. If this is a stateful data service (Postgres, Kafka, Cassandra), you have a full data-tier outage.
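
For quick triage, the two event patterns shown above can be told apart mechanically. The following is a minimal sketch in POSIX shell; the function name classify_mount_event and the labels it prints are illustrative choices, not part of kubectl or Kubernetes.

```shell
# classify_mount_event: read `kubectl describe pod` or `kubectl get events`
# text on stdin and print a coarse label for the volume failure it contains.
# The labels are hypothetical names chosen for this sketch.
classify_mount_event() {
  events=$(cat)
  case "$events" in
    # Volume is still attached to another node
    *"Multi-Attach error"*) echo "multi-attach" ;;
    # Kubelet gave up waiting for attach or mount
    *"timed out waiting for the condition"*) echo "attach-timeout" ;;
    # Any other mount failure
    *"FailedMount"*) echo "mount-failed" ;;
    *) echo "unknown" ;;
  esac
}
```

Usage: kubectl describe pod <pod-name> -n <namespace> | classify_mount_event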


The Attack Vector / Blast Radius

This is not a transient hiccup. Here is the exact failure cascade:

  1. Scale-down deletes the pod, but the VolumeAttachment object (a cluster-scoped Kubernetes resource, listed with kubectl get volumeattachment) is not always removed, typically because the detach call to the cloud API failed or the old node became unreachable. It is most often seen with ReadWriteOnce block volumes such as EBS and GCE PD.
  2. Scale-up schedules the new pod, potentially on a different node. The attach/detach controller requests a new attachment and the CSI driver attempts it. The cloud API rejects the request because the volume is still registered as attached to the old node's instance ID.
  3. The kubelet's MountVolume call times out and the pod never starts. The kubelet keeps retrying with backoff, but the retries cannot succeed while the stale attachment exists, so the pod can stay stuck until someone intervenes.
  4. Secondary blast: If podManagementPolicy: Parallel is set, all replicas are created at once, so every replica with a stale attachment can fail simultaneously rather than one at a time.
  5. Tertiary blast: A name mismatch between volumeMounts and volumeClaimTemplates (e.g., template name data vs. mount name data-volume). The PVC is still created and Bound, so kubectl get pvc looks healthy, but the pod references a volume that does not exist and is typically rejected at creation. The error then shows up only in the StatefulSet's events (kubectl describe statefulset), which makes this misconfiguration easy to miss.
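
One reason the new pod asks for the very same volume: PVCs created from volumeClaimTemplates are named <template-name>-<statefulset-name>-<ordinal> and, by default, are kept when the StatefulSet scales down, so pod postgres-0 re-binds to data-postgres-0 on scale-up. A small sketch that lists the claims a StatefulSet is expected to own follows; the helper names pvc_name and expected_pvcs are hypothetical.

```shell
# pvc_name TEMPLATE STATEFULSET ORDINAL
# Print the PVC name Kubernetes derives for one StatefulSet replica.
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# expected_pvcs TEMPLATE STATEFULSET REPLICAS
# Print one PVC name per ordinal, from 0 to REPLICAS-1.
expected_pvcs() {
  ordinal=0
  while [ "$ordinal" -lt "$3" ]; do
    pvc_name "$1" "$2" "$ordinal"
    ordinal=$((ordinal + 1))
  done
}
```

For the postgres example used later in this article, expected_pvcs data postgres 3 prints data-postgres-0, data-postgres-1 and data-postgres-2, one per line. Comparing that list with kubectl get pvc shows whether any claim is missing.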

How to Fix It

Step 1: Diagnose the Actual State

# Check PVC status: must be Bound, not Pending or Lost
# (the selector assumes the StatefulSet labels its pods app=<statefulset-name>)
kubectl get pvc -n <namespace> -l app=<statefulset-name>

# Get the PV name from the PVC
kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.spec.volumeName}'

# Check for stale VolumeAttachment objects that reference that PV
kubectl get volumeattachment | grep <pv-name>

# Describe the pod for exact mount error
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events"
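
The grep above matches anywhere in the row. A slightly stricter variant filters on the PV column only and also prints the node the attachment points at. This is a sketch that assumes kubectl and awk are on the PATH; attachments_for_pv is a hypothetical helper name.

```shell
# attachments_for_pv PV_NAME
# Print "<attachment-name> <node> <attached>" for every VolumeAttachment
# whose source is the given PersistentVolume.
attachments_for_pv() {
  pv="$1"
  kubectl get volumeattachment --no-headers \
    -o 'custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached' \
  | awk -v pv="$pv" '$2 == pv { print $1, $3, $4 }'
}
```

Usage: attachments_for_pv <pv-name>. If the node printed differs from the node the new pod was scheduled to, the attachment is a likely candidate for the stale one.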

Basic Fix: Delete the Stale VolumeAttachment

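# Identify the stale attachment (compare the NODE column with the pod's node)
kubectl get volumeattachment

# Delete it; the attach/detach controller creates a fresh one on the next attach
kubectl delete volumeattachment <attachment-name>

# If stuck in Terminating, patch out the finalizer.
# Do this only after confirming in the cloud console or CLI that the volume is
# detached, because removing the finalizer skips the driver's detach call.
kubectl patch volumeattachment <attachment-name> \
  -p '{"metadata":{"finalizers":null}}' --type=merge

After deletion, the pod's next attach cycle typically succeeds within 60–90 seconds.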
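
Before deleting an attachment, it may be worth confirming that no running pod still mounts the claim, since removing the attachment under a live pod risks data corruption. The sketch below assumes kubectl custom-columns prints multiple claim names comma-separated (worth verifying on your kubectl version); both helper names are hypothetical.

```shell
# pods_using_pvc NAMESPACE PVC_NAME
# Print the names of Running pods that reference the claim.
pods_using_pvc() {
  kubectl get pods -n "$1" --no-headers \
    -o 'custom-columns=NAME:.metadata.name,PHASE:.status.phase,CLAIMS:.spec.volumes[*].persistentVolumeClaim.claimName' \
  | awk -v pvc="$2" '$2 == "Running" {
      n = split($3, claims, ",")
      for (i = 1; i <= n; i++) {
        if (claims[i] == pvc) { print $1; break }
      }
    }'
}

# no_running_pod_uses_pvc NAMESPACE PVC_NAME
# Succeed (exit 0) only when no Running pod references the claim.
no_running_pod_uses_pvc() {
  if [ -z "$(pods_using_pvc "$1" "$2")" ]; then
    return 0
  fi
  return 1
}
```

Usage: no_running_pod_uses_pvc <namespace> <pvc-name> && kubectl delete volumeattachment <attachment-name>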


Enterprise Best Practice: Fix the Root Config

The most common permanent cause is a name mismatch between volumeClaimTemplates and volumeMounts, or a storageClassName that is empty or points at a class that does not exist, which leaves the PVC Pending because nothing provisions it.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"
  replicas: 3
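  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      app: postgres                 # required field; must match the template labels
  template:
    metadata:
      labels:
        app: postgres
    spec: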
      containers:
      - name: postgres
        image: postgres:15
        volumeMounts:
-         - name: data-volume        # WRONG: does not match volumeClaimTemplate name
+         - name: data               # CORRECT: must exactly match .volumeClaimTemplates[].metadata.name
            mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data                    # the volumeMounts name above must match this exactly
    spec:
-     storageClassName: ""          # WRONG: empty string disables dynamic provisioning
+     storageClassName: "gp3-csi"   # CORRECT: explicit class with WaitForFirstConsumer
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
-         storage: 1Gi              # WRONG: undersized, causes resize churn
+         storage: 20Gi
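
To check a StatefulSet that is already deployed for this mismatch, the mount names can be compared against the claim template names plus any regular pod volumes. This is a minimal sketch, assuming kubectl access to the cluster; unmatched_mounts and check_statefulset_mounts are hypothetical helper names.

```shell
# unmatched_mounts "MOUNT_NAMES" "KNOWN_NAMES"
# Both arguments are space-separated lists. Print every mount name that
# does not appear in KNOWN_NAMES.
unmatched_mounts() {
  for mount in $1; do
    found=no
    for known in $2; do
      if [ "$mount" = "$known" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$mount"
    fi
  done
}

# check_statefulset_mounts NAMESPACE STATEFULSET
# Print volumeMounts that match neither a volumeClaimTemplate nor a pod volume.
check_statefulset_mounts() {
  mounts=$(kubectl get statefulset "$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].volumeMounts[*].name}')
  claims=$(kubectl get statefulset "$2" -n "$1" \
    -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}')
  volumes=$(kubectl get statefulset "$2" -n "$1" \
    -o jsonpath='{.spec.template.spec.volumes[*].name}')
  unmatched_mounts "$mounts" "$claims $volumes"
}
```

Usage: check_statefulset_mounts <namespace> postgres. Empty output means every mount resolves to a claim template or a volume. Init containers are not covered by this sketch.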

For EBS/GCE PD, use topology-aware binding so the volume is provisioned in the same zone as the node the pod is scheduled to:

# StorageClass definition
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-csi
provisioner: ebs.csi.aws.com
- volumeBindingMode: Immediate        # WRONG: provisions before the pod is scheduled, so the volume can land in a zone where the pod cannot run
+ volumeBindingMode: WaitForFirstConsumer  # CORRECT: provisions in the zone of the node the pod is scheduled to
  reclaimPolicy: Retain
  parameters:
    type: gp3
+   encrypted: "true"
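
A quick way to confirm which binding mode a live StorageClass actually uses is to read its volumeBindingMode field. The helper below is a sketch with a hypothetical name (binding_mode_ok); it assumes kubectl access to the cluster.

```shell
# binding_mode_ok STORAGECLASS
# Succeed if the class uses WaitForFirstConsumer; otherwise print the
# mode found to stderr and fail.
binding_mode_ok() {
  mode=$(kubectl get storageclass "$1" -o jsonpath='{.volumeBindingMode}')
  if [ "$mode" = "WaitForFirstConsumer" ]; then
    return 0
  fi
  echo "StorageClass $1 uses volumeBindingMode=${mode:-<unset>}" >&2
  return 1
}
```

Usage: binding_mode_ok gp3-csi || echo "review the StorageClass". Note that volumeBindingMode cannot be edited on an existing StorageClass, so changing it generally means deleting and recreating the class.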


Prevention in CI/CD

1. OPA/Gatekeeper Policy — Enforce volumeClaimTemplate Name Consistency

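# opa/statefulset-volume-name-match.rego
package kubernetes.admission

deny[msg] {
  input.request.kind.kind == "StatefulSet"
  spec := input.request.object.spec
  container := spec.template.spec.containers[_]
  mount := container.volumeMounts[_]
  # A mount is valid if it names a volumeClaimTemplate or a regular pod volume
  # (configMap, secret, emptyDir, ...); only mounts matching neither are denied.
  claim_names := {t.metadata.name | t := spec.volumeClaimTemplates[_]}
  volume_names := {v.name | v := spec.template.spec.volumes[_]}
  known_names := claim_names | volume_names
  not known_names[mount.name]
  msg := sprintf("volumeMount '%v' has no matching volumeClaimTemplate or volume", [mount.name])
}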

2. Checkov Static Scan in CI

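# .github/workflows/k8s-lint.yml (step fragment; place under jobs.<job>.steps)
# Pin the action to a release tag or commit SHA rather than master in production pipelines.
- name: Checkov StatefulSet Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: ./k8s/
    # These IDs select built-in Checkov Kubernetes checks; confirm against the
    # Checkov check index that they cover what you intend to enforce.
    check: CKV_K8S_28,CKV_K8S_6
    framework: kubernetes
    soft_fail: false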

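3. Scale-Down Wrapper: Verify VolumeAttachment Cleanup

#!/bin/bash
# scale-down.sh: scale to 0, then verify the VolumeAttachment was cleaned up.
# Deleting the attachment while the pod is still running would pull the volume
# out from under the database, so the check runs after the pod is gone.
set -euo pipefail

PV_NAME=$(kubectl get pvc data-postgres-0 -o jsonpath='{.spec.volumeName}')

kubectl scale statefulset postgres --replicas=0
kubectl wait --for=delete pod/postgres-0 --timeout=120s

# Give the attach/detach controller time to detach on its own
sleep 30

ATTACHMENTS=$(kubectl get volumeattachment -o json | \
  jq -r --arg pv "$PV_NAME" \
    '.items[] | select(.spec.source.persistentVolumeName == $pv) | .metadata.name')

for attachment in $ATTACHMENTS; do
  echo "WARNING: VolumeAttachment $attachment still exists after scale-down. Deleting."
  kubectl delete volumeattachment "$attachment"
done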

4. Monitoring Alert (Prometheus)

# Alert fires if any StatefulSet pod is stuck in ContainerCreating > 5 minutes
- alert: StatefulSetVolumeMountStuck
  expr: |
    kube_pod_container_status_waiting_reason{reason="ContainerCreating"} == 1
    and on(namespace, pod) kube_pod_owner{owner_kind="StatefulSet"}
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "StatefulSet pod {{ $labels.pod }} stuck mounting volume"
    runbook: "https://your-wiki/runbooks/statefulset-volume-mount"
