
How to Fix PVC Resize Stuck in FileSystemResizePending: Volume Expansion Requires Pod Restart

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10–20 mins

TL;DR

  • What broke: Kubernetes expanded the underlying block volume at the storage layer, but the filesystem inside the PVC was never grown because the pod consuming it was never restarted. Unless the CSI driver supports online (in-use) expansion, kubelet only triggers the node-side resize (resize2fs/xfs_growfs) when the volume is mounted again for a new pod.
  • How to fix it: Delete and reschedule the consuming pod (or perform a rolling restart of the workload) so kubelet remounts the volume and completes the filesystem expansion.

The Incident (What Does the Error Mean?)

You ran kubectl get pvc and saw this:

NAME         STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-pvc-0   Bound    pvc-a1b2c3d4     20Gi       RWO            gp2            14d

But you patched it to 50Gi, and kubectl get pvc still reports the old capacity:

NAME         STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-pvc-0   Bound    pvc-a1b2c3d4     20Gi       RWO            gp2            14d

kubectl describe pvc data-pvc-0 shows why:

Conditions:
  Type                             Status   Message
  FileSystemResizePending          True     Waiting for user to (re)start a pod to finish file system resize of volume on node.

Immediate consequence: the block device is already 50Gi at the cloud provider level (EBS, GCE PD, Azure Disk), but the filesystem on it is still 20Gi, and that is all the pod can use. Once those 20Gi fill up, writes fail with ENOSPC even though 30Gi of unallocated space sits underneath. Stateful workloads such as Postgres, Kafka and Elasticsearch can crash or stall on write.
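To find every PVC stuck like this across a cluster, you can filter the JSON from kubectl get pvc -A -o json for a FileSystemResizePending condition whose status is True. Below is a minimal Python sketch built on that assumption. pending_resizes and load_pvcs are our own illustrative helpers, not part of any Kubernetes client library. load_pvcs also assumes kubectl is on PATH with access to the cluster.

```python
import json
import subprocess
from typing import Any


def load_pvcs() -> dict[str, Any]:
    """Fetch every PVC in the cluster as parsed JSON (needs kubectl and cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pvc", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def pending_resizes(pvc_list: dict[str, Any]) -> list[tuple[str, str, str, str]]:
    """Return (namespace, name, requested, capacity) for each PVC whose
    FileSystemResizePending condition is True."""
    stuck = []
    for pvc in pvc_list.get("items", []):
        status = pvc.get("status", {})
        conditions = status.get("conditions") or []
        if any(c.get("type") == "FileSystemResizePending" and c.get("status") == "True"
               for c in conditions):
            meta = pvc["metadata"]
            requested = pvc["spec"]["resources"]["requests"]["storage"]
            # status.capacity is what kubectl get pvc prints in the CAPACITY column
            capacity = status.get("capacity", {}).get("storage", "?")
            stuck.append((meta.get("namespace", ""), meta["name"], requested, capacity))
    return stuck
```

Call it as pending_resizes(load_pvcs()). For data-pvc-0 above, it would report 50Gi requested against 20Gi capacity.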


The Attack Vector / Blast Radius

This is a two-phase resize, and Kubernetes completed only phase 1.

Phase 1 (control plane / cloud API): the external-resizer CSI sidecar calls the driver's ControllerExpandVolume RPC, which in turn calls the cloud API (ModifyVolume on EBS, etc.). The block device is now 50Gi and the PV object's capacity is updated. ✅ Done.

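Phase 2 (node / filesystem): kubelet must see that the volume has grown and call the CSI driver's NodeExpandVolume RPC. That RPC runs resize2fs (ext4) or xfs_growfs (XFS) against the volume's mount on the node. For drivers without online expansion support, this only happens when the volume is mounted for a new pod, so if the existing pod keeps running, kubelet never gets the trigger. Once the resize succeeds, kubelet updates the PVC's status capacity and clears FileSystemResizePending.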
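The two phases leave different fingerprints. In the usual CSI flow, the PV's spec.capacity is raised when phase 1 finishes. The PVC's status.capacity, shown in the CAPACITY column above, only moves once phase 2 completes. The hedged Python sketch below classifies a claim from those three values. parse_quantity and resize_phase are illustrative names of our own. parse_quantity accepts only whole numbers with the common binary (Ki…Pi) and decimal (k…P) suffixes, a simplification of the full Kubernetes quantity format.

```python
import re

# Subset of Kubernetes quantity suffixes: binary (Ki, Mi, ...) and decimal (k, M, ...)
_SUFFIXES = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15,
}
_QUANTITY = re.compile(r"^(\d+)([A-Za-z]*)$")


def parse_quantity(q: str) -> int:
    """Convert a whole-number storage quantity such as '50Gi' to bytes."""
    m = _QUANTITY.match(q.strip())
    if not m or m.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {q!r}")
    return int(m.group(1)) * _SUFFIXES[m.group(2)]


def resize_phase(requested: str, pv_capacity: str, pvc_capacity: str) -> str:
    """Report which half of the two-phase resize is still outstanding.

    requested    -> PVC spec.resources.requests.storage
    pv_capacity  -> PV spec.capacity.storage (raised after phase 1)
    pvc_capacity -> PVC status.capacity.storage (raised after phase 2)
    """
    want = parse_quantity(requested)
    if parse_quantity(pvc_capacity) >= want:
        return "complete"
    if parse_quantity(pv_capacity) >= want:
        return "phase 2 pending: filesystem not grown on the node"
    return "phase 1 pending: controller/cloud expansion not done"
```

For data-pvc-0 above (50Gi requested, 50Gi on the PV, 20Gi on the PVC), resize_phase reports phase 2 pending.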

Blast radius:

  • StatefulSets with volumeClaimTemplates: each replica has its own PVC. If you resized them all, every replica is now in this state, and all of them can run out of space at about the same time.
  • If allowVolumeExpansion is unset or false on the StorageClass, the API server does not accept the PVC patch. It rejects it with a Forbidden error, so the resize never starts. Fix the StorageClass first (Step 0), then re-apply the patch.
  • ReadWriteOnce volumes can be attached to only one node at a time. If the replacement pod lands on a different node, the volume must detach and reattach first. On AWS EBS this can take several minutes, and the new pod reports Multi-Attach errors until the old attachment is released.

How to Fix It

Step 0 — Verify StorageClass has expansion enabled

 apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
   name: gp2
 provisioner: ebs.csi.aws.com
 parameters:
   type: gp2
-# allowVolumeExpansion not set (defaults to false)
+allowVolumeExpansion: true

If this flag is missing, patch it:

kubectl patch storageclass gp2 -p '{"allowVolumeExpansion": true}'

⚠️ Patching an existing StorageClass only affects future resize requests. If the PVC resize was already rejected, re-apply the PVC patch after fixing the StorageClass.
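When you re-apply the PVC patch, it helps to guard against a request the API server refuses regardless of the StorageClass: shrinking the claim. Below is a minimal Python sketch that assumes binary suffixes only (Ki/Mi/Gi/Ti). pvc_resize_command and to_bytes are hypothetical helpers. They build the kubectl argv for a JSON merge patch on spec.resources.requests.storage but do not run it.

```python
import json

# Sketch: binary suffixes only
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(q: str) -> int:
    """Parse a binary-suffixed quantity like '50Gi' into bytes."""
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {q!r}")


def pvc_resize_command(name: str, namespace: str, current: str, new: str) -> list[str]:
    """Build the kubectl argv that re-requests a PVC expansion.

    PVCs can only grow, so anything not strictly larger is refused here
    instead of at the API server.
    """
    if to_bytes(new) <= to_bytes(current):
        raise ValueError(f"{new} is not larger than {current}; PVCs can only grow")
    patch = {"spec": {"resources": {"requests": {"storage": new}}}}
    return ["kubectl", "patch", "pvc", name, "-n", namespace,
            "--type", "merge", "-p", json.dumps(patch)]
```

Review the returned list, then hand it to subprocess.run(cmd, check=True).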


Basic Fix — Force pod remount by restarting the workload

For a Deployment:

kubectl rollout restart deployment/<your-deployment> -n <namespace>

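For a StatefulSet: with the default RollingUpdate strategy, this restarts pods one at a time, highest ordinal first, waiting for each to become Ready. With updateStrategy: OnDelete, it changes nothing until you delete the pods yourself: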

kubectl rollout restart statefulset/<your-statefulset> -n <namespace>

Verify kubelet completed phase 2:

kubectl describe pvc data-pvc-0 -n <namespace>
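# Conditions block should be empty (FileSystemResizePending gone) and Capacity should read 50Gi

kubectl exec -it <pod> -n <namespace> -- df -h /data
# /data is the container mount path; use yours. Size should now be about 50G
# (filesystem metadata overhead can make it read slightly lower, e.g. 49G)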
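To script the df check instead of eyeballing it, use POSIX output (df -P -k). It is easier to parse than -h because sizes come back as plain 1024-byte block counts. The sketch below assumes the volume is mounted at /data as above. It also treats a 5% shortfall as acceptable filesystem overhead, which is a tolerance we picked, not a Kubernetes rule. df_size_bytes and resize_visible are illustrative names.

```python
def df_size_bytes(df_output: str) -> int:
    """Return the filesystem size in bytes from `df -P -k <path>` output."""
    lines = df_output.strip().splitlines()
    if len(lines) < 2:
        raise ValueError("expected a header line and one data line")
    fields = lines[1].split()
    # Column 2 of POSIX df output is the total size in 1024-byte blocks
    return int(fields[1]) * 1024


def resize_visible(df_output: str, expected_bytes: int, tolerance: float = 0.05) -> bool:
    """True if the filesystem is at least (1 - tolerance) of the expected size."""
    return df_size_bytes(df_output) >= expected_bytes * (1 - tolerance)
```

For example, capture the output of kubectl exec <pod> -n <namespace> -- df -P -k /data and pass it to resize_visible(out, 50 * 2**30).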

Enterprise Best Practice — Zero-downtime resize for StatefulSets

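For production StatefulSets where you want to control exactly when each replica restarts (the service stays up only if it tolerates one replica being down at a time):

# 1. Note which node the pod is on. Do not cordon it: a cordoned node rejects the
#    replacement pod, forcing an RWO detach/reattach onto a different node
kubectl get pod <statefulset-name>-0 -n <namespace> -o wide

# 2. Delete only the specific pod; the StatefulSet controller recreates it
kubectl delete pod <statefulset-name>-0 -n <namespace>

# 3. Watch kubelet complete the fs resize as the new pod mounts the volume
kubectl get events -n <namespace> --field-selector reason=FileSystemResizeSuccessful -w

# 4. Once the pod is Ready and df shows the new size, repeat for the next ordinal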

For CSI drivers that support online (in-use) expansion (e.g., ebs.csi.aws.com; check your driver version's documentation), kubelet grows the filesystem while the pod keeps running, so no restart is needed. This is a capability the driver reports to Kubernetes, not a StorageClass setting, and no StorageClass annotation turns it on. The StorageClass only needs expansion enabled:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-online-resize
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true

Note: Online resize without a pod restart depends on the driver version and on online volume expansion in Kubernetes (stable since v1.24). Both ext4 and XFS can be grown while mounted. If FileSystemResizePending persists on a running pod, fall back to the restart above, and test before relying on it for critical workloads.




Prevention in CI/CD

1. OPA/Gatekeeper — Enforce allowVolumeExpansion: true on all StorageClasses

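package storageclass

# Conftest-style rule: `input` is the raw manifest. Run with
#   conftest test --namespace storageclass <manifest>.yaml
# (Gatekeeper templates read input.review.object and define violation[...] instead.)
# Rego v0 syntax; OPA 1.x / Rego v1 requires `deny contains msg if {`.
# An unset allowVolumeExpansion is undefined, so the `not` also catches it.
deny[msg] {
  input.kind == "StorageClass"
  not input.allowVolumeExpansion == true
  msg := sprintf("StorageClass '%v' must have allowVolumeExpansion: true", [input.metadata.name])
}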
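If your CI image does not ship OPA or conftest, you can mirror the same rule in a few lines of Python. The sketch below assumes the manifests are already parsed into dicts, for example with PyYAML's yaml.safe_load_all (a third-party package). storageclass_violations is our own name. Like the Rego rule, it treats a missing allowVolumeExpansion as a violation, because the API default is false.

```python
from collections.abc import Iterable
from typing import Any


def storageclass_violations(manifests: Iterable[dict[str, Any]]) -> list[str]:
    """Flag StorageClasses that do not set allowVolumeExpansion: true."""
    msgs = []
    for doc in manifests:
        # `is not True` treats both a missing field and an explicit false as violations
        if doc.get("kind") == "StorageClass" and doc.get("allowVolumeExpansion") is not True:
            name = doc.get("metadata", {}).get("name", "<unnamed>")
            msgs.append(f"StorageClass '{name}' must have allowVolumeExpansion: true")
    return msgs
```

A non-empty result can fail the pipeline the same way a conftest deny would.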

2. Checkov — Block non-expandable StorageClass in Terraform

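# CKV_K8S_STORAGE_ALLOW_EXPANSION is not a built-in Checkov check ID that we know of;
# treat it as the ID of a custom policy you keep in ./checkov-policies
checkov -d ./terraform --external-checks-dir ./checkov-policies --check CKV_K8S_STORAGE_ALLOW_EXPANSION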

If using the Kubernetes Terraform provider:

 resource "kubernetes_storage_class" "gp3" {
   metadata { name = "gp3" }
   storage_provisioner = "ebs.csi.aws.com"
+  allow_volume_expansion = true
   parameters = { type = "gp3" }
 }

3. Helm pre-upgrade hook — Validate PVC resize will succeed before rollout

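apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-resize-preflight
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0                    # fail the upgrade on the first failed check
  template:
    spec:
      restartPolicy: Never           # required for Jobs (Never or OnFailure)
      # Needs RBAC to get PVCs (namespaced) and StorageClasses (cluster-scoped);
      # "pvc-preflight" is a placeholder ServiceAccount name
      serviceAccountName: pvc-preflight
      containers:
      - name: checker
        image: bitnami/kubectl:latest
        env:
        - name: PVC_NAME
          value: data-pvc-0          # placeholder: the PVC you are about to resize
        command:
        - /bin/sh
        - -c
        - |
          set -eu
          SC=$(kubectl get pvc "$PVC_NAME" -o jsonpath='{.spec.storageClassName}')
          EXPAND=$(kubectl get sc "$SC" -o jsonpath='{.allowVolumeExpansion}')
          [ "$EXPAND" = "true" ] || { echo "StorageClass $SC does not allow expansion. Aborting."; exit 1; }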

4. Prometheus alerting rule (routed by Alertmanager): Alert on FileSystemResizePending before it causes ENOSPC

- alert: PVCFilesystemResizePending
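  # kube-state-metrics puts the condition name in the `type` label; verify against your version's /metrics
  expr: kube_persistentvolumeclaim_status_condition{type="FileSystemResizePending",status="true"} == 1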
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PVC {{ $labels.persistentvolumeclaim }} filesystem resize pending — pod restart required"
