How to Fix EBS CSI PVC Stuck in Pending: 'Waiting for a Volume to Be Created by External Provisioner'
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–30 mins depending on root cause
TL;DR
- What broke: Your PVC is stuck in Pending because the EBS CSI external provisioner cannot fulfill the volume request. Typical causes are a missing or misconfigured CSI driver, the wrong provisioner field in the StorageClass, an IAM role missing ec2:CreateVolume permissions, or an AZ topology mismatch.
- How to fix it: Verify the ebs.csi.aws.com driver is running, confirm IRSA/node IAM permissions, and ensure your StorageClass references the correct provisioner and AZ topology constraints.
- Shortcut: Use our Client-Side Sandbox above to auto-refactor this: paste your kubectl describe pvc output and StorageClass YAML and get a corrected manifest instantly.
The Incident (What Does the Error Mean?)
You run kubectl describe pvc <your-pvc> and see:
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  ExternalProvisioning  2m (x12 over 10m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
Immediate consequence: Every pod whose spec references this PVC as a volume is stuck in Pending. Deployments, StatefulSets, and Jobs that depend on it are blocked. If this is a database StatefulSet (Postgres, MySQL, Kafka), your application is down. The control plane is not broken: the provisioner either never received a valid, actionable request or lacked the authority to execute it.
The Attack Vector / Blast Radius
This is a silent availability failure. The cluster appears healthy — nodes are Ready, the API server responds — but workloads never start. The blast radius depends on what was deploying:
- StatefulSets: All replicas stuck. No leader elected. Quorum-based systems (etcd, Kafka, ZooKeeper) lose write availability entirely.
- Rollouts: A deployment rollout stalls. The old ReplicaSet keeps serving, but if it was already scaled down or this is a fresh deploy, you have zero running pods.
- Cascading HPA failure: HPA cannot scale what never started. Traffic spikes hit nothing.
The five usual root causes:
- CSI driver not installed or pods are CrashLooping: ebs-csi-controller pods in kube-system are not Running.
- Wrong provisioner name in StorageClass: legacy kubernetes.io/aws-ebs instead of ebs.csi.aws.com.
- IRSA not configured / IAM policy missing: the CSI controller pod cannot call ec2:CreateVolume, ec2:DescribeVolumes, ec2:AttachVolume.
- Availability Zone topology mismatch: the volume is requested in us-east-1a but no schedulable node exists there.
- StorageClass does not exist or is named incorrectly in the PVC spec.
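None of the fixes in this guide covers the Availability Zone case directly, so a quick way to check it may help. The following is a minimal sketch, assuming nodes carry the standard topology.kubernetes.io/zone label. nodes_in_zone and check_zone are hypothetical helper names, and the count includes every node in the zone, whether or not it is cordoned or tainted.

```shell
# Sketch: count nodes labelled with a given Availability Zone.
# nodes_in_zone and check_zone are illustrative names, not kubectl features.
nodes_in_zone() {
  local zone="$1"
  # -l filters on the well-known zone label; --no-headers prints one line per node
  kubectl get nodes -l "topology.kubernetes.io/zone=${zone}" --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Print a verdict for the zone the volume is pinned to.
check_zone() {
  local zone="$1" count
  count=$(nodes_in_zone "$zone")
  if [ "$count" -eq 0 ]; then
    echo "no nodes in ${zone}: volumes pinned to this zone cannot be consumed"
    return 1
  fi
  echo "${count} node(s) in ${zone}"
}
```

Calling check_zone us-east-1a returns non-zero when the zone has no nodes at all, which is the clearest form of the mismatch. A zone with nodes that are all unschedulable still needs a manual look at kubectl describe node.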
How to Fix It (The Solution)
Step 0 — Triage in 60 Seconds
# Is the CSI driver running?
kubectl get pods -n kube-system -l app=ebs-csi-controller
# What is the provisioner on your StorageClass?
kubectl get sc <your-storage-class> -o jsonpath='{.provisioner}'
# Full event log on the stuck PVC
kubectl describe pvc <pvc-name> -n <namespace>
# CSI controller logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner --tail=50
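A pod in CrashLoopBackOff can still report phase Running, so comparing ready replicas against desired replicas on the Deployment is a more reliable first signal. This is a rough sketch, assuming the controller Deployment is named ebs-csi-controller in kube-system (the name used in the restart command later in this guide). csi_controller_status is a hypothetical helper name.

```shell
# Sketch: summarise EBS CSI controller health from the Deployment status.
# csi_controller_status is an illustrative name; the Deployment name is an assumption.
csi_controller_status() {
  local out ready desired
  out=$(kubectl get deployment ebs-csi-controller -n kube-system \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}' 2>/dev/null) || {
    echo "missing"
    return 1
  }
  ready=${out%%/*}
  desired=${out##*/}
  # readyReplicas is omitted from the status when no replica is ready
  ready=${ready:-0}
  if [ "$ready" -lt "${desired:-1}" ]; then
    echo "degraded ${ready}/${desired}"
    return 1
  fi
  echo "ready ${ready}/${desired}"
}
```

A result of missing points at Fix 3, while degraded usually means the csi-provisioner logs above are the next place to look.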
Fix 1 — Wrong Provisioner in StorageClass (Most Common)
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: gp3-ebs
- provisioner: kubernetes.io/aws-ebs
+ provisioner: ebs.csi.aws.com
  parameters:
-   type: gp2
+   type: gp3
+   encrypted: "true"
  reclaimPolicy: Retain
  volumeBindingMode: WaitForFirstConsumer
  allowVolumeExpansion: true
Critical: volumeBindingMode: WaitForFirstConsumer is strongly recommended for EBS. Without it (the default is Immediate), the provisioner creates the volume before the scheduler has picked a node, so the volume can land in an AZ where the pod cannot run.
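To see which mode an existing StorageClass actually uses, something like the following may help. It is a sketch only, and binding_mode_ok is a hypothetical helper name.

```shell
# Sketch: report whether a StorageClass delays binding until a pod is scheduled.
# binding_mode_ok is an illustrative name, not a kubectl feature.
binding_mode_ok() {
  local sc="$1" mode
  mode=$(kubectl get sc "$sc" -o jsonpath='{.volumeBindingMode}') || return 2
  if [ "$mode" = "WaitForFirstConsumer" ]; then
    echo "${sc}: ${mode}"
  else
    echo "${sc}: ${mode:-Immediate} (volume AZ is chosen before the pod is scheduled)"
    return 1
  fi
}
```

Note that provisioner and volumeBindingMode cannot be updated on an existing StorageClass, so correcting either one means deleting and recreating the StorageClass rather than patching it.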
Fix 2 — Missing IAM Permissions (IRSA)
The CSI controller pod must assume a role with at minimum:
  # IAM Policy attached to the CSI controller IRSA role
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
-         "ec2:*"
+         "ec2:CreateVolume",
+         "ec2:DeleteVolume",
+         "ec2:AttachVolume",
+         "ec2:DetachVolume",
+         "ec2:DescribeVolumes",
+         "ec2:DescribeVolumeStatus",
+         "ec2:DescribeInstances",
+         "ec2:DescribeAvailabilityZones",
+         "ec2:CreateTags",
+         "ec2:ModifyVolume",
+         "ec2:DescribeVolumesModifications"
        ],
        "Resource": "*"
      }
    ]
  }
Annotate the CSI service account:
kubectl annotate serviceaccount ebs-csi-controller-sa \
-n kube-system \
eks.amazonaws.com/role-arn=arn:aws:iam::<ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole
Then restart the controller:
kubectl rollout restart deployment ebs-csi-controller -n kube-system
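After the restart it is worth confirming that the annotation is really present and looks like a role ARN. The sketch below assumes the service account name used above. irsa_role_arn and check_irsa are hypothetical helper names, and the check only validates the shape of the value, not whether the role's trust policy or permissions are correct.

```shell
# Sketch: read the IRSA annotation from the CSI controller service account.
# irsa_role_arn and check_irsa are illustrative names.
irsa_role_arn() {
  # Dots inside the annotation key are escaped for kubectl's jsonpath
  kubectl get serviceaccount ebs-csi-controller-sa -n kube-system \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

check_irsa() {
  local arn
  arn=$(irsa_role_arn) || return 2
  case "$arn" in
    arn:aws*:iam::*:role/*)
      echo "IRSA role: ${arn}" ;;
    "")
      echo "no eks.amazonaws.com/role-arn annotation on ebs-csi-controller-sa"
      return 1 ;;
    *)
      echo "annotation is not an IAM role ARN: ${arn}"
      return 1 ;;
  esac
}
```

If the annotation is fine but provisioning still fails, the csi-provisioner logs from Step 0 normally show the denied EC2 action.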
Fix 3 — Install the EBS CSI Driver (If Missing)
Enterprise Best Practice — EKS Add-on (not Helm):
  # Terraform
  resource "aws_eks_addon" "ebs_csi" {
    cluster_name = aws_eks_cluster.main.name
    addon_name   = "aws-ebs-csi-driver"
-   # Not configured: driver missing entirely
+   addon_version               = "v1.30.0-eksbuild.1"
+   service_account_role_arn    = aws_iam_role.ebs_csi_irsa.arn
+   resolve_conflicts_on_update = "OVERWRITE"
  }
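Once the add-on has been applied, its status can be read back with the AWS CLI. This is a small sketch, assuming the CLI is configured for the cluster's account and region. addon_status and require_addon_active are hypothetical helper names, and the cluster name is passed in by the caller.

```shell
# Sketch: read the EKS add-on status for the EBS CSI driver.
# addon_status and require_addon_active are illustrative names.
addon_status() {
  local cluster="$1"
  aws eks describe-addon --cluster-name "$cluster" \
    --addon-name aws-ebs-csi-driver \
    --query 'addon.status' --output text
}

require_addon_active() {
  local cluster="$1" status
  status=$(addon_status "$cluster" 2>/dev/null) || {
    echo "could not describe aws-ebs-csi-driver on ${cluster}"
    return 1
  }
  echo "aws-ebs-csi-driver: ${status}"
  [ "$status" = "ACTIVE" ]
}
```

A pipeline step could call require_addon_active with the cluster name and stop the rollout on any status other than ACTIVE.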
Fix 4 — PVC Referencing Non-Existent StorageClass
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: postgres-data
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
-   storageClassName: standard
+   storageClassName: gp3-ebs
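A quick way to confirm this cause is to read the class name off the stuck PVC and look it up. The following is a sketch, and pvc_storageclass_exists is a hypothetical helper name.

```shell
# Sketch: check that the StorageClass named by a PVC exists in the cluster.
# pvc_storageclass_exists is an illustrative name.
pvc_storageclass_exists() {
  local pvc="$1" ns="$2" sc
  sc=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}') || return 2
  if [ -z "$sc" ]; then
    echo "PVC ${pvc} has no storageClassName and no default StorageClass was applied"
    return 1
  fi
  if kubectl get sc "$sc" >/dev/null 2>&1; then
    echo "StorageClass ${sc} exists"
  else
    echo "StorageClass ${sc} not found; compare with: kubectl get sc"
    return 1
  fi
}
```

For example, pvc_storageclass_exists postgres-data default. Keep in mind that storageClassName generally cannot be changed on an existing PVC, so applying the corrected manifest usually means deleting and recreating the claim.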
Prevention in CI/CD
OPA/Gatekeeper Policy — Enforce Correct Provisioner
package k8s.storageclass

violation[{"msg": msg}] {
  input.review.object.kind == "StorageClass"
  input.review.object.provisioner != "ebs.csi.aws.com"
  msg := sprintf("StorageClass '%v' must use provisioner 'ebs.csi.aws.com', got '%v'",
    [input.review.object.metadata.name, input.review.object.provisioner])
}

violation[{"msg": msg}] {
  input.review.object.kind == "StorageClass"
  input.review.object.volumeBindingMode != "WaitForFirstConsumer"
  msg := "StorageClass must set volumeBindingMode: WaitForFirstConsumer for EBS"
}
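The same two rules can be approximated before a manifest ever reaches the cluster, for instance in a pre-commit hook. The sketch below is deliberately crude: it uses grep rather than a YAML parser and assumes one StorageClass per file with top-level keys at column 0. lint_storageclass is a hypothetical helper name.

```shell
# Sketch: local approximation of the admission policy for a single manifest file.
# lint_storageclass is an illustrative name; grep is a stand-in for real YAML parsing.
lint_storageclass() {
  local file="$1" rc=0
  # Files that do not declare a StorageClass are out of scope
  grep -q '^kind:[[:space:]]*StorageClass' "$file" || return 0
  if ! grep -Eq '^provisioner:[[:space:]]*ebs\.csi\.aws\.com[[:space:]]*$' "$file"; then
    echo "${file}: provisioner must be ebs.csi.aws.com"
    rc=1
  fi
  if ! grep -Eq '^volumeBindingMode:[[:space:]]*WaitForFirstConsumer[[:space:]]*$' "$file"; then
    echo "${file}: volumeBindingMode must be WaitForFirstConsumer"
    rc=1
  fi
  return "$rc"
}
```

This complements the admission policy rather than replacing it, since quoted values or multi-document files would slip past the pattern match.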
Checkov in CI Pipeline
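# .github/workflows/checkov.yml
- name: Scan Kubernetes manifests
  uses: bridgecrewio/checkov-action@master
  with:
    directory: k8s/
    framework: kubernetes
    # CKV_K8S_28 is Checkov's NET_RAW capability check, not a StorageClass check.
    # Enforcing the provisioner needs a custom policy; the ID and directory below are placeholders.
    check: CUSTOM_K8S_EBS_PROVISIONER
    external_checks_dirs: checkov-policies/
    soft_fail: false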
Terraform Pre-Commit
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: <release-tag>  # required by pre-commit; pin to a released tag
    hooks:
      - id: terraform_validate
      - id: terraform_tflint
Add a custom tflint rule asserting that an aws_eks_addon resource for aws-ebs-csi-driver is declared alongside any StatefulSet workload module.
Smoke Test Post-Deploy
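#!/bin/bash
# ebs-smoke-test.sh: run this in your CD pipeline after cluster provisioning
# gp3-ebs uses WaitForFirstConsumer, so the PVC binds only once a pod that
# mounts it is scheduled; the test therefore creates a consumer pod as well.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-smoke-pvc
  namespace: default
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: gp3-ebs
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ebs-smoke-pod
  namespace: default
spec:
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
      volumeMounts:
        - {name: data, mountPath: /data}
  volumes:
    - name: data
      persistentVolumeClaim: {claimName: ebs-smoke-pvc}
EOF
# Poll for up to 120s instead of relying on a single fixed sleep
for _ in $(seq 1 24); do
  STATUS=$(kubectl get pvc ebs-smoke-pvc -n default -o jsonpath='{.status.phase}')
  [ "$STATUS" = "Bound" ] && break
  sleep 5
done
if [ "$STATUS" != "Bound" ]; then
  echo "FATAL: EBS CSI smoke test failed. PVC status: $STATUS"
  kubectl describe pvc ebs-smoke-pvc -n default
  exit 1
fi
# With reclaimPolicy: Retain, the PV and EBS volume outlive the PVC and need separate cleanup
kubectl delete pod ebs-smoke-pod -n default
kubectl delete pvc ebs-smoke-pvc -n default
echo "EBS CSI provisioner: OK"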