Fixing Velero 'Failed to Get Backup Storage Location' S3 IAM Permission Error
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–20 mins
TL;DR
- What broke: Velero's BackupStorageLocation controller cannot call s3:GetBucketLocation, s3:ListBucket, or related actions because the IAM principal (role or user) attached to the Velero service account is missing required permissions or is scoped to the wrong resource ARN.
- How to fix it: Attach a least-privilege IAM policy granting the exact S3 actions Velero requires, scoped to your specific bucket ARN, not *.
- Shortcut: Use our Client-Side Sandbox above to auto-refactor your broken IAM policy or BSL manifest without leaking your ARNs to a third-party server.
The Incident (What Does the Error Mean?)
You will see one or more of the following in velero pod logs or in kubectl get backupstoragelocations -n velero:
time="2024-01-15T03:12:44Z" level=error msg="failed to get backup storage location"
error="rpc error: code = Unknown desc = RequestError: send request failed\ncaused by: AccessDenied: Access Denied\n\tstatus code: 403, request id: A1B2C3D4E5F6"
BackupStorageLocation "default" — Phase: Unavailable
Immediate consequence: Every scheduled and on-demand backup that targets this location fails, and nothing alerts you by default. velero backup get shows Failed or FailedValidation, and the BSL shows Unavailable. Your cluster is running without a valid recovery point. If a node failure or namespace wipe happens right now, you have no restore target.
The Attack Vector / Blast Radius
This is a misconfiguration that creates two simultaneous risks:
1. Operational: Zero backup coverage. Velero's BSL validation runs on a reconciliation loop. One 403 marks the entire location as Unavailable. All backups that target it (including those for stateful workloads like Postgres, Kafka, and Elasticsearch PVCs) fail to run. No alert fires unless you have explicit BSL phase monitoring.
2. Security: Overly permissive fixes introduce lateral movement risk. The knee-jerk fix is attaching s3:* on arn:aws:s3:::*. This is catastrophic. A compromised Velero pod (via a malicious container image, a CVE in the Velero binary, or a stolen IRSA token) now has full read/write/delete access to every S3 bucket in the account — including CloudTrail logs, billing exports, and other backup buckets. An attacker can:
- Exfiltrate all backup archives (they contain etcd secrets, kubeconfig fragments, PVC data)
- Delete all backup objects, destroying your DR capability
- Overwrite backups with corrupted archives to poison future restores
Blast radius of s3:* on *: Total account-wide S3 compromise from a single Velero pod escape.
How to Fix It
Basic Fix — Attach the Correct Minimum IAM Policy
The following actions are required by Velero for S3 BSL operation. Scope them to your specific bucket.
- {
-   "Effect": "Allow",
-   "Action": "s3:*",
-   "Resource": "*"
- }
+ {
+   "Version": "2012-10-17",
+   "Statement": [
+     {
+       "Effect": "Allow",
+       "Action": [
+         "s3:GetBucketLocation",
+         "s3:ListBucket",
+         "s3:ListBucketMultipartUploads"
+       ],
+       "Resource": "arn:aws:s3:::YOUR-VELERO-BUCKET"
+     },
+     {
+       "Effect": "Allow",
+       "Action": [
+         "s3:AbortMultipartUpload",
+         "s3:DeleteObject",
+         "s3:GetObject",
+         "s3:ListMultipartUploadParts",
+         "s3:PutObject"
+       ],
+       "Resource": "arn:aws:s3:::YOUR-VELERO-BUCKET/*"
+     }
+   ]
+ }
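If you manage several clusters, generating the policy from the bucket name avoids copy-paste drift between the bucket-level and object-level ARNs. The following is a minimal sketch, not an official Velero tool: the function name build_velero_s3_policy is hypothetical, and the action lists are copied from the policy above.

```python
import json


def build_velero_s3_policy(bucket: str) -> dict:
    """Build the two-statement S3 policy shown above for a single bucket."""
    # Refuse wildcards and object paths so the result is always bucket-scoped.
    if not bucket or "*" in bucket or "/" in bucket:
        raise ValueError("bucket must be a concrete bucket name")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions attach to the bucket ARN itself.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions attach to the objects under the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # YOUR-VELERO-BUCKET is the same placeholder used in the policy above.
    print(json.dumps(build_velero_s3_policy("YOUR-VELERO-BUCKET"), indent=2))
```

Splitting the statements this way matters because s3:ListBucket is evaluated against the bucket ARN, while s3:GetObject and s3:PutObject are evaluated against object ARNs ending in /*. Mixing the two up is a common cause of the 403 described earlier.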
After applying, set the BSL validation interval to one minute so the controller re-checks the location promptly:
kubectl patch backupstoragelocation default \
  -n velero \
  --type merge \
  -p '{"spec":{"validationFrequency":"1m"}}'
# Watch until phase flips to Available
kubectl get backupstoragelocation -n velero -w
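For scripted checks (for example, a post-deploy gate), watching the terminal is not practical. The sketch below parses the JSON that kubectl get backupstoragelocation -n velero -o json would print and reports every location whose status.phase is not Available. The function name unavailable_locations and the sample payload are illustrative assumptions, and the sample is trimmed to the fields the parser reads.

```python
import json
from typing import List, Tuple


def unavailable_locations(bsl_list_json: str) -> List[Tuple[str, str]]:
    """Return (name, phase) for every BSL whose phase is not Available."""
    doc = json.loads(bsl_list_json)
    problems = []
    for item in doc.get("items", []):
        name = item.get("metadata", {}).get("name", "<unknown>")
        # status.phase is empty until the controller has validated the location.
        phase = item.get("status", {}).get("phase", "")
        if phase != "Available":
            problems.append((name, phase or "Unknown"))
    return problems


# Hypothetical, trimmed sample of `kubectl get ... -o json` output.
SAMPLE = """
{
  "items": [
    {"metadata": {"name": "default"}, "status": {"phase": "Unavailable"}},
    {"metadata": {"name": "secondary"}, "status": {"phase": "Available"}}
  ]
}
"""

if __name__ == "__main__":
    for name, phase in unavailable_locations(SAMPLE):
        print(f"BSL {name} is {phase}")
```

In a pipeline you would feed the function real kubectl output instead of SAMPLE and fail the job when the returned list is non-empty.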
Enterprise Best Practice — IRSA with Condition Keys (EKS)
Do not use long-lived IAM user credentials (the cloud-credentials secret that velero install creates from a file containing aws_access_key_id and aws_secret_access_key). Use IAM Roles for Service Accounts (IRSA).
# IAM Trust Policy — WRONG: overly broad trust
- {
-   "Effect": "Allow",
-   "Principal": {"Federated": "arn:aws:iam::ACCOUNT:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDCID"},
-   "Action": "sts:AssumeRoleWithWebIdentity"
- }
# IAM Trust Policy — CORRECT: locked to Velero service account
+ {
+   "Effect": "Allow",
+   "Principal": {
+     "Federated": "arn:aws:iam::ACCOUNT:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDCID"
+   },
+   "Action": "sts:AssumeRoleWithWebIdentity",
+   "Condition": {
+     "StringEquals": {
+       "oidc.eks.REGION.amazonaws.com/id/OIDCID:sub": "system:serviceaccount:velero:velero",
+       "oidc.eks.REGION.amazonaws.com/id/OIDCID:aud": "sts.amazonaws.com"
+     }
+   }
+ }
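The difference between the two trust statements is easy to miss in review, because the broad version is valid JSON and works. A small check can flag a statement that lacks the sub and aud conditions. This is a sketch under stated assumptions: trust_statement_findings is a hypothetical helper, it inspects only StringEquals conditions, and it assumes the velero namespace and velero service account used above.

```python
from typing import List


def _condition_values(conditions: dict, suffix: str) -> List[str]:
    """Collect values of condition keys ending in suffix (string or list values)."""
    values = []
    for key, value in conditions.items():
        if key.endswith(suffix):
            values.extend(value if isinstance(value, list) else [value])
    return values


def trust_statement_findings(statement: dict,
                             namespace: str = "velero",
                             service_account: str = "velero") -> List[str]:
    """Return problems found in one IRSA trust-policy statement."""
    findings = []
    conditions = statement.get("Condition", {}).get("StringEquals", {})
    expected_sub = f"system:serviceaccount:{namespace}:{service_account}"

    # Without a :sub condition, any service account in the cluster can assume the role.
    subs = _condition_values(conditions, ":sub")
    if subs != [expected_sub]:
        findings.append(f"sub must be exactly {expected_sub}, found {subs}")

    # The :aud condition pins the token audience to STS.
    auds = _condition_values(conditions, ":aud")
    if auds != ["sts.amazonaws.com"]:
        findings.append(f"aud must be exactly sts.amazonaws.com, found {auds}")
    return findings


if __name__ == "__main__":
    broad = {"Effect": "Allow", "Action": "sts:AssumeRoleWithWebIdentity"}
    for finding in trust_statement_findings(broad):
        print(finding)
```

The check insists on an exact single value because a list of sub values is treated as a logical OR, which would quietly widen the trust to additional service accounts.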
Annotate the Velero service account:
kubectl annotate serviceaccount velero \
-n velero \
eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT:role/velero-irsa-role
Add --no-secret to your Velero install so it does not mount static credentials:
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.9.0 \
--bucket YOUR-VELERO-BUCKET \
--backup-location-config region=us-east-1 \
--no-secret \
--sa-annotations eks.amazonaws.com/role-arn=arn:aws:iam::ACCOUNT:role/velero-irsa-role
Prevention in CI/CD
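1. Checkov: scan IAM policies before terraform apply:
checkov -d ./terraform --check CKV_AWS_63,CKV_AWS_355
# CKV_AWS_63: IAM policy documents must not allow "*" as a statement's actions
# CKV_AWS_355: IAM policy documents must not allow "*" as a statement's resource for restrictable actions
# Check IDs can change between releases; confirm them with: checkov --list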
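2. OPA/Conftest: reject s3:* and wildcard resources in Velero IAM policies (run with conftest test --namespace velero.iam):
package velero.iam

# Rego v0 syntax; OPA 1.0+ requires the `if` and `contains` keywords.
# IAM allows Action and Resource to be a string or an array, so normalize both.
as_array(x) = [x] { is_string(x) }
as_array(x) = x { is_array(x) }

deny[msg] {
    actions := as_array(input.Statement[_].Action)
    actions[_] == "s3:*"
    msg := "Velero IAM policy must not use wildcard s3:* action"
}

deny[msg] {
    resources := as_array(input.Statement[_].Resource)
    resources[_] == "*"
    msg := "Velero IAM policy must scope Resource to specific bucket ARN"
}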
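If OPA is not available in a given pipeline, the same two rules can be approximated with the standard library, for instance in a pre-commit hook. This is a rough sketch rather than a replacement for the Rego policy: lint_policy and the wildcard sets are illustrative names, and the check only looks at Allow statements.

```python
from typing import List

# Values treated as wildcards; extend these sets to suit your own standards.
WILDCARD_ACTIONS = {"*", "s3:*"}
WILDCARD_RESOURCES = {"*", "arn:aws:s3:::*"}


def _as_list(value) -> list:
    """IAM fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def lint_policy(policy: dict) -> List[str]:
    """Return one finding per wildcard action or resource in Allow statements."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action")):
            if action in WILDCARD_ACTIONS:
                findings.append(f"Statement[{index}]: wildcard action {action}")
        for resource in _as_list(statement.get("Resource")):
            if resource in WILDCARD_RESOURCES:
                findings.append(f"Statement[{index}]: wildcard resource {resource}")
    return findings


if __name__ == "__main__":
    bad = {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
    for finding in lint_policy(bad):
        print(finding)
```

Normalizing string and list forms up front is the important design choice here, since a check that handles only one form passes policies written in the other.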
3. Terraform: build the policy from an aws_iam_policy_document data source instead of an inline wildcard:
- resource "aws_iam_policy" "velero" {
-   policy = jsonencode({
-     Statement = [{ Effect = "Allow", Action = "s3:*", Resource = "*" }]
-   })
- }
+ resource "aws_iam_policy" "velero" {
+   policy = data.aws_iam_policy_document.velero_s3.json
+ }
+
+ data "aws_iam_policy_document" "velero_s3" {
+   # Bucket-level actions: scoped to the bucket ARN
+   statement {
+     actions   = ["s3:GetBucketLocation", "s3:ListBucket", "s3:ListBucketMultipartUploads"]
+     resources = ["arn:aws:s3:::${var.velero_bucket_name}"]
+   }
+   # Object-level actions: scoped to objects inside the bucket
+   statement {
+     actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"]
+     resources = ["arn:aws:s3:::${var.velero_bucket_name}/*"]
+   }
+ }
4. Prometheus alerting rule (routed through Alertmanager): fire when a BSL has been non-Available for 5 minutes:
- alert: VeleroBSLUnavailable
  # Assumes your Velero version exports this metric; confirm on its /metrics endpoint.
  expr: velero_backup_storage_location_info{phase!="Available"} == 1
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Velero BSL {{ $labels.backup_storage_location }} is {{ $labels.phase }}"
    runbook: "https://your-wiki/velero-bsl-iam-fix"
If your backups run hourly or less often, this usually fires before the next scheduled backup window, giving you time to fix IAM before you lose a recovery point.