Fixing GKE Workload Identity 'Insufficient Permissions' Errors: A Complete Debugging Guide
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–20 mins
TL;DR
- What broke: The Kubernetes Service Account (KSA) is not correctly bound to a Google Service Account (GSA). Either the roles/iam.workloadIdentityUser IAM binding is missing, the KSA annotation is wrong, or Workload Identity was never enabled on the cluster and its node pool.
- How to fix it: Verify the cluster has --workload-pool set and the node pool runs with --workload-metadata=GKE_METADATA, annotate the KSA with the correct GSA email, and grant the KSA's member principal roles/iam.workloadIdentityUser on the GSA.
The Incident (What Does the Error Mean?)
Your pod logs or application startup shows something like:
ERROR: (gcloud) There was a problem refreshing your current auth tokens:
Request had insufficient authentication scopes.
-- or --
google.api_core.exceptions.PermissionDenied: 403 Permission 'storage.objects.get'
denied on resource 'projects/_/buckets/my-bucket/objects/config.json'
for service account: [PROJECT_NUMBER]-compute@developer.gserviceaccount.com
-- or --
Failed to retrieve access token: metadata server returned HTTP 403
The immediate consequence: Your pod is either falling back to the Compute Engine default service account (which has broad, unintended permissions) or getting no credentials at all. Either way, the workload is broken and potentially operating with the wrong identity — a security event, not just an outage.
The Attack Vector / Blast Radius
This misconfiguration has two failure modes, both dangerous:
Mode 1 — Silent fallback to default Compute SA. If Workload Identity is not enforced at the node pool level, the pod silently authenticates as the node's service account, which by default is [PROJECT_NUMBER]-compute@developer.gserviceaccount.com. That SA typically has Editor on the project. Unless the node's OAuth access scopes are narrow, an attacker who compromises the pod gets near-full project access: reading all GCS buckets, invoking Cloud Functions, pulling secrets from Secret Manager, and enumerating IAM policies.
Mode 2 — Hard 403, workload down. If Workload Identity IS enforced but the binding is broken, the metadata server returns 403. The pod cannot authenticate to any Google API. Cascading failures: Cloud SQL connections drop, Pub/Sub consumers stall, any GCS-backed config fails to load. In a microservices mesh, this propagates upstream within seconds.
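The error text is usually enough to tell the two modes apart. An access-scope error, or a 403 that names the Compute Engine default SA, means the pod is running as the node's identity (Mode 1). A 403 from the metadata server means the KSA-to-GSA binding is broken (Mode 2). Below is a minimal Python sketch of that triage, assuming messages shaped like the examples above. `classify_wi_error` and its return labels are hypothetical names, and the string matching is a heuristic rather than a complete parser.

```python
import re

# Compute Engine default SA, e.g. 123456789012-compute@developer.gserviceaccount.com
COMPUTE_DEFAULT_SA = re.compile(r"\b\d+-compute@developer\.gserviceaccount\.com\b")


def classify_wi_error(message: str) -> str:
    """Heuristically map a Workload Identity error message to a failure mode.

    Returns "mode1-fallback", "mode2-broken-binding", or "unknown".
    """
    text = message.lower()
    # Access-scope errors and the Compute default SA in a 403 both indicate
    # the pod is using the node's identity instead of the GSA.
    if COMPUTE_DEFAULT_SA.search(text) or "insufficient authentication scopes" in text:
        return "mode1-fallback"
    # The metadata server refusing to mint a token points at a broken KSA -> GSA binding.
    if "metadata server" in text and "403" in text:
        return "mode2-broken-binding"
    return "unknown"
```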
Every link in this binding chain must be in place:
Cluster → workload-pool=[PROJECT_ID].svc.id.goog
└── Node pool → workload-metadata=GKE_METADATA
    └── KSA annotation → iam.gke.io/gcp-service-account=[GSA_EMAIL]
        └── IAM binding → serviceAccount:[PROJECT_ID].svc.id.goog[NAMESPACE/KSA_NAME]
            └── role → roles/iam.workloadIdentityUser ON [GSA]
Break any single link and the whole chain fails.
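The chain can be modeled as an ordered list of checks that stops at the first failure. This minimal sketch assumes you have already collected the four observed values from the cluster, the node pool, the KSA, and the GSA's IAM policy. `ObservedState` and `first_broken_link` are hypothetical helpers, not part of any Google SDK.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ObservedState:
    """What was actually found in the cluster and in IAM (hypothetical container)."""
    workload_pool: Optional[str] = None   # cluster workloadIdentityConfig.workloadPool
    metadata_mode: Optional[str] = None   # node pool config.workloadMetadataConfig.mode
    ksa_annotation: Optional[str] = None  # iam.gke.io/gcp-service-account on the KSA
    gsa_wi_members: Set[str] = field(default_factory=set)  # members holding workloadIdentityUser on the GSA


def first_broken_link(project_id: str, namespace: str, ksa_name: str,
                      gsa_email: str, state: ObservedState) -> Optional[str]:
    """Walk the chain top-down and return the name of the first broken link, or None."""
    member = f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"
    links = [
        ("cluster workload pool", state.workload_pool == f"{project_id}.svc.id.goog"),
        ("node pool metadata mode", state.metadata_mode == "GKE_METADATA"),
        ("KSA annotation", state.ksa_annotation == gsa_email),
        ("workloadIdentityUser binding", member in state.gsa_wi_members),
    ]
    for name, intact in links:
        if not intact:
            return name
    return None
```

Reporting only the first failure mirrors the dependency order: fixing a later link is pointless while an earlier one is still broken.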
How to Fix It (The Solution)
Step 0 — Verify the node pool actually has Workload Identity enabled
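# Cluster level: the workload pool must exist
gcloud container clusters describe [CLUSTER_NAME] \
  --region=[REGION] \
  --format="value(workloadIdentityConfig.workloadPool)"
# Must return: [PROJECT_ID].svc.id.goog
# Node pool level: pods must be served by the GKE metadata server
gcloud container node-pools describe [POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --region=[REGION] \
  --format="value(config.workloadMetadataConfig.mode)"
# Must return: GKE_METADATA
# If it returns GCE_METADATA or nothing, Workload Identity is OFF at the node level.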
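Enable it. The cluster needs a workload pool before its node pools can use the GKE metadata server. Updating the node pool recreates its nodes in a rolling fashion, so expect brief disruption:
gcloud container clusters update [CLUSTER_NAME] \
  --region=[REGION] \
  --workload-pool=[PROJECT_ID].svc.id.goog
gcloud container node-pools update [POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --region=[REGION] \
  --workload-metadata=GKE_METADATA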
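To script Step 0, both describe commands can emit full JSON (`--format=json`) instead of a single value. This minimal Python sketch inspects that JSON. It assumes the field names used in the `--format` expressions above: `workloadIdentityConfig.workloadPool` on the cluster and `config.workloadMetadataConfig.mode` on the node pool. The function names and file paths are hypothetical.

```python
import json
from typing import List


def load_json(path: str) -> dict:
    """Read the output of a `gcloud ... --format=json` command saved to a file."""
    with open(path) as fh:
        return json.load(fh)


def workload_identity_problems(cluster: dict, node_pool: dict, project_id: str) -> List[str]:
    """Return human-readable problems; an empty list means both levels look enabled."""
    problems = []
    expected_pool = f"{project_id}.svc.id.goog"
    pool = (cluster.get("workloadIdentityConfig") or {}).get("workloadPool")
    if pool != expected_pool:
        problems.append(f"cluster workloadPool is {pool!r}, expected {expected_pool!r}")
    mode = ((node_pool.get("config") or {}).get("workloadMetadataConfig") or {}).get("mode")
    if mode != "GKE_METADATA":
        problems.append(f"node pool metadata mode is {mode!r}, expected 'GKE_METADATA'")
    return problems
```

Usage: save the two describe outputs to `cluster.json` and `pool.json`, then call `workload_identity_problems(load_json("cluster.json"), load_json("pool.json"), "[PROJECT_ID]")`. An empty list means both levels are enabled.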
Basic Fix — Repair the KSA annotation and IAM binding
1. Annotate the Kubernetes Service Account:
 apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: my-app-ksa
   namespace: production
-  annotations: {}
+  annotations:
+    iam.gke.io/gcp-service-account: my-app-gsa@[PROJECT_ID].iam.gserviceaccount.com
2. Grant the IAM binding on the GSA:
-# Missing binding — the GSA has no workloadIdentityUser grant
+gcloud iam service-accounts add-iam-policy-binding \
+ my-app-gsa@[PROJECT_ID].iam.gserviceaccount.com \
+ --role="roles/iam.workloadIdentityUser" \
+ --member="serviceAccount:[PROJECT_ID].svc.id.goog[production/my-app-ksa]"
IAM changes usually propagate within a couple of minutes but can occasionally take longer. Restart the pod once the binding has propagated so it requests a fresh token.
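Both steps can also be checked programmatically. This helps when the culprit is a wrong namespace in the member string, such as `default` instead of `production`. This minimal Python sketch makes two assumptions. First, the KSA manifest has already been parsed into a dict. Second, the GSA's policy was exported with `gcloud iam service-accounts get-iam-policy [GSA_EMAIL] --format=json`. The helper names are hypothetical, and conditional IAM bindings are treated like unconditional ones.

```python
from typing import List

WI_ANNOTATION = "iam.gke.io/gcp-service-account"
WI_ROLE = "roles/iam.workloadIdentityUser"


def wi_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Principal that must hold WI_ROLE on the GSA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def annotation_problems(ksa: dict, expected_gsa: str) -> List[str]:
    """Step 1: check a ServiceAccount manifest (already parsed into a dict)."""
    annotations = (ksa.get("metadata") or {}).get("annotations") or {}
    value = annotations.get(WI_ANNOTATION)
    if not value:
        return [f"missing {WI_ANNOTATION} annotation"]
    if value != expected_gsa:
        return [f"annotation points at {value!r}, expected {expected_gsa!r}"]
    return []


def has_wi_binding(policy: dict, member: str) -> bool:
    """Step 2: look for the member under WI_ROLE in the exported IAM policy JSON."""
    return any(
        binding.get("role") == WI_ROLE and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )
```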
Enterprise Best Practice — Terraform with explicit Workload Identity enforcement
 resource "google_container_cluster" "primary" {
   # ... other cluster settings ...
+  workload_identity_config {
+    workload_pool = "${var.project_id}.svc.id.goog"
+  }
 }
 resource "google_container_node_pool" "app_nodes" {
   name    = "app-pool"
   cluster = google_container_cluster.primary.name
   node_config {
-    workload_metadata_config {
-      mode = "GCE_METADATA"
-    }
+    workload_metadata_config {
+      mode = "GKE_METADATA"
+    }
     # Dedicated least-privilege node SA: only the logging/monitoring roles nodes need, never Editor.
     # With such an SA, the broad cloud-platform scope leaves access control to IAM.
+    service_account = google_service_account.node_sa.email
+    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
   }
 }
 resource "google_service_account_iam_member" "workload_identity_binding" {
   service_account_id = google_service_account.app_gsa.name
   role               = "roles/iam.workloadIdentityUser"
-  member             = "serviceAccount:[PROJECT_ID].svc.id.goog[default/my-app-ksa]"
+  member             = "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/${var.ksa_name}]"
 }
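The node-pool rule can also be checked directly against a Terraform plan before apply. This minimal Python sketch assumes the plan was exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json`. In that output, nested blocks such as `node_config` appear as lists. `pools_without_gke_metadata` and `check_plan_file` are hypothetical helpers. A mode that stays unknown until apply is absent from `after` and is reported as a violation.

```python
import json
from typing import List


def pools_without_gke_metadata(plan: dict) -> List[str]:
    """Return addresses of node pools the plan would leave without GKE_METADATA."""
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "google_container_node_pool":
            continue
        after = (rc.get("change") or {}).get("after")
        if not after:  # resource is being destroyed
            continue
        # Nested blocks are serialized as lists; take the first (only) element.
        node_config = (after.get("node_config") or [{}])[0] or {}
        wmc = (node_config.get("workload_metadata_config") or [{}])[0] or {}
        if wmc.get("mode") != "GKE_METADATA":
            offenders.append(rc.get("address", "<unknown>"))
    return offenders


def check_plan_file(path: str) -> List[str]:
    """Load a `terraform show -json` file and return offending node pool addresses."""
    with open(path) as fh:
        return pools_without_gke_metadata(json.load(fh))
```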
Verification command — run this after every change:
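# Start a throwaway pod that runs as the KSA in the same namespace.
# Recent kubectl releases dropped `kubectl run --serviceaccount`, so the KSA is set via --overrides.
kubectl run wi-test --image=google/cloud-sdk:slim \
  --namespace=production \
  --overrides='{"apiVersion": "v1", "spec": {"serviceAccountName": "my-app-ksa"}}' \
  --rm -it --restart=Never -- \
  sh -c 'gcloud auth list && gcloud auth print-access-token > /dev/null && echo "token OK"'
# Working: the active account is my-app-gsa@[PROJECT_ID].iam.gserviceaccount.com and "token OK" prints.
# Compute Engine default SA as the active account = Mode 1 fallback. A 403 = binding still broken.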
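Where the application image lacks gcloud, you can put the same two questions to the metadata server directly: which identity is this, and can it get a token? This minimal Python sketch assumes it runs inside a pod that uses the KSA. The endpoints are the standard Compute metadata paths. `verdict` and its labels are hypothetical. The sketch also assumes the email endpoint still answers when the IAM binding is missing. If it raises `urllib.error.HTTPError` instead, treat that as Mode 2 as well.

```python
import urllib.error
import urllib.request

METADATA = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default"
HEADERS = {"Metadata-Flavor": "Google"}  # the metadata server rejects requests without this header


def metadata_email(timeout: float = 2.0) -> str:
    """Ask the metadata server which service account this pod runs as."""
    req = urllib.request.Request(f"{METADATA}/email", headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def can_mint_token(timeout: float = 2.0) -> bool:
    """True if the metadata server hands out an access token for this identity."""
    req = urllib.request.Request(f"{METADATA}/token", headers=HEADERS)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False


def verdict(email: str, token_ok: bool, expected_gsa: str) -> str:
    """Combine both probes into one of the failure modes described earlier."""
    if email.endswith("-compute@developer.gserviceaccount.com"):
        return "mode1-fallback"
    if email != expected_gsa:
        return "wrong-identity"
    return "ok" if token_ok else "mode2-broken-binding"
```

Usage, inside the pod: `print(verdict(metadata_email(), can_mint_token(), "my-app-gsa@[PROJECT_ID].iam.gserviceaccount.com"))`.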
Prevention in CI/CD
1. Conftest / OPA — block KSA deployments missing the annotation:
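# Run with: conftest test --namespace kubernetes.workload_identity <manifests>
package kubernetes.workload_identity
import rego.v1
deny contains msg if {
  input.kind == "ServiceAccount"
  not input.metadata.annotations["iam.gke.io/gcp-service-account"]
  # object.get keeps the rule defined when a manifest omits metadata.namespace
  ns := object.get(input.metadata, "namespace", "default")
  msg := sprintf("ServiceAccount '%v' in namespace '%v' is missing the Workload Identity annotation.",
    [input.metadata.name, ns])
}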
2. Checkov — scan Terraform before terraform apply:
checkov -d . --check CKV_GCP_69  # Flags node pools not running the GKE metadata server (GKE_METADATA)
checkov -d . --check CKV_GCP_24  # Intended to flag nodes on the default compute SA; confirm what this ID checks with `checkov --list`
3. GitHub Actions gate — enforce in PR pipeline:
- name: Checkov Workload Identity Scan
  uses: bridgecrewio/checkov-action@master  # pin to a release tag or commit SHA in production
  with:
    directory: ./terraform
    check: CKV_GCP_69,CKV_GCP_24
    soft_fail: false  # fail the job; make it a required status check to block merges
4. Audit existing clusters on a schedule:
# Find all KSAs without the WI annotation across all namespaces
kubectl get serviceaccounts --all-namespaces -o json | \
  jq -r '.items[]
    | select(.metadata.annotations["iam.gke.io/gcp-service-account"] == null)
    | "\(.metadata.namespace)/\(.metadata.name)"'
Pipe this into a weekly Cloud Scheduler → Cloud Function alert. Any unbound KSA in a production namespace should page the on-call.
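Below is a minimal Python sketch of the core of such a scheduled job, assuming it can run `kubectl` with read access to ServiceAccounts. `unbound_ksas` reproduces the jq filter above and adds the production-only scoping; the namespace set is a hypothetical example. Unlike the jq `== null` test, it also treats an empty annotation value as unbound. Sending the alert itself is left out.

```python
import json
import subprocess
from typing import Iterable, List, Optional

WI_ANNOTATION = "iam.gke.io/gcp-service-account"


def unbound_ksas(sa_list: dict, namespaces: Optional[Iterable[str]] = None) -> List[str]:
    """Return "namespace/name" for KSAs without the WI annotation, optionally limited to namespaces."""
    wanted = set(namespaces) if namespaces is not None else None
    found = []
    for item in sa_list.get("items", []):
        meta = item.get("metadata", {})
        if (meta.get("annotations") or {}).get(WI_ANNOTATION):
            continue  # bound: annotation present and non-empty
        ns = meta.get("namespace", "")
        if wanted is None or ns in wanted:
            found.append(f"{ns}/{meta.get('name', '')}")
    return found


def fetch_service_accounts() -> dict:
    """Same input the shell pipeline reads: all ServiceAccounts as JSON."""
    result = subprocess.run(
        ["kubectl", "get", "serviceaccounts", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Usage: `unbound_ksas(fetch_service_accounts(), namespaces={"production"})` returns the KSAs that should page the on-call.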