Fixing GKE Autopilot Pods Stuck in Pending: Insufficient CPU & Node Not-Ready Taint
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins
TL;DR
- What broke: Your pod's CPU requests exceed the allocatable CPU on the sole ready node, and the only other node is tainted node.kubernetes.io/not-ready, so the scheduler has zero valid targets.
- How to fix it: Right-size your CPU requests to match Autopilot's predefined resource tiers, or add a tolerations block if you intentionally need to schedule onto not-ready nodes during warm-up (rare/advanced).
- Fast path: Use our Client-Side Sandbox above to auto-refactor your pod spec: paste your YAML, get corrected resources blocks instantly without leaking your config to a third party.
The Incident (What Does the Error Mean?)
Raw scheduler event:
0/1 nodes are available:
1 Insufficient cpu,
1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
The Kubernetes scheduler evaluated every node in the cluster and found no viable placement target:
- Insufficient cpu: The sum of CPU requests across all pods already scheduled on the ready node, plus your new pod's request, exceeds the node's allocatable CPU. In Autopilot, you don't manage node pools directly, but Autopilot still enforces minimum and maximum resource request tiers. If your request is malformed (e.g., requests.cpu: 0 or an absurdly high value like 32), Autopilot either rejects it or cannot find a matching machine shape fast enough.
- node.kubernetes.io/not-ready: A second node exists but is mid-provisioning or unhealthy. Autopilot is likely spinning up a new node in response to the pending pod, but it hasn't passed readiness checks yet. Your pod spec has no toleration for this taint, so the scheduler won't place it there even temporarily.
Immediate consequence: The pod stays in Pending indefinitely if Autopilot provisioning stalls (e.g., quota exhaustion, region capacity, or a misconfigured request that no Autopilot tier can satisfy).
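If you collect these events in tooling, it helps to break the message into its per-reason node counts. The following is a minimal sketch, assuming the message keeps the "N/M nodes are available: count reason, count reason" layout shown above; the function name parse_scheduler_message is hypothetical and not part of any Kubernetes client.

```python
import re


def parse_scheduler_message(message: str) -> dict:
    """Split a FailedScheduling message into node totals and per-reason counts."""
    header, _, body = message.partition(":")
    match = re.match(r"\s*(\d+)/(\d+) nodes are available", header)
    if match is None:
        raise ValueError("not a recognised scheduler message")

    reasons = {}
    # Split only on commas that are followed by a node count, so the comma
    # inside the taint reason stays attached to that reason.
    for part in re.split(r",\s+(?=\d+\s)", body.strip().rstrip(".")):
        if not part:
            continue
        count, reason = part.split(" ", 1)
        reasons[reason] = int(count)

    return {
        "available": int(match.group(1)),
        "total": int(match.group(2)),
        "reasons": reasons,
    }


event = (
    "0/1 nodes are available: 1 Insufficient cpu, "
    "1 node(s) had taint {node.kubernetes.io/not-ready: }, "
    "that the pod didn't tolerate."
)
for reason, nodes in parse_scheduler_message(event)["reasons"].items():
    print(nodes, reason)
```

Grouping by reason makes it easier to tell a sizing problem (Insufficient cpu) apart from a provisioning problem (the not-ready taint) when many pods are Pending at once.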
The Attack Vector / Blast Radius
This is a cascading availability failure, not a security exploit, but the blast radius is severe in production:
- Single-replica deployments go fully dark. If this is your only pod for a service, it's a complete outage the moment the previous pod terminates.
- HPA death spiral. If a Horizontal Pod Autoscaler is active, it may attempt to scale out, creating more Pending pods, which triggers more Autopilot node provisioning requests, which exhausts your GCP project's CPUS quota faster, turning a scheduling hiccup into a quota incident.
- Autopilot provisioning stall. Autopilot will not provision a node for a CPU request it cannot map to a valid Compute Class. A request of cpu: 0 or a non-standard value like 1500m with a memory ratio that violates Autopilot's constraints will leave the pod Pending forever: no new node will ever appear.
- Readiness probe failures upstream. Any service depending on this pod (via ClusterIP or Ingress) starts returning 502/503 as soon as the endpoint is deregistered.
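To get a feel for how quickly an HPA scale-out can eat into quota, here is a rough back-of-the-envelope sketch. The function names and the numbers are hypothetical, and it only counts pod CPU requests; real quota consumption depends on the machine shapes Autopilot provisions, so treat the result as an optimistic bound.

```python
def cpu_demand_vcpus(replicas: int, cpu_request_millicores: int) -> float:
    """Total vCPUs requested by `replicas` pods of the same size."""
    return replicas * cpu_request_millicores / 1000


def replicas_until_quota(quota_vcpus: float, in_use_vcpus: float,
                         cpu_request_millicores: int) -> int:
    """How many more pods fit before requests alone exceed the CPUS quota."""
    headroom_millicores = round((quota_vcpus - in_use_vcpus) * 1000)
    return max(headroom_millicores // cpu_request_millicores, 0)


# Illustrative values only: a 24 vCPU regional quota with 20 vCPUs in use.
print(cpu_demand_vcpus(10, 500))        # 5.0 vCPUs for ten 500m pods
print(replicas_until_quota(24, 20, 500))  # 8 more pods before the quota is hit
```

If the HPA's maxReplicas is well above that second number, a scheduling problem can turn into a quota problem without anyone changing the workload.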
How to Fix It (The Solution)
Basic Fix — Right-Size CPU Requests for Autopilot
Autopilot enforces a minimum CPU request (250m is the floor used throughout this guide; the exact value depends on your GKE version and compute class) and requires CPU/memory ratios within specific bounds. The most common cause of Insufficient cpu on Autopilot is either an omitted requests block, which Autopilot fills with its own defaults that may not match what the workload actually needs, or a value that doesn't align with available node shapes.
Check your current pod spec:
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
Check Autopilot's mutation of your requests:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
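Once you have the pod as JSON (for example from kubectl get pod with -o json), a few lines of scripting can flag containers whose CPU request sits under the floor. This is a sketch under stated assumptions: the helper names are hypothetical, the 250m floor is the one this guide uses, and only plain ("1", "0.5") and milli ("250m") CPU quantities are handled.

```python
import json
from decimal import Decimal


def cpu_to_millicores(quantity) -> int:
    """Convert a Kubernetes CPU quantity such as '250m', '0.5' or 1 to millicores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(Decimal(text[:-1]))
    return int(Decimal(text) * 1000)


def containers_below_floor(pod_json: str, floor_millicores: int = 250) -> list:
    """Return (container name, millicores) pairs whose CPU request is under the floor."""
    pod = json.loads(pod_json)
    flagged = []
    for container in pod["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        millicores = cpu_to_millicores(requests.get("cpu", "0"))
        if millicores < floor_millicores:
            flagged.append((container["name"], millicores))
    return flagged


sample_pod = json.dumps({
    "spec": {"containers": [
        {"name": "my-app", "resources": {"requests": {"cpu": "0", "memory": "64Mi"}}},
        {"name": "sidecar", "resources": {"requests": {"cpu": "0.5"}}},
    ]}
})
print(containers_below_floor(sample_pod))  # [('my-app', 0)]
```

A missing requests block is treated as a request of 0 here, so it is flagged the same way as an explicit cpu: "0".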
# deployment.yaml — container resources block
 spec:
   containers:
   - name: my-app
     image: gcr.io/my-project/my-app:latest
     resources:
-      requests:
-        cpu: "0"
-        memory: "64Mi"
-      limits:
-        cpu: "0"
-        memory: "64Mi"
+      requests:
+        cpu: "250m"
+        memory: "512Mi"
+      limits:
+        cpu: "500m"
+        memory: "512Mi"
Autopilot CPU/Memory ratio rule: For the general-purpose compute class, memory must be between 1 GiB and 6.5 GiB per vCPU. A request of 250m CPU with 512Mi memory is valid. A request of 250m CPU with 64Mi memory may be rejected or mutated by the Autopilot admission webhook.
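The ratio rule is easy to check by hand, but a small helper avoids arithmetic slips when reviewing many manifests. This is a minimal sketch, assuming the 1 to 6.5 GiB-per-vCPU bounds quoted above and binary memory suffixes (Ki, Mi, Gi, Ti); the function names are hypothetical.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity with a binary suffix (or plain bytes) to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def gib_per_vcpu(cpu_millicores: int, memory: str) -> float:
    """Memory in GiB divided by CPU in whole vCPUs."""
    return (memory_to_bytes(memory) / 2**30) / (cpu_millicores / 1000)


def ratio_ok(cpu_millicores: int, memory: str,
             low: float = 1.0, high: float = 6.5) -> bool:
    """True when the request falls inside the assumed general-purpose bounds."""
    return low <= gib_per_vcpu(cpu_millicores, memory) <= high


print(gib_per_vcpu(250, "512Mi"), ratio_ok(250, "512Mi"))  # 2.0 True
print(gib_per_vcpu(250, "64Mi"), ratio_ok(250, "64Mi"))    # 0.25 False
```

The two printed cases match the examples in the rule: 512Mi against 250m works out to 2 GiB per vCPU, while 64Mi against 250m is only 0.25 GiB per vCPU.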
Enterprise Best Practice — Compute Classes, PodDisruptionBudgets, and Topology Spread
For production workloads, relying on default Autopilot scheduling is insufficient. Implement explicit compute class targeting and spread constraints.
# deployment.yaml — full production-grade spec
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: my-app
 spec:
+  replicas: 2
   selector:
     matchLabels:
       app: my-app
   template:
     metadata:
       labels:
         app: my-app
     spec:
+      # A compute class is requested with a nodeSelector, not a pod label.
+      # general-purpose is the Autopilot default, so this can usually be omitted.
+      nodeSelector:
+        cloud.google.com/compute-class: "general-purpose"
+      topologySpreadConstraints:
+      - maxSkew: 1
+        topologyKey: topology.kubernetes.io/zone
+        whenUnsatisfiable: DoNotSchedule
+        labelSelector:
+          matchLabels:
+            app: my-app
       containers:
       - name: my-app
         image: gcr.io/my-project/my-app:latest
         resources:
-          requests:
-            cpu: "100m"
-            memory: "128Mi"
+          requests:
+            cpu: "500m"
+            memory: "1Gi"
+          limits:
+            cpu: "1000m"
+            memory: "1Gi"
# poddisruptionbudget.yaml
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: my-app-pdb
+spec:
+  minAvailable: 1
+  selector:
+    matchLabels:
+      app: my-app
Why replicas: 2 matters here: With a single replica, any node provisioning event = downtime. Two replicas across two zones means Autopilot's not-ready node taint only affects one pod while the other stays live.
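To see why maxSkew: 1 pushes the second replica into a different zone, it can help to work through the skew arithmetic. The sketch below follows the documented definition (matching pods in the target zone after placement, minus the global minimum across eligible zones); the function names and zone names are hypothetical, and every eligible zone must appear in the input, including zones with zero pods.

```python
def skew_if_placed(pods_per_zone: dict, target_zone: str) -> int:
    """Skew that would result from placing one more matching pod in target_zone."""
    global_minimum = min(pods_per_zone.values())
    return pods_per_zone[target_zone] + 1 - global_minimum


def placement_allowed(pods_per_zone: dict, target_zone: str, max_skew: int = 1) -> bool:
    """With whenUnsatisfiable: DoNotSchedule, placement needs skew <= maxSkew."""
    return skew_if_placed(pods_per_zone, target_zone) <= max_skew


# One replica already runs in zone-a; zone-b is empty.
current = {"zone-a": 1, "zone-b": 0}
print(placement_allowed(current, "zone-a"))  # False: skew would be 2
print(placement_allowed(current, "zone-b"))  # True: skew would be 1
```

With DoNotSchedule, the second replica stays Pending rather than landing next to the first one, which is the behavior you want for zone-level resilience but also one more reason to watch for Pending pods.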
Prevention in CI/CD
Don't let malformed resource requests reach your cluster. Enforce this at the pipeline level.
1. Conftest / OPA Policy (blocks missing or sub-minimum CPU requests in CI, before kubectl apply)
# policy/autopilot_resources.rego
package autopilot.resources

import rego.v1

# Run with: conftest test --namespace autopilot.resources k8s/
deny contains msg if {
    container := input.spec.template.spec.containers[_]
    cpu_request := container.resources.requests.cpu
    # units.parse turns "250m" into 0.25 so quantities compare numerically
    units.parse(cpu_request) < units.parse("250m")
    msg := sprintf("Container '%v' CPU request %v is below Autopilot minimum of 250m", [container.name, cpu_request])
}

deny contains msg if {
    container := input.spec.template.spec.containers[_]
    not container.resources.requests
    msg := sprintf("Container '%v' has no resource requests defined (required for GKE Autopilot)", [container.name])
}
2. Checkov Scan in GitHub Actions
# .github/workflows/checkov.yaml
- name: Checkov Kubernetes Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: k8s/
    framework: kubernetes
    check: CKV_K8S_10,CKV_K8S_11,CKV_K8S_12,CKV_K8S_13
    # CKV_K8S_10: CPU requests set
    # CKV_K8S_11: CPU limits set
    # CKV_K8S_12: Memory requests set
    # CKV_K8S_13: Memory limits set
3. Kustomize Overlay for Environment-Specific Resource Floors
# overlays/production/resources-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: my-app
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
4. Monitor Pending Pods with Cloud Monitoring Alert
# MQL alert policy sketch: fires if any pod is Pending > 5 minutes
# NOTE: the metric below is only a placeholder. Swap in a pod-phase metric
# (for example kubernetes.io/pod/status/phase filtered to phase=Pending).
fetch k8s_pod
| metric 'kubernetes.io/pod/volume/total_bytes'
| filter (resource.namespace_name != 'kube-system')
# Set threshold: > 0 for 5 consecutive minutes, then route to PagerDuty/Slack
Combine this with a GKE Autopilot quota alert on compute.googleapis.com/cpus in your GCP project to catch regional capacity exhaustion before it cascades.
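If you want to sanity-check the alert condition outside Cloud Monitoring, the same "Pending for more than 5 minutes" rule can be approximated against the output of kubectl get pods -A -o json. This is a sketch, not a replacement for the alert: the function name is hypothetical, and it uses creationTimestamp as a stand-in for how long the pod has been Pending.

```python
import json
from datetime import datetime, timedelta, timezone


def pending_longer_than(pod_list_json: str, now: datetime,
                        threshold: timedelta = timedelta(minutes=5)) -> list:
    """Names of non-kube-system pods that are Pending and older than the threshold."""
    stuck = []
    for pod in json.loads(pod_list_json)["items"]:
        meta = pod["metadata"]
        if meta.get("namespace") == "kube-system":
            continue  # mirror the namespace filter used in the alert
        if pod.get("status", {}).get("phase") != "Pending":
            continue
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > threshold:
            stuck.append(meta["name"])
    return stuck


def _pod(name, namespace, created, phase):
    """Build a minimal pod object for the example below."""
    return {
        "metadata": {"name": name, "namespace": namespace, "creationTimestamp": created},
        "status": {"phase": phase},
    }


sample = json.dumps({"items": [
    _pod("a", "default", "2024-01-01T00:00:00Z", "Pending"),
    _pod("b", "default", "2024-01-01T00:08:00Z", "Pending"),
    _pod("c", "default", "2024-01-01T00:00:00Z", "Running"),
    _pod("d", "kube-system", "2024-01-01T00:00:00Z", "Pending"),
]})
now = datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc)
print(pending_longer_than(sample, now))  # ['a']
```

In practice you would pass datetime.now(timezone.utc) as now and feed the function the live kubectl output.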