Fixing GKE Autopilot Pods Stuck in Pending: Insufficient CPU & Node Not-Ready Taint

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins


TL;DR

  • What broke: Your pod's CPU requests exceed available allocatable CPU on the sole ready node, and the only other node is tainted node.kubernetes.io/not-ready, meaning the scheduler has zero valid targets.
  • How to fix it: Right-size your CPU requests to match Autopilot's predefined resource tiers, or add a tolerations block if you intentionally need to schedule onto not-ready nodes during warm-up (rare/advanced).

The Incident (What Does the Error Mean?)

Raw scheduler event:

0/2 nodes are available:
  1 Insufficient cpu,
  1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

The Kubernetes scheduler evaluated every node in the cluster and found no viable placement target:

  1. Insufficient cpu — The sum of CPU requests across all pods already scheduled on the ready node, plus your new pod's request, exceeds the node's allocatable CPU. In Autopilot, you don't manage node pools directly, but Autopilot still enforces minimum and maximum resource request tiers. If your request is malformed (e.g., requests.cpu: 0 or an absurdly high value like 32), Autopilot either rejects it or cannot find a matching machine shape fast enough.
  2. node.kubernetes.io/not-ready — A second node exists but is mid-provisioning or unhealthy. Autopilot is likely spinning up a new node in response to the pending pod, but it hasn't passed readiness checks yet. Your pod spec has no toleration for this taint, so the scheduler won't place it there even temporarily.
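For triage scripts, it can help to split the event message into its per-node reasons. The following is a minimal Python sketch, not part of kubectl or any GKE tooling. parse_scheduling_failure is a hypothetical helper name, and the message format is assumed to match the event shown above.

```python
import re


def parse_scheduling_failure(message: str) -> dict:
    """Split a FailedScheduling message into node totals and per-reason counts."""
    header = re.match(r"\s*(\d+)/(\d+) nodes are available:", message)
    if header is None:
        raise ValueError("not a FailedScheduling message")

    # Collapse newlines and repeated spaces so multi-line events parse the same way.
    body = " ".join(message[header.end():].split())

    reasons = {}
    # A new reason starts after a comma that is followed by a node count.
    # The comma inside "...}, that the pod didn't tolerate" is not followed
    # by a digit, so the taint reason stays in one piece.
    for part in re.split(r",\s*(?=\d+\s)", body):
        match = re.match(r"(\d+)\s+(.*)", part.strip().rstrip("."))
        if match:
            reasons[match.group(2)] = int(match.group(1))

    return {
        "schedulable": int(header.group(1)),
        "total": int(header.group(2)),
        "reasons": reasons,
    }
```

If the per-reason counts do not add up to the node total, the message was probably truncated or copied incorrectly.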

Immediate consequence: The pod stays in Pending indefinitely if Autopilot provisioning stalls (e.g., quota exhaustion, region capacity, or a misconfigured request that no Autopilot tier can satisfy).


The Attack Vector / Blast Radius

This is a cascading availability failure, not a security exploit, but the blast radius is severe in production:

  • Single-replica deployments go fully dark. If this is your only pod for a service, it's a complete outage the moment the previous pod terminates.
  • HPA death spiral. If a Horizontal Pod Autoscaler is active, it may attempt to scale out, creating more Pending pods, which triggers more Autopilot node provisioning requests, which exhausts your GCP project's CPUS quota faster — turning a scheduling hiccup into a quota incident.
  • Autopilot provisioning stall. Autopilot only provisions a node for requests it can map to a valid compute class. Out-of-range values, such as cpu: 0 or 1500m paired with a memory ratio outside Autopilot's bounds, are typically mutated at admission to fit. A request the class cannot satisfy at all, or one that runs into quota or regional capacity limits, can leave the pod Pending with no new node appearing.
  • Upstream 502/503 errors. Any service depending on this pod (via ClusterIP or Ingress) starts failing, typically with 502/503 responses at the load balancer, as soon as the endpoint is deregistered.

How to Fix It (The Solution)

Basic Fix — Right-Size CPU Requests for Autopilot

Autopilot enforces minimum resource requests (historically 250m CPU per Pod for the general-purpose class; clusters that support bursting accept lower values) and requires CPU/memory ratios within specific bounds. The most common cause of Insufficient cpu on Autopilot is either an omitted requests block (Autopilot then applies its own defaults, which may not match what the workload needs) or a value that doesn't align with available node shapes.

Check your current pod spec:

kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
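On a busy namespace the event list is noisy. The sketch below filters the JSON form of the same events down to scheduling failures. It assumes the input is the output of kubectl get events -o json (a list of core/v1 Event objects); failed_scheduling_events is a hypothetical helper name.

```python
import json


def failed_scheduling_events(events_json: str) -> list[tuple[str, str]]:
    """Return (pod name, message) pairs for every FailedScheduling event."""
    events = json.loads(events_json)
    results = []
    for item in events.get("items", []):
        if item.get("reason") != "FailedScheduling":
            continue
        # involvedObject identifies the pod the scheduler could not place.
        pod = item.get("involvedObject", {}).get("name", "<unknown>")
        results.append((pod, item.get("message", "")))
    return results
```

One way to use it is to pipe kubectl get events -n <namespace> -o json into a small script that reads stdin and passes the text to this function.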

Check Autopilot's mutation of your requests:

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

# deployment.yaml: container resources block
 spec:
   containers:
   - name: my-app
     image: gcr.io/my-project/my-app:latest
     resources:
-      requests:
-        cpu: "0"
-        memory: "64Mi"
-      limits:
-        cpu: "0"
-        memory: "64Mi"
+      requests:
+        cpu: "250m"
+        memory: "512Mi"
+      limits:
+        cpu: "500m"
+        memory: "512Mi"

Autopilot CPU/Memory ratio rule: For the general-purpose compute class, memory must be between 1 GiB and 6.5 GiB per vCPU. A request of 250m CPU with 512Mi memory works out to 2 GiB per vCPU, which is valid. A request of 250m CPU with 64Mi memory works out to 0.25 GiB per vCPU, below the floor, so the Autopilot admission webhook may reject or mutate it.
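The ratio arithmetic is easy to get wrong by hand, so here is a small Python sketch of the check. The 1 to 6.5 GiB per vCPU bounds are taken from the rule above; the function names are hypothetical, and only the "m" CPU suffix and binary memory suffixes (Ki, Mi, Gi, Ti) are handled.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_to_vcpu(quantity: str) -> float:
    """Convert a CPU quantity such as "250m" or "2" to vCPUs."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_gib(quantity: str) -> float:
    """Convert a memory quantity such as "512Mi" to GiB."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor / 2**30
    return float(quantity) / 2**30  # no suffix: plain bytes


def ratio_ok(cpu: str, memory: str, low: float = 1.0, high: float = 6.5) -> bool:
    """True if memory per vCPU falls inside [low, high] GiB."""
    vcpu = cpu_to_vcpu(cpu)
    if vcpu <= 0:
        return False  # a zero CPU request has no meaningful ratio
    return low <= memory_to_gib(memory) / vcpu <= high
```

With these definitions, ratio_ok("250m", "512Mi") passes and ratio_ok("250m", "64Mi") fails, matching the two cases described above.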


Enterprise Best Practice — Compute Classes, PodDisruptionBudgets, and Topology Spread

For production workloads, relying on default Autopilot scheduling is insufficient. Implement explicit compute class targeting and spread constraints.

# deployment.yaml — full production-grade spec
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: my-app
 spec:
+  replicas: 2
   selector:
     matchLabels:
       app: my-app
   template:
     metadata:
       labels:
         app: my-app
     spec:
+      # Compute classes are requested through a nodeSelector, not a pod label
+      nodeSelector:
+        cloud.google.com/compute-class: "general-purpose"
+      topologySpreadConstraints:
+      - maxSkew: 1
+        topologyKey: topology.kubernetes.io/zone
+        whenUnsatisfiable: DoNotSchedule
+        labelSelector:
+          matchLabels:
+            app: my-app
       containers:
       - name: my-app
         image: gcr.io/my-project/my-app:latest
         resources:
-          requests:
-            cpu: "100m"
-            memory: "128Mi"
+          requests:
+            cpu: "500m"
+            memory: "1Gi"
+          limits:
+            cpu: "1000m"
+            memory: "1Gi"

# poddisruptionbudget.yaml
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: my-app-pdb
+spec:
+  minAvailable: 1
+  selector:
+    matchLabels:
+      app: my-app

Why replicas: 2 matters here: With a single replica, any node replacement or provisioning delay means downtime. With two replicas spread across two zones, a not-ready node affects at most one pod while the other keeps serving.
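The interaction between replicas and minAvailable comes down to one subtraction: a PodDisruptionBudget allows voluntary evictions only while the healthy pod count stays at or above minAvailable. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
def disruptions_allowed(healthy_replicas: int, min_available: int) -> int:
    """Number of pods that can be evicted voluntarily without breaching the PDB."""
    return max(0, healthy_replicas - min_available)
```

With replicas: 2 and minAvailable: 1, one pod can be evicted at a time. With a single replica and the same budget the result is 0, so voluntary evictions such as node drains are blocked rather than made safe.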


Prevention in CI/CD

Don't let malformed resource requests reach your cluster. Enforce this at the pipeline level.

1. Conftest / OPA Policy (blocks zero-CPU requests in CI, before kubectl apply runs)

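# policy/autopilot_resources.rego
# Run with: conftest test k8s/ --namespace autopilot.resources
package autopilot.resources

# Rego v1 syntax (contains/if); needs a reasonably recent OPA or Conftest
import rego.v1

# Deny CPU requests below the 250m floor used in this guide
deny contains msg if {
  input.kind == "Deployment"
  some container in input.spec.template.spec.containers
  cpu_request := container.resources.requests.cpu
  # units.parse understands the "m" (milli) suffix, so "250m" parses to 0.25
  units.parse(cpu_request) < units.parse("250m")
  msg := sprintf("Container '%v' CPU request %v is below Autopilot minimum of 250m", [container.name, cpu_request])
}

# Deny containers that omit the requests block entirely
deny contains msg if {
  input.kind == "Deployment"
  some container in input.spec.template.spec.containers
  not container.resources.requests
  msg := sprintf("Container '%v' has no resource requests defined; required for GKE Autopilot", [container.name])
}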
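To sanity-check manifests on a machine without OPA installed, the two rules above can be mirrored in plain Python. This is a sketch, not a replacement for the policy: check_deployment is a hypothetical name, the 250m floor is copied from the policy above, and the manifest is assumed to be already parsed into a dict by whatever YAML loader you use.

```python
MIN_CPU_MILLIS = 250  # floor copied from the policy above


def cpu_millis(quantity) -> int:
    """Convert a CPU quantity such as "250m", "1" or 0.5 to millicores."""
    text = str(quantity)
    if text.endswith("m"):
        return int(text[:-1])
    return int(float(text) * 1000)


def check_deployment(manifest: dict) -> list[str]:
    """Return one violation message per container that breaks a rule."""
    violations = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        requests = (container.get("resources") or {}).get("requests")
        if not requests:
            violations.append(f"Container '{name}' has no resource requests defined")
            continue
        cpu = requests.get("cpu")
        if cpu is not None and cpu_millis(cpu) < MIN_CPU_MILLIS:
            violations.append(
                f"Container '{name}' CPU request {cpu} is below {MIN_CPU_MILLIS}m"
            )
    return violations
```

An empty list means the manifest passes both checks; a CI step could fail the build whenever the list is non-empty.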

2. Checkov Scan in GitHub Actions

# .github/workflows/checkov.yaml
- name: Checkov Kubernetes Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: k8s/
    framework: kubernetes
    check: CKV_K8S_10,CKV_K8S_11,CKV_K8S_12,CKV_K8S_13
    # CKV_K8S_10: CPU requests set
    # CKV_K8S_11: CPU limits set
    # CKV_K8S_12: Memory requests set
    # CKV_K8S_13: Memory limits set

3. Kustomize Overlay for Environment-Specific Resource Floors

# overlays/production/resources-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
      - name: my-app
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1000m"
            memory: "1Gi"

4. Monitor Pending Pods with Cloud Monitoring Alert

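# MQL alert policy sketch: fire if any pod is Pending > 5 minutes
# NOTE: the metric below is a placeholder. 'kubernetes.io/pod/volume/total_bytes'
# reports volume size, not pod phase, so the query as written cannot detect Pending pods.
fetch k8s_pod
| metric 'kubernetes.io/pod/volume/total_bytes'
| filter (resource.labels.namespace_name != 'kube-system')
# Replace it with a pod-phase metric filtered to phase=Pending, for example
# kube_pod_status_phase from GKE's kube state metrics package (assumes that package is enabled).
# Set threshold: > 0 for 5 consecutive minutes → PagerDuty/Slack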

Combine this with a quota alert on compute.googleapis.com/cpus in your GCP project to catch regional CPU quota exhaustion before it cascades.
